Search This Blog

Sunday, August 16, 2015

Building a multi-layer map in R using ggplot2 and spatial data

In this post, we will render spatial point and polygon data on the same map (as separate layers). The finished sample is the following:



This exercise involves using several libraries from R, as well as transforming data from one coordinate system to another. Here are the prerequisites

1. Install required spatial libraries into R: Please refer to this post, on how to do so.
2. Download spatial data for NYC pertaining to safety: Please refer to this post on how to do so.

So let's get started.

First step is to load all the libraries

# Load required libraries
library(rgdal)
library(ggplot2)
library(ggmap)
library(rgeos)
library(maptools)
library(scales)



Each of the above have a specific function in the process. Next step is to set the working directory for the data files. We use the setwd command for that and I have also set a variable for setting the shape file path.

# Set default paths for data
setwd("~/Work/Transportation/NYC/scripts")
shpfilepath="../data/shpfiles"

Next we read the shape files that contain annual fatality and injury data as well as data on borough boundaries for New York Let's start with the injury and fatality data. ogrInfo command provides high level summary of the data in each shape file.

# Read injury and fatality statistics
ogrInfo(shpfilepath,"fatality_yearly")
ogrInfo(shpfilepath,"injury_yearly")


Next, we read the shape file data into R and assign it to an R object variable.

ftlty_yr<-readOGR(shpfilepath, "fatality_yearly")
injry_yr<-readOGR(shpfilepath, "injury_yearly")
names(ftlty_yr); dim(ftlty_yr)
names(injry_yr); dim(injry_yr)

At this point the geometric data is not bound to the data frame, as we can see.


Also, lets create our first graph with ggplot. We read the data frame from the fatalities data. As we can see there is no geometry for us to plot at this point, so we will simply plot Fatalities against the year. GGPlot2 library distinguishes between data, aesthetics, from the graph. In the first example, we simply add the data to the base layer and ask ggplot to render it as a point.

# Extract the data frame and use ggplot
ftldata<-ftlty_yr@data
names(ftldata)
p<-ggplot(ftlty_yr@data, aes(Fatalities, YR))
p + geom_point()
p


This is the graph that was plotted...


This is our first attempt at understanding ggplot, but not what we wanted. The problem is that when we read the shapefile, the spatial data was not brought along.

We need to use the fortify command to add the spatial data. In the following steps, we can see how to do that. Once we fortify the data, we can extract the x and y coordinates as well.

ftlty.f <-fortify(as.data.frame(ftlty_yr))
names(ftlty.f);dim(ftlty.f)
fx<-ftlty.f$coords.x1
fy<-ftlty.f$coords.x2

Now, since we have the x and y, we can plot this data using ggplot

# Plotting the data using ggplot
Map <- ggplot(data=ftlty.f, aes(fx,fy,colour=factor(Fatalities)))
Map <- Map + layer(geom="point")
Map


Here is the plot output



Next step is to read the boroughs data. We use the familiar ogrInfo and readOGR functions to read the shapefile data into R
#Load boroughs data
ogrInfo(shpfilepath,"nybbwi")
bbnd <- readOGR(shpfilepath,"nybbwi")
The Boroughs data is in a different coordinate system than the accident data. While the accident data is in latitude longitude, the borough data is in Lambert conformal conic coordinate system. If we want to have the two map layers sit on top of one another, we need to have them in the same coordinate system. We use the spTransform function to transform the borough boundaries to latlong, as shown below...
#Transform the coordinates from Lambert Conformal to Lat Long
CRS.latlong<-CRS("+proj=longlat +datum=WGS84 +no_defs")
bbnd<-spTransform(bbnd, CRS.latlong)


In this case, again the spatial data and non-spatial data is not together. We need one dataset with both, so we need to fortify and merge the spatial and non-spatial datasets.

# Add the spatial data back to the boundary data.frame
bbnd.f <-fortify(bbnd, region="BoroCode")
names(bbnd.f);fix(bbnd.f);dim(bbnd.f)

# Combine the geometry and data back together using common keys
bbnd.f<-merge(bbnd.f,bbnd@data, by.x="id", by.y="BoroCode")
names(bbnd.f);dim(bbnd.f)

As we can see in the following window, using fortify alone brought in the 7 geometric attributes and merge brought the total to 10 attributes with attributes from the data frame. So, we need both fortify and merge commands to bring in the geometry and associated non-spatial attributes such as area and length together.


Next step is to build the map one layer at a time.

# Plot a map of the boroughs and accidents
baes <-aes(x=long, y=lat, group = group ,alpha=1/50,fill=id) # Aesthetics for Boroughs
bblyr <-geom_polygon(data=bbnd.f, baes) # Polygon layer for Bouroughs
faes <-aes(x=fx, y=fy, colour=factor(Fatalities)) # Aesthetics for Fatalities
flyr <-geom_point(data=ftlty.f,faes) # Point layer for Fatalities
Map <- ggplot() # Initialize the map
Map <-Map + bblyr # Add the boroughs layer
Map <-Map + flyr # Add the Fatalities layer
Map <- Map + labs(x = "Longitude", y = "Latitude", fill = "NYC Boroughs") # Add the labels
Map <- Map + ggtitle("NYC Boroughs with Accidents")
Map # Display the Map

Here is the console output.


Here is the map


That's it... For those who are interested, here is the complete source listing..

# Load required libraries
library(rgdal)
library(ggplot2)
library(ggmap)
library(rgeos)
library(maptools)
library(scales)

# Set default paths for data
setwd("~/Work/Transportation/NYC/scripts")
shpfilepath="../data/shpfiles"

# Read injury and fatality statistics
ogrInfo(shpfilepath,"fatality_yearly")
ogrInfo(shpfilepath,"injury_yearly")
ftlty_yr<-readOGR(shpfilepath, "fatality_yearly")
injry_yr<-readOGR(shpfilepath, "injury_yearly")
names(ftlty_yr); dim(ftlty_yr)
names(injry_yr); dim(injry_yr)

# Extract the data frame and use ggplot
ftldata<-ftlty_yr@data
names(ftldata)
p<-ggplot(ftlty_yr@data, aes(Fatalities, YR))
p + geom_point()
p

# Add the spatial data to the data frame
ftlty.f <-fortify(as.data.frame(ftlty_yr))
names(ftlty.f);dim(ftlty.f)
fx<-ftlty.f$coords.x1
fy<-ftlty.f$coords.x2

# Plotting the data using qplot
qplot(fx,fy,data=ftlty.f)
qplot(fx,fy,data=ftlty.f,colour=factor(Fatalities))
qplot(fx,fy, data=ftlty.f,colour=factor(Fatalities), facets= . ~ YR) + geom_point()

# Plotting the data using ggplot
Map <- ggplot(data=ftlty.f, aes(fx,fy,colour=factor(Fatalities)))
Map <- Map + layer(geom="point")
Map
 
#Load boroughs data
ogrInfo(shpfilepath,"nybbwi")
bbnd <- readOGR(shpfilepath,"nybbwi")

#Transform the coordinates from Lambert Conformal to Lat Long
CRS.latlong<-CRS("+proj=longlat +datum=WGS84 +no_defs")
bbnd<-spTransform(bbnd, CRS.latlong)

#Examine the data
bbndata <-bbnd@data
names(bbndata);fix(bbndata);dim(bbndata)

# Add the spatial data back to the boundary data.frame
bbnd.f <-fortify(bbnd, region="BoroCode")
names(bbnd.f);fix(bbnd.f);dim(bbnd.f)

# Combine the geometry and data back together using common keys
bbnd.f<-merge(bbnd.f,bbnd@data, by.x="id", by.y="BoroCode")
names(bbnd.f);dim(bbnd.f)

# Plot a map of the boroughs and accidents
baes<-aes(x=long, y=lat, group = group ,alpha=1/50,fill=id) # Aesthetics for Boroughs
bblyr<-geom_polygon(data=bbnd.f, baes) # Polygon layer for Bouroughs
faes<-aes(x=fx, y=fy, colour=factor(Fatalities)) # Aesthetics for Fatalities
flyr<-geom_point(data=ftlty.f,faes) # Point layer for Fatalities
Map <- ggplot() # Initialize the map
Map<-Map + bblyr # Add the boroughs layer
Map<-Map + flyr # Add the Fatalities layer
Map <- Map + labs(x = "Longitude", y = "Latitude", fill = "NYC Boroughs") # Add the labels
Map <- Map + ggtitle("NYC Boroughs with Accidents")
Map # Display the Map

No comments: