This exercise involves using several libraries from R, as well as transforming data from one coordinate system to another. Here are the prerequisites
1. Install required spatial libraries into R: Please refer to this post, on how to do so.
2. Download spatial data for NYC pertaining to safety: Please refer to this post on how to do so.
So let's get started.
First step is to load all the libraries
# Load required libraries library(rgdal) library(ggplot2) library(ggmap) library(rgeos) library(maptools) library(scales)
Each of the above have a specific function in the process. Next step is to set the working directory for the data files. We use the setwd command for that and I have also set a variable for setting the shape file path.
# Set default paths for data setwd("~/Work/Transportation/NYC/scripts") shpfilepath="../data/shpfiles"
Next we read the shape files that contain annual fatality and injury data as well as data on borough boundaries for New York Let's start with the injury and fatality data. ogrInfo command provides high level summary of the data in each shape file.
# Read injury and fatality statistics ogrInfo(shpfilepath,"fatality_yearly") ogrInfo(shpfilepath,"injury_yearly")
Next, we read the shape file data into R and assign it to an R object variable.
ftlty_yr<-readOGR(shpfilepath, "fatality_yearly") injry_yr<-readOGR(shpfilepath, "injury_yearly") names(ftlty_yr); dim(ftlty_yr) names(injry_yr); dim(injry_yr)
At this point the geometric data is not bound to the data frame, as we can see.
Also, lets create our first graph with ggplot. We read the data frame from the fatalities data. As we can see there is no geometry for us to plot at this point, so we will simply plot Fatalities against the year. GGPlot2 library distinguishes between data, aesthetics, from the graph. In the first example, we simply add the data to the base layer and ask ggplot to render it as a point.
# Extract the data frame and use ggplot ftldata<-ftlty_yr@data names(ftldata) p<-ggplot(ftlty_yr@data, aes(Fatalities, YR)) p + geom_point() p
This is the graph that was plotted...
This is our first attempt at understanding ggplot, but not what we wanted. The problem is that when we read the shapefile, the spatial data was not brought along.
We need to use the fortify command to add the spatial data. In the following steps, we can see how to do that. Once we fortify the data, we can extract the x and y coordinates as well.
ftlty.f <-fortify(as.data.frame(ftlty_yr)) names(ftlty.f);dim(ftlty.f) fx<-ftlty.f$coords.x1 fy<-ftlty.f$coords.x2
Now, since we have the x and y, we can plot this data using ggplot
# Plotting the data using ggplot Map <- ggplot(data=ftlty.f, aes(fx,fy,colour=factor(Fatalities))) Map <- Map + layer(geom="point") Map
Here is the plot output
Next step is to read the boroughs data. We use the familiar ogrInfo and readOGR functions to read the shapefile data into R
#Load boroughs data ogrInfo(shpfilepath,"nybbwi") bbnd <- readOGR(shpfilepath,"nybbwi")The Boroughs data is in a different coordinate system than the accident data. While the accident data is in latitude longitude, the borough data is in Lambert conformal conic coordinate system. If we want to have the two map layers sit on top of one another, we need to have them in the same coordinate system. We use the spTransform function to transform the borough boundaries to latlong, as shown below...
#Transform the coordinates from Lambert Conformal to Lat Long CRS.latlong<-CRS("+proj=longlat +datum=WGS84 +no_defs") bbnd<-spTransform(bbnd, CRS.latlong)
In this case, again the spatial data and non-spatial data is not together. We need one dataset with both, so we need to fortify and merge the spatial and non-spatial datasets.
# Add the spatial data back to the boundary data.frame bbnd.f <-fortify(bbnd, region="BoroCode") names(bbnd.f);fix(bbnd.f);dim(bbnd.f) # Combine the geometry and data back together using common keys bbnd.f<-merge(bbnd.f,bbnd@data, by.x="id", by.y="BoroCode") names(bbnd.f);dim(bbnd.f)
As we can see in the following window, using fortify alone brought in the 7 geometric attributes and merge brought the total to 10 attributes with attributes from the data frame. So, we need both fortify and merge commands to bring in the geometry and associated non-spatial attributes such as area and length together.
Next step is to build the map one layer at a time.
# Plot a map of the boroughs and accidents baes <-aes(x=long, y=lat, group = group ,alpha=1/50,fill=id) # Aesthetics for Boroughs bblyr <-geom_polygon(data=bbnd.f, baes) # Polygon layer for Bouroughs faes <-aes(x=fx, y=fy, colour=factor(Fatalities)) # Aesthetics for Fatalities flyr <-geom_point(data=ftlty.f,faes) # Point layer for Fatalities Map <- ggplot() # Initialize the map Map <-Map + bblyr # Add the boroughs layer Map <-Map + flyr # Add the Fatalities layer Map <- Map + labs(x = "Longitude", y = "Latitude", fill = "NYC Boroughs") # Add the labels Map <- Map + ggtitle("NYC Boroughs with Accidents") Map # Display the Map
Here is the console output.
Here is the map
That's it... For those who are interested, here is the complete source listing..
# Load required libraries library(rgdal) library(ggplot2) library(ggmap) library(rgeos) library(maptools) library(scales) # Set default paths for data setwd("~/Work/Transportation/NYC/scripts") shpfilepath="../data/shpfiles" # Read injury and fatality statistics ogrInfo(shpfilepath,"fatality_yearly") ogrInfo(shpfilepath,"injury_yearly") ftlty_yr<-readOGR(shpfilepath, "fatality_yearly") injry_yr<-readOGR(shpfilepath, "injury_yearly") names(ftlty_yr); dim(ftlty_yr) names(injry_yr); dim(injry_yr) # Extract the data frame and use ggplot ftldata<-ftlty_yr@data names(ftldata) p<-ggplot(ftlty_yr@data, aes(Fatalities, YR)) p + geom_point() p # Add the spatial data to the data frame ftlty.f <-fortify(as.data.frame(ftlty_yr)) names(ftlty.f);dim(ftlty.f) fx<-ftlty.f$coords.x1 fy<-ftlty.f$coords.x2 # Plotting the data using qplot qplot(fx,fy,data=ftlty.f) qplot(fx,fy,data=ftlty.f,colour=factor(Fatalities)) qplot(fx,fy, data=ftlty.f,colour=factor(Fatalities), facets= . ~ YR) + geom_point() # Plotting the data using ggplot Map <- ggplot(data=ftlty.f, aes(fx,fy,colour=factor(Fatalities))) Map <- Map + layer(geom="point") Map #Load boroughs data ogrInfo(shpfilepath,"nybbwi") bbnd <- readOGR(shpfilepath,"nybbwi") #Transform the coordinates from Lambert Conformal to Lat Long CRS.latlong<-CRS("+proj=longlat +datum=WGS84 +no_defs") bbnd<-spTransform(bbnd, CRS.latlong) #Examine the data bbndata <-bbnd@data names(bbndata);fix(bbndata);dim(bbndata) # Add the spatial data back to the boundary data.frame bbnd.f <-fortify(bbnd, region="BoroCode") names(bbnd.f);fix(bbnd.f);dim(bbnd.f) # Combine the geometry and data back together using common keys bbnd.f<-merge(bbnd.f,bbnd@data, by.x="id", by.y="BoroCode") names(bbnd.f);dim(bbnd.f) # Plot a map of the boroughs and accidents baes<-aes(x=long, y=lat, group = group ,alpha=1/50,fill=id) # Aesthetics for Boroughs bblyr<-geom_polygon(data=bbnd.f, baes) # Polygon layer for Bouroughs faes<-aes(x=fx, y=fy, colour=factor(Fatalities)) # Aesthetics for Fatalities flyr<-geom_point(data=ftlty.f,faes) # Point layer for Fatalities Map <- ggplot() # Initialize the map Map<-Map + bblyr # Add the boroughs layer Map<-Map + flyr # Add the Fatalities layer Map <- Map + labs(x = "Longitude", y = "Latitude", fill = "NYC Boroughs") # Add the labels Map <- Map + ggtitle("NYC Boroughs with Accidents") Map # Display the Map
No comments:
Post a Comment