Search This Blog

Tuesday, September 29, 2015

Re-projecting shapefile data during load in PostGIS

I recently came across the need to load data into PostGIS from Shape files. However, some data was in one projection and other data was in a different projection system. I needed to re-project the data at the time of loading the data.

The process is fairly straightforward. The syntax is

shp2pgsql -s :

Here is an example using real data..

Let's look at the data

gdalsrsinfo nybbwi.prj


As we can see, the projection is NAD_1983_StatePlane_New_York_Long_Island_FIPS_3104_Feet, whose EPSG code is 2263. Since, I would like to load the data, I can use shp2pgsql command to create the load file.

shp2pgsql -d -s 2263 -g geom -e -I nybbwi nystaging.boroughs > $scriptdir/021_boroughs.sql

The next file was in a different projection. Here I am using a different command ogrInfo to get the projection information.

ogrinfo -al -so fatality_yearly.shp

This reveals that projection is latlong, or EPSG code 4326


As explained previously, we can use the -s option to re-project the data. The following is the command

shp2pgsql -d -s 4326:2263 -g geom -e -I fatality_yearly nystaging.fatality > $scriptdir/023_fatality_yearly.sql

shp2pgsql -d -s 4326:2263 -g geom -e -I injury_yearly nystaging.injury > $scriptdir/024_injury_yearly.sql


That's it...

Here is the complete script...

# 04_load_spatial_nyc_data.sh

export datadir=../data/shpfiles
export scriptdir=../../scripts

export curdir=~/Work/Transportation/NYC/scripts
cd $curdir

export PGUSER=nyc
export PGPASSWORD=nyc
export PGDATABASE=nyc


echo $datadir
echo $scriptdir

cd $datadir

gdalsrsinfo nybbwi.prj
# EPSG code for NAD_1983_StatePlane_New_York_Long_Island_FIPS_3104_Feet is 2263
shp2pgsql -d -s 2263 -g geom -e -I nybbwi nystaging.boroughs > $scriptdir/021_boroughs.sql

gdalsrsinfo PedCountLocationsMay2015.prj
ogrinfo -al -so PedCountLocationsMay2015.shp
shp2pgsql -d -s 2263 -g geom -e -I PedCountLocationsMay2015 nystaging.pedcounts > $scriptdir/022_pedcounts.sql

ogrinfo -al -so fatality_yearly.shp
shp2pgsql -d -s 4326:2263 -g geom -e -I fatality_yearly nystaging.fatality > $scriptdir/023_fatality_yearly.sql
shp2pgsql -d -s 4326:2263 -g geom -e -I injury_yearly nystaging.injury > $scriptdir/024_injury_yearly.sql

cd $scriptdir

psql -f 021_boroughs.sql
psql -f 022_pedcounts.sql
psql -f 023_fatality_yearly.sql
psql -f 024_injury_yearly.sql


Monday, September 7, 2015

Connecting to PostGIS from R

In this post, we will connect to spatial datasets loaded in PostGIS from within R.

First I have set environmental variables to connect to PostgreSQL in my environment. These are as follows

export PGUSER=nyc
export PGPASSWORD=nyc
export PGDATABASE=nyc

Next, we will launch R, and load the required library, rgdal. If you don't have rgdal installed, please refer to initial steps of my earlier post.


Now lets get started
# 05a_connect_ped_postgis.r
# Load required library
library(rgdal)

This loads the library


Next we read the PostgreSQL connection parameters from the shell environment using Sys.getenv command

#Set PostgreSQL connection parameters from the environment
pguser<-Sys.getenv('PGUSER')
pgpassword<-Sys.getenv('PGPASSWORD')
pgdatabase<-Sys.getenv('PGDATABASE')


Next we create the database connection by concatenating all variables into a connection string using the paste function in R , and we print to confirm we have it right.

#Make connection to PostGIS database
dsn<-paste("PG:dbname='",pgdatabase,"' user='",pguser,"' password='",pgpassword,"'",sep='')
dsn


Now we can use the ogrListLayers command to make sure we can see our previously loaded data.

ogrListLayers(dsn)


The command makes a connection to the database and lists all layers registered with PostGIS. Now we can also look up the details for a specific layer of our choice using ogrInfo function

ogrInfo(dsn, "nystaging.fatality")


Here is the complete listing..

# 05a_connect_ped_postgis.r
# Load required library
library(rgdal)

#Set PostgreSQL connection parameters from the environment
pguser<-Sys.getenv('PGUSER')
pgpassword<-Sys.getenv('PGPASSWORD')
pgdatabase<-Sys.getenv('PGDATABASE')

#Make connection to PostGIS database
dsn<-paste("PG:dbname='",pgdatabase,"' user='",pguser,"' password='",pgpassword,"'",sep='')
dsn
ogrListLayers(dsn)
ogrInfo(dsn, "nystaging.fatality")

Saturday, September 5, 2015

Installing Tiger Geocoder on PostGIS 2.1 on Ubuntu using a single shell script

In this post, I am going to walk through the steps of installing Tiger Geocoder on Ubuntu. These are the steps that worked for me and are collated from multiple sources and forum postings. My intent was to create one script that did everything from start to finish (included at the end), that took a while to perfect.

In this post, I will walk through the script, so you can modify it for your environment. Lets get started.

First step is to set some environmental variables that we will use. This way we don't have to enter this over and over again. As you can see, I plan to install the database on a database called tiger. geocoder seems to be another popular alternative.

export PGUSER=arthgallo
export PGPASSWORD=*****
export PGDATABASE=tiger

Next step is to enter the states you want to load data for. Please enter it here in the format below

#Each state needs to be quoted with single quotes. The following construct
#preserves this arrangement
export geostates="'"NY"'","'"MA"'"
echo $geostates

Next we create the database

createdb tiger

Now lets install the appropriate extensions including PostGIS. If PostGIS is not installed or is not the latest version, please refer to my previous posts. Once we install the Postgis Tiger Geocoder extension, we will also validate if the address can be normalized. Running address normalization only validates that the extension is installed. We are still not done as setting up the actual geocoder involves downloading a lot of data and it takes a fair bit of time based on your internet connection and your processing speed.

psql -c "CREATE EXTENSION postgis"
psql -c "CREATE EXTENSION postgis_topology"
psql -c "CREATE EXTENSION fuzzystrmatch"
psql -c "DROP EXTENSION postgis_tiger_geocoder CASCADE"
psql -c "CREATE EXTENSION postgis_tiger_geocoder"

psql -c "SELECT na.address, na.streetname,na.streettypeabbrev, na.zip
 FROM normalize_address('1 Devonshire Place, Boston, MA 02109') AS na"

Next, we need to set up the automated process that will download the Tiger data from census website and deploy the appropriate geocoders. I have created a SQL script file that does some of the work. It makes the appropriate entries into a table for setting it up according to my environment.

As you can see I have commented out the PGPORT, PGHOST, PGUSER and PGPASSWORD entries since I had already set the user at the beginning of the script, and I want to continue with the same.

Please modify it to match your environment.

-- tiger_loader_platform.sql
INSERT INTO tiger.loader_platform(os, declare_sect, pgbin, wget, unzip_command, psql, path_sep, 
loader, environ_set_command, county_process_command)
 SELECT 'trusty', 'TMPDIR="${staging_fold}/temp/"
 UNZIPTOOL=unzip                   
 WGETTOOL=/usr/bin/wget          
 export PGBIN=/usr/bin   
 # export PGPORT=5432                
 # export PGHOST=localhost           
 # export PGUSER=postgres            
 # export PGPASSWORD=yourpasswordhere
 export PGDATABASE=tiger        
 PSQL=${PGBIN}/psql                
 SHP2PGSQL=${PGBIN}/shp2pgsql      
 cd ${staging_fold}', pgbin, wget, unzip_command, psql, path_sep, 
 loader, environ_set_command, county_process_command
 FROM tiger.loader_platform
 WHERE os = 'sh';

Next we will execute the script to create appropriate entries.

psql  -f tiger_loader_platform.sql

We can test that everything is inserted correctly..

psql -c "select declare_sect from tiger.loader_platform where os='trusty'"


Next we will create a folder to stage the data downloaded from the census ftp site. Please make sure you have enough disk space on your machine. My thumb rule for the process is 1-2 GB per state that you want to setup.

We create a folder called gisdata at root. This is important since the out of the box geocoder install scripts all point to this folder.

# script needs gisdata folder
sudo mkdir /gisdata

We will also need a temp folder inside the gisdata folder.

# and a temp folder inside it
sudo mkdir /gisdata/temp

Next we need to make this folder accessible to the postgres user.

# Need to make it accessible to the postgres user
sudo chmod -R 777 /gisdata

Now lets restart postgres before the fun begins.

#Restart postgres service
sudo service postgresql restart

First step is to generate a script called loader_generate_nation_script. I skipped this step and had to repeat the process multiple times (many wasted hours) before figuring it out. Since this is a SQL command we need to make sure the output is clean enough to be executed, hence some extra formatting commands on the SQL. The default SQL output puts the column name in the output. I renamed it to 'aaaaa' so that I could get rid of it later.


#Run the nation script
psql -c "SELECT loader_generate_nation_script('trusty') AS aaaaa" \
-t -x -A -F "" -q -o 02a_load_nation_tiger.sh

Now that the script is generated, we need to replace aaaaa with blank to make it a proper shell script. We also need to make it executable. Then we need to execute it...

#Replace the text
sed -i 's/aaaaa//g' 02a_load_nation_tiger.sh
chmod a+x ./02a_load_nation_tiger.sh
sudo su postgres ./02a_load_nation_tiger.sh

Next the actual script for loading specific states. I just needed data for New York and Massachusetts. You can modify this to match your criteria.


#Now generate the loader script
psql -c "SELECT loader_generate_script(ARRAY[$geostates], 'trusty') AS aaaaa" -t -x -A -F "" -q -o 02b_load_ny_tiger.sh

Again, we need to replace the dummy text with blanks using sed

#Replace dummy text with blanks
sed -i 's/aaaaa//g' 02b_load_ny_tiger.sh

We need to make the file executable followed by executing the file. Beware this process takes a while to run.

#Execute the load_ny_tiger file we generated
chmod a+x ./02b_load_ny_tiger.sh
sudo su postgres ./02b_load_ny_tiger.sh

The previous step takes a few hours to run depending on the data that needs to be downloaded. We run the following command to install missing indexes.

psql -c "SELECT install_missing_indexes()" 


Now that everything is setup, we can run individual queries. First step is to normalize an address

psql -c "SELECT '30 Rock' As event, n.* FROM normalize_address('45 Rockefeller Plaza, New York, NY 10111') As n"

Here is the output


Next the real test... We try to geocode an address.


psql -c "SELECT g.rating, ST_X(g.geomout) As lon, ST_Y(g.geomout) As lat, (addy).address As stno, (addy).streetname As street, 
(addy).streettypeabbrev As styp, (addy).location As city, (addy).stateabbrev As st,(addy).zip FROM geocode('45 Rockefeller Plaza, New York, NY 10111') As g"

Perfect, we found some matches as shown in the screenshot below



Next an intersection geocode

psql -c "SELECT pprint_addy(addy), st_astext(geomout),rating FROM geocode_intersection( 'Fort Hamilton Pkwy','50th Street', 'NY', 'Brooklyn','11219',1)" 


Here is the result


Next without the zipcode

psql -c "SELECT pprint_addy(addy), st_astext(geomout),rating FROM geocode_intersection( 'Fort Hamilton Pkwy','50th Street', 'NY', 'Brooklyn')" 

Here is the result


Not bad...  Here is the complete script for the shell file and SQL script.

First the SQL script

-- tiger_loader_platform.sql
INSERT INTO tiger.loader_platform(os, declare_sect, pgbin, wget, unzip_command, psql, path_sep, 
loader, environ_set_command, county_process_command)
 SELECT 'trusty', 'TMPDIR="${staging_fold}/temp/"
 UNZIPTOOL=unzip                   
 WGETTOOL=/usr/bin/wget          
 export PGBIN=/usr/bin   
 # export PGPORT=5432                
 # export PGHOST=localhost           
 # export PGUSER=postgres            
 # export PGPASSWORD=yourpasswordhere
 export PGDATABASE=tiger        
 PSQL=${PGBIN}/psql                
 SHP2PGSQL=${PGBIN}/shp2pgsql      
 cd ${staging_fold}', pgbin, wget, unzip_command, psql, path_sep, 
 loader, environ_set_command, county_process_command
 FROM tiger.loader_platform
 WHERE os = 'sh';

Next the shell script


# 02_install_tiger_geocoder.sh

export PGUSER=arthgallo
export PGPASSWORD=*****
export PGDATABASE=tiger

#Each state needs to be quoted with single quotes. The following construct
#preserves this arrangement
export geostates="'"NY"'","'"MA"'"
echo $geostates

createdb tiger
 
psql -c "CREATE EXTENSION postgis"
psql -c "CREATE EXTENSION postgis_topology"
psql -c "CREATE EXTENSION fuzzystrmatch"
psql -c "DROP EXTENSION postgis_tiger_geocoder CASCADE"
psql -c "CREATE EXTENSION postgis_tiger_geocoder"

psql -c "SELECT na.address, na.streetname,na.streettypeabbrev, na.zip
 FROM normalize_address('1 Devonshire Place, Boston, MA 02109') AS na"

psql -c "delete from tiger.loader_platform where os='trusty'"

psql  -f tiger_loader_platform.sql

psql -c "select declare_sect from tiger.loader_platform where os='trusty'"

# script needs gisdata folder
sudo mkdir /gisdata

# and a temp folder inside it
sudo mkdir /gisdata/temp

# Need to make it accessible to the postgres user
sudo chmod -R 777 /gisdata

#Restart postgres service
sudo service postgresql restart

#Run the nation script
psql -c "SELECT loader_generate_nation_script('trusty') AS aaaaa" \
-t -x -A -F "" -q -o 02a_load_nation_tiger.sh

#Replace the text
sed -i 's/aaaaa//g' 02a_load_nation_tiger.sh
chmod a+x ./02a_load_nation_tiger.sh
sudo su postgres ./02a_load_nation_tiger.sh

#Now generate the loader script
psql -c "SELECT loader_generate_script(ARRAY[$geostates], 'trusty') AS aaaaa" -t -x -A -F "" -q -o 02b_load_ny_tiger.sh

#Replace dummy text with blanks
sed -i 's/aaaaa//g' 02b_load_ny_tiger.sh

#Execute the load_ny_tiger file we generated
chmod a+x ./02b_load_ny_tiger.sh
sudo su postgres ./02b_load_ny_tiger.sh

psql -c "SELECT install_missing_indexes()" 

#Test address normalization
psql -c "SELECT '30 Rock' As event, n.* FROM normalize_address('45 Rockefeller Plaza, New York, NY 10111') As n"

#Test simple geocode
psql -c "SELECT g.rating, ST_X(g.geomout) As lon, ST_Y(g.geomout) As lat, (addy).address As stno, (addy).streetname As street, 
(addy).streettypeabbrev As styp, (addy).location As city, (addy).stateabbrev As st,(addy).zip FROM geocode('45 Rockefeller Plaza, New York, NY 10111') As g"

#Test intersection geocode
psql -c "SELECT pprint_addy(addy), st_astext(geomout),rating FROM geocode_intersection( 'Fort Hamilton Pkwy','50th Street', 'NY', 'Brooklyn','11219',1)" 

#Test intersection geocode without zip
psql -c "SELECT pprint_addy(addy), st_astext(geomout),rating FROM geocode_intersection( 'Fort Hamilton Pkwy','50th Street', 'NY', 'Brooklyn')"