Search This Blog

Saturday, September 5, 2015

Installing Tiger Geocoder on PostGIS 2.1 on Ubuntu using a single shell script

In this post, I am going to walk through the steps of installing Tiger Geocoder on Ubuntu. These are the steps that worked for me and are collated from multiple sources and forum postings. My intent was to create one script that did everything from start to finish (included at the end), that took a while to perfect.

In this post, I will walk through the script, so you can modify it for your environment. Lets get started.

First step is to set some environmental variables that we will use. This way we don't have to enter this over and over again. As you can see, I plan to install the database on a database called tiger. geocoder seems to be another popular alternative.

export PGUSER=arthgallo
export PGPASSWORD=*****
export PGDATABASE=tiger

Next step is to enter the states you want to load data for. Please enter it here in the format below

#Each state needs to be quoted with single quotes. The following construct
#preserves this arrangement
export geostates="'"NY"'","'"MA"'"
echo $geostates

Next we create the database

createdb tiger

Now lets install the appropriate extensions including PostGIS. If PostGIS is not installed or is not the latest version, please refer to my previous posts. Once we install the Postgis Tiger Geocoder extension, we will also validate if the address can be normalized. Running address normalization only validates that the extension is installed. We are still not done as setting up the actual geocoder involves downloading a lot of data and it takes a fair bit of time based on your internet connection and your processing speed.

psql -c "CREATE EXTENSION postgis"
psql -c "CREATE EXTENSION postgis_topology"
psql -c "CREATE EXTENSION fuzzystrmatch"
psql -c "DROP EXTENSION postgis_tiger_geocoder CASCADE"
psql -c "CREATE EXTENSION postgis_tiger_geocoder"

psql -c "SELECT na.address, na.streetname,na.streettypeabbrev, na.zip
 FROM normalize_address('1 Devonshire Place, Boston, MA 02109') AS na"

Next, we need to set up the automated process that will download the Tiger data from census website and deploy the appropriate geocoders. I have created a SQL script file that does some of the work. It makes the appropriate entries into a table for setting it up according to my environment.

As you can see I have commented out the PGPORT, PGHOST, PGUSER and PGPASSWORD entries since I had already set the user at the beginning of the script, and I want to continue with the same.

Please modify it to match your environment.

-- tiger_loader_platform.sql
INSERT INTO tiger.loader_platform(os, declare_sect, pgbin, wget, unzip_command, psql, path_sep, 
loader, environ_set_command, county_process_command)
 SELECT 'trusty', 'TMPDIR="${staging_fold}/temp/"
 UNZIPTOOL=unzip                   
 WGETTOOL=/usr/bin/wget          
 export PGBIN=/usr/bin   
 # export PGPORT=5432                
 # export PGHOST=localhost           
 # export PGUSER=postgres            
 # export PGPASSWORD=yourpasswordhere
 export PGDATABASE=tiger        
 PSQL=${PGBIN}/psql                
 SHP2PGSQL=${PGBIN}/shp2pgsql      
 cd ${staging_fold}', pgbin, wget, unzip_command, psql, path_sep, 
 loader, environ_set_command, county_process_command
 FROM tiger.loader_platform
 WHERE os = 'sh';

Next we will execute the script to create appropriate entries.

psql  -f tiger_loader_platform.sql

We can test that everything is inserted correctly..

psql -c "select declare_sect from tiger.loader_platform where os='trusty'"


Next we will create a folder to stage the data downloaded from the census ftp site. Please make sure you have enough disk space on your machine. My thumb rule for the process is 1-2 GB per state that you want to setup.

We create a folder called gisdata at root. This is important since the out of the box geocoder install scripts all point to this folder.

# script needs gisdata folder
sudo mkdir /gisdata

We will also need a temp folder inside the gisdata folder.

# and a temp folder inside it
sudo mkdir /gisdata/temp

Next we need to make this folder accessible to the postgres user.

# Need to make it accessible to the postgres user
sudo chmod -R 777 /gisdata

Now lets restart postgres before the fun begins.

#Restart postgres service
sudo service postgresql restart

First step is to generate a script called loader_generate_nation_script. I skipped this step and had to repeat the process multiple times (many wasted hours) before figuring it out. Since this is a SQL command we need to make sure the output is clean enough to be executed, hence some extra formatting commands on the SQL. The default SQL output puts the column name in the output. I renamed it to 'aaaaa' so that I could get rid of it later.


#Run the nation script
psql -c "SELECT loader_generate_nation_script('trusty') AS aaaaa" \
-t -x -A -F "" -q -o 02a_load_nation_tiger.sh

Now that the script is generated, we need to replace aaaaa with blank to make it a proper shell script. We also need to make it executable. Then we need to execute it...

#Replace the text
sed -i 's/aaaaa//g' 02a_load_nation_tiger.sh
chmod a+x ./02a_load_nation_tiger.sh
sudo su postgres ./02a_load_nation_tiger.sh

Next the actual script for loading specific states. I just needed data for New York and Massachusetts. You can modify this to match your criteria.


#Now generate the loader script
psql -c "SELECT loader_generate_script(ARRAY[$geostates], 'trusty') AS aaaaa" -t -x -A -F "" -q -o 02b_load_ny_tiger.sh

Again, we need to replace the dummy text with blanks using sed

#Replace dummy text with blanks
sed -i 's/aaaaa//g' 02b_load_ny_tiger.sh

We need to make the file executable followed by executing the file. Beware this process takes a while to run.

#Execute the load_ny_tiger file we generated
chmod a+x ./02b_load_ny_tiger.sh
sudo su postgres ./02b_load_ny_tiger.sh

The previous step takes a few hours to run depending on the data that needs to be downloaded. We run the following command to install missing indexes.

psql -c "SELECT install_missing_indexes()" 


Now that everything is setup, we can run individual queries. First step is to normalize an address

psql -c "SELECT '30 Rock' As event, n.* FROM normalize_address('45 Rockefeller Plaza, New York, NY 10111') As n"

Here is the output


Next the real test... We try to geocode an address.


psql -c "SELECT g.rating, ST_X(g.geomout) As lon, ST_Y(g.geomout) As lat, (addy).address As stno, (addy).streetname As street, 
(addy).streettypeabbrev As styp, (addy).location As city, (addy).stateabbrev As st,(addy).zip FROM geocode('45 Rockefeller Plaza, New York, NY 10111') As g"

Perfect, we found some matches as shown in the screenshot below



Next an intersection geocode

psql -c "SELECT pprint_addy(addy), st_astext(geomout),rating FROM geocode_intersection( 'Fort Hamilton Pkwy','50th Street', 'NY', 'Brooklyn','11219',1)" 


Here is the result


Next without the zipcode

psql -c "SELECT pprint_addy(addy), st_astext(geomout),rating FROM geocode_intersection( 'Fort Hamilton Pkwy','50th Street', 'NY', 'Brooklyn')" 

Here is the result


Not bad...  Here is the complete script for the shell file and SQL script.

First the SQL script

-- tiger_loader_platform.sql
INSERT INTO tiger.loader_platform(os, declare_sect, pgbin, wget, unzip_command, psql, path_sep, 
loader, environ_set_command, county_process_command)
 SELECT 'trusty', 'TMPDIR="${staging_fold}/temp/"
 UNZIPTOOL=unzip                   
 WGETTOOL=/usr/bin/wget          
 export PGBIN=/usr/bin   
 # export PGPORT=5432                
 # export PGHOST=localhost           
 # export PGUSER=postgres            
 # export PGPASSWORD=yourpasswordhere
 export PGDATABASE=tiger        
 PSQL=${PGBIN}/psql                
 SHP2PGSQL=${PGBIN}/shp2pgsql      
 cd ${staging_fold}', pgbin, wget, unzip_command, psql, path_sep, 
 loader, environ_set_command, county_process_command
 FROM tiger.loader_platform
 WHERE os = 'sh';

Next the shell script


# 02_install_tiger_geocoder.sh

export PGUSER=arthgallo
export PGPASSWORD=*****
export PGDATABASE=tiger

#Each state needs to be quoted with single quotes. The following construct
#preserves this arrangement
export geostates="'"NY"'","'"MA"'"
echo $geostates

createdb tiger
 
psql -c "CREATE EXTENSION postgis"
psql -c "CREATE EXTENSION postgis_topology"
psql -c "CREATE EXTENSION fuzzystrmatch"
psql -c "DROP EXTENSION postgis_tiger_geocoder CASCADE"
psql -c "CREATE EXTENSION postgis_tiger_geocoder"

psql -c "SELECT na.address, na.streetname,na.streettypeabbrev, na.zip
 FROM normalize_address('1 Devonshire Place, Boston, MA 02109') AS na"

psql -c "delete from tiger.loader_platform where os='trusty'"

psql  -f tiger_loader_platform.sql

psql -c "select declare_sect from tiger.loader_platform where os='trusty'"

# script needs gisdata folder
sudo mkdir /gisdata

# and a temp folder inside it
sudo mkdir /gisdata/temp

# Need to make it accessible to the postgres user
sudo chmod -R 777 /gisdata

#Restart postgres service
sudo service postgresql restart

#Run the nation script
psql -c "SELECT loader_generate_nation_script('trusty') AS aaaaa" \
-t -x -A -F "" -q -o 02a_load_nation_tiger.sh

#Replace the text
sed -i 's/aaaaa//g' 02a_load_nation_tiger.sh
chmod a+x ./02a_load_nation_tiger.sh
sudo su postgres ./02a_load_nation_tiger.sh

#Now generate the loader script
psql -c "SELECT loader_generate_script(ARRAY[$geostates], 'trusty') AS aaaaa" -t -x -A -F "" -q -o 02b_load_ny_tiger.sh

#Replace dummy text with blanks
sed -i 's/aaaaa//g' 02b_load_ny_tiger.sh

#Execute the load_ny_tiger file we generated
chmod a+x ./02b_load_ny_tiger.sh
sudo su postgres ./02b_load_ny_tiger.sh

psql -c "SELECT install_missing_indexes()" 

#Test address normalization
psql -c "SELECT '30 Rock' As event, n.* FROM normalize_address('45 Rockefeller Plaza, New York, NY 10111') As n"

#Test simple geocode
psql -c "SELECT g.rating, ST_X(g.geomout) As lon, ST_Y(g.geomout) As lat, (addy).address As stno, (addy).streetname As street, 
(addy).streettypeabbrev As styp, (addy).location As city, (addy).stateabbrev As st,(addy).zip FROM geocode('45 Rockefeller Plaza, New York, NY 10111') As g"

#Test intersection geocode
psql -c "SELECT pprint_addy(addy), st_astext(geomout),rating FROM geocode_intersection( 'Fort Hamilton Pkwy','50th Street', 'NY', 'Brooklyn','11219',1)" 

#Test intersection geocode without zip
psql -c "SELECT pprint_addy(addy), st_astext(geomout),rating FROM geocode_intersection( 'Fort Hamilton Pkwy','50th Street', 'NY', 'Brooklyn')"

3 comments:

ross said...

I want to get this to work so badly but I can't figure out which user I should run this from. If I try to use my home user, I get "permission denied to create database." If I run it using sudo, I get "per authentication failed for user myUser."

ross said...

I want to get this to work so badly but I can't figure out which user to run the script from. It won't allow me, saying either "permission denied..." or "peer authentication failed.."

awachs said...

Hi Ross:

Is your user part of the sudoers group?

Thanks