In this post, I will walk through the script, so you can modify it for your environment. Lets get started.
First step is to set some environmental variables that we will use. This way we don't have to enter this over and over again. As you can see, I plan to install the database on a database called tiger. geocoder seems to be another popular alternative.
export PGUSER=arthgallo export PGPASSWORD=***** export PGDATABASE=tiger
Next step is to enter the states you want to load data for. Please enter it here in the format below
#Each state needs to be quoted with single quotes. The following construct #preserves this arrangement export geostates="'"NY"'","'"MA"'" echo $geostates
Next we create the database
createdb tiger
Now lets install the appropriate extensions including PostGIS. If PostGIS is not installed or is not the latest version, please refer to my previous posts. Once we install the Postgis Tiger Geocoder extension, we will also validate if the address can be normalized. Running address normalization only validates that the extension is installed. We are still not done as setting up the actual geocoder involves downloading a lot of data and it takes a fair bit of time based on your internet connection and your processing speed.
psql -c "CREATE EXTENSION postgis" psql -c "CREATE EXTENSION postgis_topology" psql -c "CREATE EXTENSION fuzzystrmatch" psql -c "DROP EXTENSION postgis_tiger_geocoder CASCADE" psql -c "CREATE EXTENSION postgis_tiger_geocoder" psql -c "SELECT na.address, na.streetname,na.streettypeabbrev, na.zip FROM normalize_address('1 Devonshire Place, Boston, MA 02109') AS na"
Next, we need to set up the automated process that will download the Tiger data from census website and deploy the appropriate geocoders. I have created a SQL script file that does some of the work. It makes the appropriate entries into a table for setting it up according to my environment.
As you can see I have commented out the PGPORT, PGHOST, PGUSER and PGPASSWORD entries since I had already set the user at the beginning of the script, and I want to continue with the same.
Please modify it to match your environment.
-- tiger_loader_platform.sql INSERT INTO tiger.loader_platform(os, declare_sect, pgbin, wget, unzip_command, psql, path_sep, loader, environ_set_command, county_process_command) SELECT 'trusty', 'TMPDIR="${staging_fold}/temp/" UNZIPTOOL=unzip WGETTOOL=/usr/bin/wget export PGBIN=/usr/bin # export PGPORT=5432 # export PGHOST=localhost # export PGUSER=postgres # export PGPASSWORD=yourpasswordhere export PGDATABASE=tiger PSQL=${PGBIN}/psql SHP2PGSQL=${PGBIN}/shp2pgsql cd ${staging_fold}', pgbin, wget, unzip_command, psql, path_sep, loader, environ_set_command, county_process_command FROM tiger.loader_platform WHERE os = 'sh';
Next we will execute the script to create appropriate entries.
psql -f tiger_loader_platform.sql
We can test that everything is inserted correctly..
psql -c "select declare_sect from tiger.loader_platform where os='trusty'"
Next we will create a folder to stage the data downloaded from the census ftp site. Please make sure you have enough disk space on your machine. My thumb rule for the process is 1-2 GB per state that you want to setup.
We create a folder called gisdata at root. This is important since the out of the box geocoder install scripts all point to this folder.
# script needs gisdata folder sudo mkdir /gisdata
We will also need a temp folder inside the gisdata folder.
# and a temp folder inside it sudo mkdir /gisdata/temp
Next we need to make this folder accessible to the postgres user.
# Need to make it accessible to the postgres user sudo chmod -R 777 /gisdata
Now lets restart postgres before the fun begins.
#Restart postgres service sudo service postgresql restart
First step is to generate a script called loader_generate_nation_script. I skipped this step and had to repeat the process multiple times (many wasted hours) before figuring it out. Since this is a SQL command we need to make sure the output is clean enough to be executed, hence some extra formatting commands on the SQL. The default SQL output puts the column name in the output. I renamed it to 'aaaaa' so that I could get rid of it later.
#Run the nation script psql -c "SELECT loader_generate_nation_script('trusty') AS aaaaa" \ -t -x -A -F "" -q -o 02a_load_nation_tiger.sh
Now that the script is generated, we need to replace aaaaa with blank to make it a proper shell script. We also need to make it executable. Then we need to execute it...
#Replace the text sed -i 's/aaaaa//g' 02a_load_nation_tiger.sh chmod a+x ./02a_load_nation_tiger.sh sudo su postgres ./02a_load_nation_tiger.sh
Next the actual script for loading specific states. I just needed data for New York and Massachusetts. You can modify this to match your criteria.
#Now generate the loader script psql -c "SELECT loader_generate_script(ARRAY[$geostates], 'trusty') AS aaaaa" -t -x -A -F "" -q -o 02b_load_ny_tiger.sh
Again, we need to replace the dummy text with blanks using sed
#Replace dummy text with blanks sed -i 's/aaaaa//g' 02b_load_ny_tiger.sh
We need to make the file executable followed by executing the file. Beware this process takes a while to run.
#Execute the load_ny_tiger file we generated chmod a+x ./02b_load_ny_tiger.sh sudo su postgres ./02b_load_ny_tiger.sh
The previous step takes a few hours to run depending on the data that needs to be downloaded. We run the following command to install missing indexes.
psql -c "SELECT install_missing_indexes()"
Now that everything is setup, we can run individual queries. First step is to normalize an address
psql -c "SELECT '30 Rock' As event, n.* FROM normalize_address('45 Rockefeller Plaza, New York, NY 10111') As n"
Here is the output
Next the real test... We try to geocode an address.
psql -c "SELECT g.rating, ST_X(g.geomout) As lon, ST_Y(g.geomout) As lat, (addy).address As stno, (addy).streetname As street, (addy).streettypeabbrev As styp, (addy).location As city, (addy).stateabbrev As st,(addy).zip FROM geocode('45 Rockefeller Plaza, New York, NY 10111') As g"
Perfect, we found some matches as shown in the screenshot below
Next an intersection geocode
psql -c "SELECT pprint_addy(addy), st_astext(geomout),rating FROM geocode_intersection( 'Fort Hamilton Pkwy','50th Street', 'NY', 'Brooklyn','11219',1)"
Here is the result
Next without the zipcode
psql -c "SELECT pprint_addy(addy), st_astext(geomout),rating FROM geocode_intersection( 'Fort Hamilton Pkwy','50th Street', 'NY', 'Brooklyn')"
Here is the result
Not bad... Here is the complete script for the shell file and SQL script.
First the SQL script
-- tiger_loader_platform.sql INSERT INTO tiger.loader_platform(os, declare_sect, pgbin, wget, unzip_command, psql, path_sep, loader, environ_set_command, county_process_command) SELECT 'trusty', 'TMPDIR="${staging_fold}/temp/" UNZIPTOOL=unzip WGETTOOL=/usr/bin/wget export PGBIN=/usr/bin # export PGPORT=5432 # export PGHOST=localhost # export PGUSER=postgres # export PGPASSWORD=yourpasswordhere export PGDATABASE=tiger PSQL=${PGBIN}/psql SHP2PGSQL=${PGBIN}/shp2pgsql cd ${staging_fold}', pgbin, wget, unzip_command, psql, path_sep, loader, environ_set_command, county_process_command FROM tiger.loader_platform WHERE os = 'sh';
Next the shell script
# 02_install_tiger_geocoder.sh export PGUSER=arthgallo export PGPASSWORD=***** export PGDATABASE=tiger #Each state needs to be quoted with single quotes. The following construct #preserves this arrangement export geostates="'"NY"'","'"MA"'" echo $geostates createdb tiger psql -c "CREATE EXTENSION postgis" psql -c "CREATE EXTENSION postgis_topology" psql -c "CREATE EXTENSION fuzzystrmatch" psql -c "DROP EXTENSION postgis_tiger_geocoder CASCADE" psql -c "CREATE EXTENSION postgis_tiger_geocoder" psql -c "SELECT na.address, na.streetname,na.streettypeabbrev, na.zip FROM normalize_address('1 Devonshire Place, Boston, MA 02109') AS na" psql -c "delete from tiger.loader_platform where os='trusty'" psql -f tiger_loader_platform.sql psql -c "select declare_sect from tiger.loader_platform where os='trusty'" # script needs gisdata folder sudo mkdir /gisdata # and a temp folder inside it sudo mkdir /gisdata/temp # Need to make it accessible to the postgres user sudo chmod -R 777 /gisdata #Restart postgres service sudo service postgresql restart #Run the nation script psql -c "SELECT loader_generate_nation_script('trusty') AS aaaaa" \ -t -x -A -F "" -q -o 02a_load_nation_tiger.sh #Replace the text sed -i 's/aaaaa//g' 02a_load_nation_tiger.sh chmod a+x ./02a_load_nation_tiger.sh sudo su postgres ./02a_load_nation_tiger.sh #Now generate the loader script psql -c "SELECT loader_generate_script(ARRAY[$geostates], 'trusty') AS aaaaa" -t -x -A -F "" -q -o 02b_load_ny_tiger.sh #Replace dummy text with blanks sed -i 's/aaaaa//g' 02b_load_ny_tiger.sh #Execute the load_ny_tiger file we generated chmod a+x ./02b_load_ny_tiger.sh sudo su postgres ./02b_load_ny_tiger.sh psql -c "SELECT install_missing_indexes()" #Test address normalization psql -c "SELECT '30 Rock' As event, n.* FROM normalize_address('45 Rockefeller Plaza, New York, NY 10111') As n" #Test simple geocode psql -c "SELECT g.rating, ST_X(g.geomout) As lon, ST_Y(g.geomout) As lat, (addy).address As stno, (addy).streetname As street, (addy).streettypeabbrev As styp, (addy).location As city, (addy).stateabbrev As st,(addy).zip FROM geocode('45 Rockefeller Plaza, New York, NY 10111') As g" #Test intersection geocode psql -c "SELECT pprint_addy(addy), st_astext(geomout),rating FROM geocode_intersection( 'Fort Hamilton Pkwy','50th Street', 'NY', 'Brooklyn','11219',1)" #Test intersection geocode without zip psql -c "SELECT pprint_addy(addy), st_astext(geomout),rating FROM geocode_intersection( 'Fort Hamilton Pkwy','50th Street', 'NY', 'Brooklyn')"
3 comments:
I want to get this to work so badly but I can't figure out which user I should run this from. If I try to use my home user, I get "permission denied to create database." If I run it using sudo, I get "per authentication failed for user myUser."
I want to get this to work so badly but I can't figure out which user to run the script from. It won't allow me, saying either "permission denied..." or "peer authentication failed.."
Hi Ross:
Is your user part of the sudoers group?
Thanks
Post a Comment