Search This Blog

Sunday, February 23, 2014

Connecting to PostgreSQL database to do R analysis

In the previous post, we saw how to install R on Ubuntu. To perform analysis on PostgreSQL data, we need to have the RPostgreSQL library installed. Installing on Ubuntu is simple, and you need to follow the scripts listed below.

Install libpq-dev library

First step is to ensure that the libpq-dev library is installed. This can be installed by entering the following command.

$ sudo apt-get install libpq-dev




Install RPostgreSQL Library


Now that we have the library installed, we need to setup PostgreSQL library for R on our machine. To do this, we need to start R as a super user.

$ sudo R




Next step is to run the command to install additional packages. Enter the following command on the R prompt.

> install.packages()



This prompts a bunch of mirrors from where new packages can be downloaded from. Choose your location from the list.




Choose the appropriate mirror for your location and continue...

Next, we need to choose the package. Choose RPostgreSQL, as shown below



Clicking OK will start the download and installation.


Now let's see if we can connect to a database, and use the R library.

To do so, we need to load the RPostgreSQL library, load the driver for PostgreSQL and make a connection to our database with an authorized user and password.

Here are the steps to do so....
> library(RPostgreSQL)
> drv <- dbDriver("PostgreSQL")
> con <- dbConnect(drv, user = "nydot", password = "nydot", dbname = "nydot", host = "localhost", port = 5432)
This loads the library and connects to the database as shown in our terminal window.



Now that the connection is made, I can fire a query to load the data. You do so by using the command dbGetQuery as shown in the following statement.

> rs <- dbGetQuery(con, "select rcstation, start_time, direction, lane, count from traffic")

This assigns the results of the query to our object.


Just typing rs on the console will list all the data that got loaded.

> rs

Will list all contents of the object containing the results of the query. If we now wanted to check the mean count of all the data in the results we could do so as follows:

> mean(rs$count)

Here mean() is the function that we want to execute that calculates the average; rs is the object holding our results and rs$count refers to the field “count” in our resultset.



That's it.... We saw how we can calculate statistics from datasets in a PostgreSQL database.




Saturday, February 22, 2014

Download and Install JDK on Ubuntu

I just setup a new laptop with Ubuntu 13.10. As I wanted to run some Java applications, I tried to see the version of javac on my machine, only to realize that nothing was installed.




Now the question, which JDK to install. I realized the best option was openjdk-7-jdk (To install Sun/ Oracle JDK, refer to the steps in this post). So I ran the following command

$sudo apt-get install openjdk-7-jdk



This should start the installation process. After installation is complete, you should see a screen indicating the process completed successfully.


Finally, we can validate by checking the version of the Java compiler.



Perfect, we have JDK installed!

If you have multiple JDKs installed, you can read this post to configure the one you need as default.

Friday, February 21, 2014

Installing R Statistical Package on Ubuntu

R is an open-source statistical package like SPSS or SAS. In this post, we will see how to install it on Ubuntu and how to access the demo scripts.

Getting the packages

Installing R is fairly straightforward on Ubuntu. Simply, open a terminal window (using Ctrl-Alt-T, or from the menu), and enter the following command....

$ sudo apt-get install r-base r-base-dev



The system prompts to install the necessary R libraries as shown below.


Running R for the first time

Once R is installed, simply type R at the command prompt, as follows:



That brings up the R prompt that can be used for statistical analysis. We can do a small test right here.

Type demo(lm.glm) to see some examples of Generalized Linear Regression.



Entering return brings multiple examples. The following screen shows the second example.



The script keeps prompting additional examples, and one needs to press Enter to see the next example.



One can enter, ^C to exit the demo and q() to exit the R prompt.



That's it, we are done!

Happy modeling, everyone!