Search This Blog

Wednesday, March 12, 2014

Generating time dependent discrete events using R

In the previous post, we had looked at generating a uniform set of discrete arrivals using R statistical package and the rpois function within R. We will use the example of cars arriving in a parking lot. Instead of a uniform distribution, we will consider a time varying distribution with two peaks.

At a high level, these are the steps..

1. Download and Install package for non-homogenous poisson distribution
2. Generate a bi-modal distribution for car arrivals through a day
3. Generate a non homogenous poisson event data using the bimodal distribution as an input.

To generate time-dependent arrival data, we need to use a different package that is now available in R as a library.

Downloading and Installing NHPoisson package

The package is called NHPoisson. To install it, we need to run R with administrative privileges. In Ubuntu, the command will be

$ sudo R

Enter the password to continue.



Next we need to install the NHPoisson package. We can do so by entering the command NHPoisson

install.packages(“NHPoisson”)



This brings up the list of mirrors to select the package from. I selected my preferred mirror



Once the mirror is selected, the package is downloaded and installed on the machine.


Generating a bimodal distribution of events

Now we would like to generate a bimodal distribution for car arrivals. In our scenario, we have two peaks, first at 9:00 AM and other at 5:30 PM. Assuming 3 cars per minute on average, we can assume that during first peak, the standard deviation is 2, as morning peak is more pronounced. The standard deviation during the second peak can be assumed at 3 as evening peaks can be more dispersed.

Using the above assumptions, we can run the following script in R. If you launched R as administrator to download and install packages, it is better to quit and come back in as a non-administrative user (without sudo).



To create the bimodal dataset, we will create a function (shown below) that generates a bimodal distribution with supplied means and standard deviations.


mu1 <- 9 # morning peak
mu2 <- 17.5 # evening peak
sig1 <- 2 # standard deviation for morning
sig2 <- 3 # standard deviation for evening
cpct <- 0.48 # probability


bimodalDistFunc <- function (n,cpct, mu1, mu2, sig1, sig2) {
y0 <- rlnorm(n,mean=mu1, sd = sig1)
y1 <- rlnorm(n,mean=mu2, sd = sig2)

flag <-rbinom(n,size=1,prob=cpct)
y <- y0*(1 - flag) + y1*flag
}

# assuming average arrival rate of 3 cars per minute for the overall duration and assuming 24 hours of data to be generated
bimodalData <- bimodalDistFunc(n=3*60*24,cpct,mu1,mu2, sig1,sig2)
hist(log(bimodalData))



The last command generates a histogram for the generated distribution.




We can plot per minute arrivals, using the following formula

hist(log(bimodalData), breaks=24*60)

Running the above command, we can see the following graph generated.



Generating Non-homogenous Poisson time events

Now we can generate the data we need for the non homogenous poisson distribution.

First we need to load the NHPoisson library, if not already loaded. Also, what we really need for the poisson events is the per minute frequency of arrivals that needs to be fed in. For this, we will use the table function.

library(“NHPoisson”)
arrFreq<-table(round(log(bimodalData)*60))

arrEvent=simNHP.fun(lambda=arrFreq[])



If we plot the generated data using the following command

hist(arrEvent$posNH, breaks=24*60)

we get the following chart.



That is it, we have generated a non-homogenous poisson event. If we want to see the actual data, we can see using the following command. This shows the arrival events of individual cars.

arrEvent$posNH



I am sure there is room for improvement, but it is an adequate structure to get started.



No comments: