Saturday, 23 March 2013

Session 9, 19th March, 2013


Many Eyes is a simple, easy-to-use web service that lets us visualize and explore data. The service is offered by International Business Machines (IBM).

On Many Eyes we can:
    1. View and discuss visualizations
    2. View and discuss data sets
    3. Create visualizations from existing data sets
Any data set that we upload is visible to the whole community.

Steps to be followed:
  • We need to go to the site: http://www-958.ibm.com/software/analytics/manyeyes/register.
  • Register there with a valid email id. Once we are registered, we need to log in with our credentials.
  • Under the Participate tab, we need to choose Create visualization.

  • After that we have to start with a data set. We have two choices: use an existing data set or upload our own.
If our data is a list of values, we first need to format it as a table with informative column headers. If our columns have different units of measure, we must be sure to include those units in the headers. The table can be prepared in a spreadsheet program such as Microsoft Excel, or as a text file where columns are separated with tabs.
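As a rough illustration of the formatting advice above, the following R sketch writes a small table to a tab-separated text file with the units in the column headers. The three rows are taken from the data set used in this post; the temporary file name and the object names are only assumptions for the example.

```r
# Three-row excerpt; the column headers carry the unit (Rs.).
prices <- data.frame(
  Date = c("3/12/2012", "4/12/2012", "5/12/2012"),
  ITC = c(298.3, 296.2, 298.95),
  RELIANCE = c(807, 825.7, 837.1)
)
names(prices) <- c("Date", "ITC(Rs.)", "RELIANCE(Rs.)")

# Write a tab-separated text file with a header row and no row names.
out_file <- tempfile(fileext = ".txt")  # hypothetical output location
write.table(prices, file = out_file, sep = "\t",
            quote = FALSE, row.names = FALSE)

# Read the file back to see exactly what would be pasted into the site.
lines_written <- readLines(out_file)
print(lines_written)
```

The contents of such a file can be pasted directly into the upload box.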

I have chosen the following data set  for the purpose of explanation:
Date          ITC(Rs.)   RELIANCE(Rs.)   SBIN(Rs.)   TATASTEEL(Rs.)
3/12/2012     298.3      807             2210        393.45
4/12/2012     296.2      825.7           2249.2      392.35
5/12/2012     298.95     837.1           2283        399.9
6/12/2012     302.2      845.85          2317.95     403.9
7/12/2012     303        849.7           2339.6      403.95
10/12/2012    302.65     837.95          2326.8      400.9
11/12/2012    305.7      838.35          2338.55     402.9
12/12/2012    306.5      834.8           2317.95     400.3
13-12-2012    305.4      841             2325        397.65
14-12-2012    296.9      846.8           2327        400.05
17-12-2012    295        844.4           2349        402.4
18-12-2012    296        839.45          2382        416.1
19-12-2012    295.4      843.05          2408.15     424.9
20-12-2012    290.1      841.95          2398.8      433.6
21-12-2012    289.45     834.5           2376.65     436.5
24-12-2012    290.75     827.35          2359        435.9
26-12-2012    291        835.9           2380        433.8
27-12-2012    292.75     832             2397.5      436.85
28-12-2012    290        846             2397.35     433.6
31-12-2012    289.45     849.8           2396.7      431.75

  • Copy the above text and paste it in the rectangular space provided.
  • Check that your data has been interpreted properly. You can change the data type of each column from the drop-down list provided.


  • Provide a title for the data and any other information on the following screen. Please note that only the title is mandatory; all other fields are optional. Click on Create.
  • The final data set will look as follows:
  • Click on Visualize to select the type of visualization you want for your data. For example, I have chosen "Matrix type", for which the visualization is shown below:



A wide range of customization options is available, and we can choose among them as per our requirements. Several other visualization types, such as bubble charts, histograms, network diagrams, pie charts and line graphs, are also available.

Pros:
  • Need not be downloaded and installed
  • Just needs a Java-enabled browser
  • It is free to use
  • Wide variety of visualizations
Cons:
  • The site might sometimes crash your browser
  • Your data can be viewed by others, so there is less privacy



Friday, 15 March 2013

Session #8 -12 Mar Assignment Submission

Problem: 

Perform Panel Data Analysis of "Produc" data

Solution:


There are three types of models:
      Pooled model
      Fixed effects model
      Random effects model

We will determine which model is best by using the following functions:
       pFtest: for choosing between fixed and pooled
       plmtest: for choosing between pooled and random
       phtest: for choosing between random and fixed

The data can be loaded using the following commands (the plm package must be installed):
library(plm)
data(Produc, package = "plm")
head(Produc)



Pooled Model:

pool <- plm(log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp),
            data = Produc, model = "pooling", index = c("state", "year"))
summary(pool)





Fixed Effects Model:

fixed <- plm(log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp),
             data = Produc, model = "within", index = c("state", "year"))
summary(fixed)
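To show what the "within" model does, here is a minimal base-R sketch on simulated data (not the Produc data): subtracting each unit's own mean from the variables gives the same slope as ordinary least squares with one dummy variable per unit. All names and numbers in it are made up for illustration.

```r
set.seed(1)
# Simulated toy panel: 4 units observed over 6 periods (not the Produc data).
unit  <- rep(1:4, each = 6)
alpha <- c(2, -1, 0.5, 3)[unit]            # unit-specific intercepts
x     <- rnorm(24) + alpha                 # regressor correlated with the intercepts
y     <- alpha + 1.5 * x + rnorm(24, sd = 0.1)

# "Within" transformation: subtract each unit's own mean.
x_within <- x - ave(x, unit)
y_within <- y - ave(y, unit)
beta_within <- unname(coef(lm(y_within ~ x_within - 1)))

# Equivalent estimate from OLS with one dummy per unit.
beta_dummies <- unname(coef(lm(y ~ x + factor(unit)))["x"])

print(c(within = beta_within, dummies = beta_dummies))
```

The two estimates agree, which is why the fixed effects model is also described as giving every unit its own intercept.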




Random Effects Model:

random <- plm(log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp),
              data = Produc, model = "random", index = c("state", "year"))
summary(random)

Testing of Model

This can be done through hypothesis testing between the models as follows:

H0 (Null Hypothesis): the individual (index) and time-based effects are all zero
H1 (Alternate Hypothesis): at least one of the individual or time-based effects is non-zero

Pooled vs Fixed

Null Hypothesis: Pooled Model
Alternate Hypothesis: Fixed Effects Model

Command:

> pFtest(fixed,pool)


Result:
data:  log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp) 
F = 56.6361, df1 = 47, df2 = 761, p-value < 2.2e-16
alternative hypothesis: significant effects 
Since the p-value is negligible, we reject the Null Hypothesis in favour of the Alternate Hypothesis, i.e. we prefer the Fixed Effects Model over the Pooled Model.

Pooled vs Random

Null Hypothesis: Pooled Model
Alternate Hypothesis: Random Effects Model

Command :
> plmtest(pool)

Result:

  Lagrange Multiplier Test - (Honda)
data:  log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp)
normal = 57.1686, p-value < 2.2e-16
alternative hypothesis: significant effects 

Since the p-value is negligible, we reject the Null Hypothesis in favour of the Alternate Hypothesis, i.e. we prefer the Random Effects Model over the Pooled Model.

Random vs Fixed

Null Hypothesis: no correlation between the individual effects and the regressors, so the Random Effects Model is appropriate
Alternate Hypothesis: Fixed Effects Model

Command:
 > phtest(fixed,random)

Result:

 Hausman Test
data:  log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp)
chisq = 93.546, df = 7, p-value < 2.2e-16
alternative hypothesis: one model is inconsistent 

Since the p-value is negligible, we reject the Null Hypothesis in favour of the Alternate Hypothesis, i.e. we prefer the Fixed Effects Model over the Random Effects Model.
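As a sanity check, the p-values can be recomputed in base R from the test statistics and degrees of freedom printed in the three results above. This is only a sketch: it assumes the Honda statistic is compared with the upper tail of a standard normal distribution, and it takes "< 2.2e-16" to be R's usual reporting floor for very small p-values.

```r
# Test statistics copied from the three outputs above.
p_f       <- pf(56.6361, df1 = 47, df2 = 761, lower.tail = FALSE)  # pFtest
p_honda   <- pnorm(57.1686, lower.tail = FALSE)                    # plmtest (Honda), upper tail assumed
p_hausman <- pchisq(93.546, df = 7, lower.tail = FALSE)            # phtest

# Each value is far too small to be reported exactly, hence "< 2.2e-16".
print(c(F = p_f, Honda = p_honda, Hausman = p_hausman))
```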

Conclusion: 

After running all the tests, we conclude that the Fixed Effects Model is best suited for the panel data analysis of the "Produc" data set.

Hence, we conclude that each id, i.e. each "state", has its own individual effect, and that this effect does not vary over time within the same state.
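The decision logic followed in this post can be summarised in a small helper. This is a hypothetical function written only for illustration (it is not part of plm), and the 0.05 significance level is an assumption.

```r
# Hypothetical helper: picks a plm model type from the three test p-values.
choose_panel_model <- function(p_ftest, p_lmtest, p_hausman, alpha = 0.05) {
  fixed_beats_pooled  <- p_ftest  < alpha   # pFtest rejects the pooled model
  random_beats_pooled <- p_lmtest < alpha   # plmtest rejects the pooled model
  if (!fixed_beats_pooled && !random_beats_pooled) return("pooling")
  if (fixed_beats_pooled && !random_beats_pooled) return("within")
  if (!fixed_beats_pooled && random_beats_pooled) return("random")
  # Both beat pooling: the Hausman test decides between them.
  if (p_hausman < alpha) "within" else "random"
}

# All three tests in this post reported p-values below 2.2e-16.
choice <- choose_panel_model(2.2e-16, 2.2e-16, 2.2e-16)
print(choice)
```

The returned string uses the same names as the model argument of plm, so "within" corresponds to the Fixed Effects Model.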

Wednesday, 13 February 2013

Assignment 6

The class on 12th Feb was mainly conceptual. It covered taking a time series data set and:
-Finding its returns;
-Drawing an ACF plot to check the stationarity of the data;
-Analysing the data through the Augmented Dickey-Fuller (ADF) test;
-Calculating the historical volatility and standard deviation of a data set;
-Standardizing a given data set.

Assignment:
Create log returns data and calculate its historical volatility
Formulae (S_t is the closing price on day t):
1) (log S_t - log S_{t-1}) / log S_{t-1}
OR
2) log((S_t - S_{t-1}) / S_{t-1})
Create an ACF plot for the log returns, run the ADF test and analyse the results
Data is as follows:
NSE Index, Jan 2012 to Jan 2013
NIFTY data, closing prices

Commands:-

> niftychart<-read.csv(file.choose(),header=TRUE)
> closingval<-niftychart$Close

> closingval.ts<-ts(closingval,frequency=252)
> plot(log(closingval.ts))
# lag() takes a lower-case k; k=-1 shifts the series so that time t holds the value from t-1
> minusone.ts<-lag(closingval.ts,k=-1)
> plot(log(minusone.ts))
> z<-log(closingval.ts)-log(minusone.ts)

# Formula 1: difference of logs divided by the log of the previous close
> returns<-z/log(minusone.ts)
> plot(returns,main="Plot of Log Returns;CNX NSE Nifty Jan-2012 to Jan-2013" )
> acf(returns,main=" The Auto Correlation Plot;   Dotted line shows 95% confidence interval ")

The ACF plot shows that all the autocorrelations lie within the 95% confidence band, so there is a fairly good chance that the data is "STATIONARY".
> library(tseries)
> adf.test(returns)
The null hypothesis of the ADF test is that the series has a unit root, i.e. is non-stationary. A small p-value lets us reject it and confirm that the data is "Stationary".
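The dotted 95% band in the ACF plot can be reproduced by hand. The sketch below uses simulated white noise in place of the Nifty returns (which are not reproduced here), so the series and its length are assumptions.

```r
set.seed(42)
# Simulated white noise standing in for the return series (not the Nifty data).
n <- 250
sim_returns <- rnorm(n, sd = 0.01)

# Approximate 95% band drawn by acf(): +/- qnorm(0.975) / sqrt(n).
bound <- qnorm(0.975) / sqrt(n)

# Sample autocorrelations without plotting; drop lag 0, which is always 1.
rho <- as.numeric(acf(sim_returns, lag.max = 20, plot = FALSE)$acf)[-1]

# Share of lags whose autocorrelation falls outside the band.
share_outside <- mean(abs(rho) > bound)
print(c(bound = bound, share_outside = share_outside))
```

For an uncorrelated series, only about one lag in twenty is expected to fall outside the band by chance.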

# Now calculating the Historical volatility of the Data

> T<-252^0.5
> histvolatality<-sd(returns)/T
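For comparison, a widely used convention defines the log return as log(S_t / S_{t-1}), without dividing by log S_{t-1}, and annualises daily volatility by multiplying by the square root of 252 rather than dividing. The sketch below shows that convention on made-up prices; whether it matches what was intended in class is an assumption.

```r
# Hypothetical closing prices, for illustration only.
close_px <- c(100, 102, 101, 105, 107)

# Conventional log return: r_t = log(S_t / S_{t-1}) = log(S_t) - log(S_{t-1}).
log_ret <- diff(log(close_px))

# Daily volatility and its annualised value, assuming 252 trading days.
daily_vol  <- sd(log_ret)
annual_vol <- daily_vol * sqrt(252)
print(c(daily = daily_vol, annual = annual_vol))
```

A convenient property of this definition is that the log returns add up to the log of the total price change over the period.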