InDiAn InStItUte oF TeChNoLoGy - KhArAgPuR FiLeS!!!: 2013

Saturday, 23 March 2013

Session 9, 19th March, 2013

Many Eyes is a simple, easy-to-use web service that lets us visualize and explore data. It is offered by International Business Machines (IBM).

On Many Eyes we can:

View and discuss visualizations
View and discuss data sets
Create visualizations from existing data sets

Any data set that we upload is visible to the whole community.

Steps to be followed:

We need to go to the site: http://www-958.ibm.com/software/analytics/manyeyes/register.

Register there with a valid email id. Once we are registered, we need to log in with our credentials.
Under the "Participate" tab, we need to choose "Create visualization".

After that we have to start with a data set. We have two choices: use an existing data set or upload our own.

If our data is a list of values, we first need to format it as a table with informative column headers. If our columns have different units of measure, we must include those units in the column headers. We can prepare the table in a spreadsheet program such as Microsoft Excel, or in a text file where columns are separated by tabs.
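The formatting rules above can be sketched in R. This is only an illustration, assuming we prepare the table in R rather than in a spreadsheet; the data frame name `prices` and the temporary output file are hypothetical, and the two rows are taken from the data set shown in this post.

```r
# Small sample table: the column headers carry the unit of measure (Rs.).
prices <- data.frame(
  Date        = c("13-12-2012", "14-12-2012"),
  `ITC(Rs.)`  = c(305.4, 296.9),
  `SBIN(Rs.)` = c(2325, 2327),
  check.names = FALSE  # keep "(Rs.)" in the headers instead of mangling them
)

# Write a tab-separated text file with a header row and no quoting,
# which is the layout that can be pasted into the upload box.
out_file <- tempfile(fileext = ".txt")
write.table(prices, file = out_file, sep = "\t",
            quote = FALSE, row.names = FALSE)

tsv_lines <- readLines(out_file)
cat(tsv_lines, sep = "\n")
```

The first line of the file holds the headers and every following line holds one row, with tabs between the columns.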

I have chosen the following data set for the purpose of explanation:

Date	ITC(Rs.)	RELIANCE(Rs.)	SBIN(Rs.)	TATASTEEL(Rs.)
03-12-2012	298.3	807	2210	393.45
04-12-2012	296.2	825.7	2249.2	392.35
05-12-2012	298.95	837.1	2283	399.9
06-12-2012	302.2	845.85	2317.95	403.9
07-12-2012	303	849.7	2339.6	403.95
10-12-2012	302.65	837.95	2326.8	400.9
11-12-2012	305.7	838.35	2338.55	402.9
12-12-2012	306.5	834.8	2317.95	400.3
13-12-2012	305.4	841	2325	397.65
14-12-2012	296.9	846.8	2327	400.05
17-12-2012	295	844.4	2349	402.4
18-12-2012	296	839.45	2382	416.1
19-12-2012	295.4	843.05	2408.15	424.9
20-12-2012	290.1	841.95	2398.8	433.6
21-12-2012	289.45	834.5	2376.65	436.5
24-12-2012	290.75	827.35	2359	435.9
26-12-2012	291	835.9	2380	433.8
27-12-2012	292.75	832	2397.5	436.85
28-12-2012	290	846	2397.35	433.6
31-12-2012	289.45	849.8	2396.7	431.75

Copy the above text and paste it in the rectangular space provided.

Check that your data has been interpreted properly. You can change the data type from the drop-down list provided.

Provide the title for the data and other information on the following screen. Please note that only the title is mandatory; all other fields are optional. Click on Create.

The final data set will look as follows:

Click on Visualize to select the type of visualization you want for your data. For example, I have chosen "Matrix type", for which the visualization is shown below:

A wide range of customization options is available, and we can choose different options as per our requirement. Several other visualization types such as bubble charts, histograms, network diagrams, pie charts and line graphs are also available.

Pros:

Does not need to be downloaded or installed
Needs only a Java-enabled browser
It is freeware
Wide variety of visualizations

Cons:

The site might sometimes crash your browser
Your data can be viewed by others, so there is less privacy

Friday, 15 March 2013

Session #8 -12 Mar Assignment Submission

Problem:

Perform Panel Data Analysis of "Produc" data

Solution:

There are three types of models:
      Pooled effects model
      Fixed effects model
      Random effects model

We will determine which model is the best by using these functions:
       pFtest : for choosing between fixed and pooled
       plmtest : for choosing between pooled and random
       phtest : for choosing between random and fixed

The data can be loaded using the following commands (the plm package must be installed):
library(plm)
data(Produc, package = "plm")
head(Produc)

Pooled Effects Model

pool <- plm(log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp), data = Produc, model = "pooling", index = c("state", "year"))
summary(pool)

Fixed Effects Model:

fixed <- plm(log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp), data = Produc, model = "within", index = c("state", "year"))

summary(fixed)
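To see what the "within" model does, here is a minimal base-R sketch on a small simulated panel (the data, the variable names and the true slope of 1.5 are all hypothetical, not taken from "Produc"). It assumes the usual textbook description of the fixed effects estimator: subtract each unit's own mean from every variable, then run ordinary least squares. The same slope should come out of a regression with one dummy variable per unit.

```r
set.seed(1)

# Hypothetical panel: 4 units observed over 6 periods each.
id    <- rep(1:4, each = 6)
x     <- rnorm(24) + id               # regressor correlated with the unit
alpha <- c(2, -1, 0.5, 3)[id]         # unit-specific, time-invariant effect
y     <- alpha + 1.5 * x + rnorm(24, sd = 0.1)

# Within transformation: subtract each unit's own mean from y and x.
y_w <- y - ave(y, id)
x_w <- x - ave(x, id)
beta_within <- coef(lm(y_w ~ x_w - 1))[["x_w"]]

# Equivalent dummy-variable regression: one intercept per unit.
beta_lsdv <- coef(lm(y ~ x + factor(id)))[["x"]]

print(c(within = beta_within, lsdv = beta_lsdv))
```

Both estimates agree, which is why the unit-specific intercepts never need to be estimated explicitly in the within approach.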

Random Effects Model:

random <- plm(log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp), data = Produc, model = "random", index = c("state", "year"))

summary(random)

Testing of Model

This can be done through Hypothesis testing between the models as follows:

H0: Null Hypothesis: the individual (index) and time-based parameters are all zero

H1: Alternate Hypothesis: at least one of the individual (index) or time-based parameters is non-zero

Pooled vs Fixed

Null Hypothesis: Pooled Effects Model

Alternate Hypothesis: Fixed Effects Model

Command:

> pFtest(fixed,pool)

Result:

data: log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp)
F = 56.6361, df1 = 47, df2 = 761, p-value < 2.2e-16
alternative hypothesis: significant effects

Since the p-value is negligible, we reject the Null Hypothesis and accept the Alternate Hypothesis: the Fixed Effects Model is preferred over the Pooled Effects Model.
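The degrees of freedom in the result above can be checked by hand. This is a sketch, assuming the standard form of the F test for individual effects and the dimensions of the "Produc" panel (48 states observed over 17 years, so 816 observations) with the 7 regressors used here. The symbols RSS, N, T and K are notation introduced for this illustration only.

```latex
F = \frac{\left(RSS_{\text{pooled}} - RSS_{\text{fixed}}\right) / (N - 1)}
         {RSS_{\text{fixed}} / (NT - N - K)},
\qquad
\begin{aligned}
df_1 &= N - 1 = 48 - 1 = 47, \\
df_2 &= NT - N - K = 816 - 48 - 7 = 761.
\end{aligned}
```

These match the df1 = 47 and df2 = 761 reported by pFtest.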

Pooled vs Random

Null Hypothesis: Pooled Effects Model

Alternate Hypothesis: Random Effects Model

Command :

> plmtest(pool)

Result:

Lagrange Multiplier Test - (Honda)

data: log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp)

normal = 57.1686, p-value < 2.2e-16
alternative hypothesis: significant effects

Since the p-value is negligible, we reject the Null Hypothesis and accept the Alternate Hypothesis: the Random Effects Model is preferred over the Pooled Effects Model.

Random vs Fixed

Null Hypothesis: No correlation between the individual effects and the regressors (Random Effects Model)

Alternate Hypothesis: Fixed Effects Model

Command:

> phtest(fixed,random)

Result:

Hausman Test

data: log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp)

chisq = 93.546, df = 7, p-value < 2.2e-16
alternative hypothesis: one model is inconsistent

Since the p-value is negligible, we reject the Null Hypothesis and accept the Alternate Hypothesis: the Fixed Effects Model is preferred over the Random Effects Model.
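For reference, the Hausman statistic compares the two sets of coefficient estimates. The form below is the usual textbook expression and is written as a sketch: the subscripts FE and RE, and K for the number of regressors, are notation introduced here rather than taken from the output above.

```latex
H = \left(\hat{\beta}_{FE} - \hat{\beta}_{RE}\right)^{\top}
    \left[\widehat{\operatorname{Var}}(\hat{\beta}_{FE})
        - \widehat{\operatorname{Var}}(\hat{\beta}_{RE})\right]^{-1}
    \left(\hat{\beta}_{FE} - \hat{\beta}_{RE}\right)
\;\sim\; \chi^{2}_{K}, \qquad K = 7 .
```

With 7 regressors the statistic has 7 degrees of freedom, which matches df = 7 in the result.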

Conclusion:

So after running all the tests, we come to the conclusion that the Fixed Effects Model is best suited for the panel data analysis of the "Produc" data set.

Hence, we conclude that each id, i.e. each "state", has its own individual effect, and that this effect does not vary within the same "state" over time.
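The sequence of decisions used in this post can be summarised in a few lines of R. The helper `choose_panel_model` is hypothetical (it is not part of plm) and is a simplified sketch: it assumes a single significance level `alpha` for all three tests and takes the p-values as plain numbers.

```r
# Hypothetical helper, not part of plm: picks a model from three p-values.
#   p_f       : p-value of pFtest  (pooled vs fixed)
#   p_lm      : p-value of plmtest (pooled vs random)
#   p_hausman : p-value of phtest  (random vs fixed)
choose_panel_model <- function(p_f, p_lm, p_hausman, alpha = 0.05) {
  if (p_f >= alpha && p_lm >= alpha) {
    return("pooled")   # neither test rejects the pooled model
  }
  if (p_hausman < alpha) {
    return("fixed")    # Hausman test rejects the random effects model
  }
  "random"
}

# All three tests in this post reported p-values below 2.2e-16.
choice <- choose_panel_model(2.2e-16, 2.2e-16, 2.2e-16)
print(choice)
```

With the p-values reported in this post, the helper returns "fixed".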

Wednesday, 13 February 2013

Assignment 6

The class on 12th Feb was mainly conceptual. We took a time series data set and learnt about:

-Finding its returns;
-Making an ACF plot to check the stationarity of the data;
-Analysing the data through the Augmented Dickey-Fuller (ADF) test;
-Calculating the historical volatility and standard deviation of a data set;
-Standardizing a given data set.
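The last item, standardizing, can be sketched in base R as follows. The vector `x` holds hypothetical sample values; the sketch assumes the usual definition, which subtracts the mean and divides by the sample standard deviation.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)    # hypothetical sample values

# Standardize: subtract the mean, divide by the sample standard deviation.
z <- (x - mean(x)) / sd(x)

print(round(z, 3))
print(c(mean = mean(z), sd = sd(z)))
```

After standardizing, the values have mean 0 and standard deviation 1. The built-in `scale(x)` gives the same numbers as a one-column matrix.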

Assignment:
Create the log returns data and calculate its historical volatility.
Formulae:
1) $(\log S_t - \log S_{t-1}) / \log S_{t-1}$
OR
2) $\log\left((S_t - S_{t-1}) / S_{t-1}\right)$
Create an ACF plot for the log returns, run the ADF test and analyse the results.
Data is as follows:
NSE Index, Jan 2012 to Jan 2013
NIFTY data, closing prices
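Formula 1 above can be tried on a plain numeric vector before working with the real data. This is a small sketch in base R; the prices are hypothetical and the names `log_p`, `log_diff` and `returns_1` are introduced only for this example.

```r
prices <- c(100, 102, 101, 105)   # hypothetical closing prices S_1 .. S_4

log_p    <- log(prices)
log_diff <- diff(log_p)           # log S_t - log S_{t-1}, for t = 2 .. 4

# Formula 1: divide each difference by the previous day's log price.
returns_1 <- log_diff / head(log_p, -1)

print(returns_1)
```

A series of n prices gives n - 1 returns, because the first price has no previous value.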

Commands:-

> niftychart<-read.csv(file.choose(),header=TRUE)
> closingval<-niftychart$Close

> closingval.ts<-ts(closingval,frequency=252)

> plot(log( closingval.ts))

> minusone.ts<-lag(closingval.ts,k=-1)

> plot(log( minusone.ts))

> z<-log(closingval.ts)-log(minusone.ts)

> returns<-z/log(minusone.ts)

> plot(returns,main="Plot of Log Returns;CNX NSE Nifty Jan-2012 to Jan-2013" )

> acf(returns,main=" The Auto Correlation Plot; Dotted line shows 95% confidence interval ")

The ACF plot shows that all the autocorrelations lie within the 95% confidence bounds, so there is a fairly good chance that the data is "STATIONARY".
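The dotted lines in an ACF plot can also be computed directly. The sketch below uses simulated white noise as a hypothetical stand-in for the returns series, and assumes the default bounds drawn by `acf`, which are plus or minus `qnorm(0.975) / sqrt(n)`.

```r
set.seed(42)
x <- rnorm(250)                       # hypothetical stand-in for the returns

a     <- acf(x, lag.max = 20, plot = FALSE)
bound <- qnorm(0.975) / sqrt(length(x))   # 95% bound for white noise

rho     <- a$acf[-1]                  # drop lag 0, which is always 1
outside <- which(abs(rho) > bound)    # lags falling outside the bounds

print(bound)
print(outside)
```

Lags listed in `outside` are the ones that would cross the dotted lines in the plot; an empty result means every autocorrelation lies inside the bounds.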
> library(tseries)
> adf.test(returns)

With the ADF test and its p-value we can now confirm that the data is "Stationary".

# Now calculating the Historical volatility of the Data

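> # Annualise the daily standard deviation with the square root of 252 trading days
> annualfactor<-252^0.5
> histvolatility<-sd(returns)*annualfactor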
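As a check on the scaling, here is a small sketch with simulated daily returns (hypothetical data, not the NIFTY series). It assumes the common convention of annualising a daily standard deviation by multiplying it by the square root of the number of trading days, taken here as 252.

```r
set.seed(7)
daily_returns <- rnorm(252, mean = 0, sd = 0.01)  # hypothetical daily returns

daily_vol  <- sd(daily_returns)        # standard deviation of daily returns
annual_vol <- daily_vol * sqrt(252)    # annualised historical volatility

print(c(daily = daily_vol, annual = annual_vol))
```

The annual figure is always about 15.9 times the daily one, since the square root of 252 is roughly 15.87.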