Friday, March 4, 2016

Cheat sheet : Exploratory data analysis


Here is a short version of exploratory data analysis:

1. Variable Identification (categorical, continuous, etc)
2. Univariate Analysis
    a. Categorical variable: frequency of occurrence (count). Bar chart for visualization.
    b. Continuous variable: mean, median, mode, min and max. Histogram for visualization.

Ref: https://www.youtube.com/watch?v=wFabyCP54YA
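
A quick sketch of the univariate step, using only the Python standard library. The data (cities, ages) is made up for illustration, and the '#' rows are just a text stand-in for a real bar chart or histogram; in practice you would probably plot these with a charting library.

```python
from collections import Counter
import statistics

# Hypothetical sample data, invented for this example only.
cities = ["Pune", "Delhi", "Pune", "Mumbai", "Pune", "Delhi"]
ages = [23, 25, 25, 31, 40, 25, 52]

# Categorical variable: frequency of occurrence (count).
city_counts = Counter(cities)

# Text-only stand-in for a bar chart: one '#' per occurrence.
for city, count in city_counts.most_common():
    print(f"{city:<8}{'#' * count} ({count})")

# Continuous variable: mean, median, mode, min and max.
summary = {
    "mean": statistics.mean(ages),
    "median": statistics.median(ages),
    "mode": statistics.mode(ages),
    "min": min(ages),
    "max": max(ages),
}
print(summary)

# Text-only stand-in for a histogram: group ages into bins of width 10.
age_bins = Counter((age // 10) * 10 for age in ages)
for start in sorted(age_bins):
    print(f"{start}-{start + 9}: {'#' * age_bins[start]}")
```

The bin width of 10 is an arbitrary choice here; a different width can change how the histogram looks quite a bit.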

3. Bi-variate Analysis
    a. Continuous & Continuous: Scatter plot to look at the relationship, correlation to measure its strength.
Correlation varies between -1 and +1.

-1: perfect negative linear correlation
+1: perfect positive linear correlation
0: no linear correlation

    b. Categorical & Categorical:
        i. Two-way table: count and count% as metrics
        ii. Stacked Column Chart
        iii. Chi-Square Test: Need to read more on this, but the probability (p-value) it returns reads roughly as follows:
            Probability close to 0: the two categorical variables are likely dependent.
            Probability close to 1: the data are consistent with the two variables being independent.
    c. Categorical & Continuous:
        i. Z-Test / T-Test: Assesses whether the means of two groups are statistically different.
        ii. ANOVA: Assesses whether the averages of more than two groups are statistically different.
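
To make the correlation range and the chi-square probabilities above a bit more concrete, here is a rough standard-library sketch. The helper names (pearson_r, chi_square_2x2) and all the numbers are made up for this example. The chi-square helper only handles a 2x2 table of counts (1 degree of freedom) and applies no continuity correction, so treat it as an illustration and not as a replacement for a statistics package.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient, always between -1 and +1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def chi_square_2x2(table):
    """Chi-square statistic and p-value for a 2x2 table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Continuous & continuous: hypothetical, perfectly linear data.
hours = [1, 2, 3, 4, 5]
r_up = pearson_r(hours, [2, 4, 6, 8, 10])    # perfect positive linear relation
r_down = pearson_r(hours, [10, 8, 6, 4, 2])  # perfect negative linear relation
print(r_up, r_down)

# Categorical & categorical: hypothetical two-way tables of counts.
chi2_dep, p_dep = chi_square_2x2([[30, 10], [10, 30]])  # strong association
chi2_ind, p_ind = chi_square_2x2([[20, 20], [20, 20]])  # no association at all
print(chi2_dep, p_dep)
print(chi2_ind, p_ind)
```

The first table gives a probability very close to 0 (the variables look dependent), while the second gives a statistic of 0 and a probability of 1 (nothing in the counts suggests dependence).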

Ref: https://www.youtube.com/watch?v=IA0unflfvQE
https://www.youtube.com/watch?v=zdU8C8QEHH0
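
For the categorical & continuous case, a small sketch of the test statistics themselves, again with only the standard library. The function names (welch_t, anova_f) and the three groups are invented for illustration, and the t statistic uses Welch's variant, which is my choice and not something the references prescribe. Turning these statistics into p-values needs the t and F distributions, which the standard library does not provide; SciPy's scipy.stats.ttest_ind and scipy.stats.f_oneway are the usual route.

```python
import math
import statistics

def welch_t(a, b):
    """Welch's t statistic for the difference between two group means."""
    standard_error = math.sqrt(
        statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    )
    return (statistics.mean(a) - statistics.mean(b)) / standard_error

def anova_f(*groups):
    """One-way ANOVA F statistic: between-group over within-group variance."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    ss_between = sum(
        len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups
    )
    ss_within = sum(
        sum((x - statistics.mean(g)) ** 2 for x in g) for g in groups
    )
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical continuous measurements for three categories.
group_a = [1, 2, 3]
group_b = [2, 3, 4]
group_c = [3, 4, 5]

t_stat = welch_t(group_a, group_c)             # two groups: t-test
f_stat = anova_f(group_a, group_b, group_c)    # more than two groups: ANOVA
print(t_stat, f_stat)
```

A larger absolute t or a larger F means the group means are further apart relative to the spread inside the groups.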

..To be continued...

Ref: http://www.analyticsvidhya.com/blog/2016/01/guide-data-exploration/
