Tech Notes

My notes on Statistics, Big Data, Cloud Computing, Cyber Security

Basic Terms in Statistics (contd 2.) – Glossary

What is Statistics : the science of drawing conclusions from data

Types of Statistics

Descriptive Vs Inferential


Descriptive Statistics : Summarizing data with numbers


Mean : the arithmetic average of a group of numbers, i.e. the sum of the values divided by how many values there are.

It is not a robust statistic: it is sensitive to extreme values. A trimmed mean can be used instead.

Trimmed Mean : an a% trimmed mean is the mean of the data values that remain after the k largest and the k smallest values have been removed, where k = an/100 and n is the number of values.
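
The definition above can be sketched in a few lines of Python. This is only an illustration: the function name `trimmed_mean`, the sample data, and the choice to round k down when an/100 is not a whole number are my assumptions, not part of the course material.

```python
from statistics import mean

def trimmed_mean(values, percent):
    """Mean after dropping the k smallest and k largest values, k = percent * n / 100."""
    data = sorted(values)
    n = len(data)
    k = int(percent * n / 100)  # assumption: round k down to a whole number
    if 2 * k >= n:
        raise ValueError("percent is too large: no values would remain")
    return mean(data[k:n - k])

# Made-up data with one extreme value
data = [2, 3, 4, 5, 6, 7, 8, 9, 10, 1000]

plain = mean(data)                # 105.4, pulled up by the value 1000
trimmed = trimmed_mean(data, 10)  # 6.5, k = 1 so 2 and 1000 are dropped

print(plain, trimmed)
```

A single extreme value moves the plain mean far away from the bulk of the data, while the 10% trimmed mean stays close to it.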

Median : the “middle value” of a sorted data set. It is a robust statistic: it is not sensitive to extreme values.

Mode : the most common, or “most frequent”, value in a data set.
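
As a quick check of the robustness claims, here is a small sketch using Python's standard `statistics` module. The numbers are made up for illustration.

```python
from statistics import mean, median, mode

scores = [3, 5, 5, 6, 7]
with_outlier = scores + [100]  # add one extreme value

# Without the extreme value
base = (mean(scores), median(scores), mode(scores))

# With the extreme value: the mean jumps, the median and mode barely move
shifted = (mean(with_outlier), median(with_outlier), mode(with_outlier))

print(base)
print(shifted)
```

The mean goes from 5.2 to 21, the median only from 5 to 5.5, and the mode stays at 5.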

Quartile

  • Lower – The lowest 25% of the data is found below the first quartile, also called the lower quartile (Q1)
  • Upper – The lowest 75% of the data is found below the third quartile, also called the upper quartile (Q3)
  • IQR – Inter Quartile Range – Difference between the 3rd and 1st quartiles (Q3 − Q1)
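
A minimal sketch of the quartiles and the IQR, using `statistics.quantiles` (Python 3.8+). Note that several conventions for computing quartiles exist and they can give slightly different values; this sketch uses the module's default ("exclusive") method, and the data is made up.

```python
from statistics import quantiles

data = [1, 2, 3, 4, 5, 6, 7]

# n=4 splits the data into four equal-probability groups,
# so the three cut points are Q1, the median, and Q3
q1, q2, q3 = quantiles(data, n=4)

iqr = q3 - q1  # spread of the middle 50% of the data

print(q1, q2, q3, iqr)
```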

Variance and Standard Deviation : Variance is a measure of the spread of the data. The range or the IQR can also be used to describe spread, but the variance and the standard deviation (SD, the square root of the variance) are the most widely used.
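
To make the calculation concrete, here is a sketch that computes the variance and SD step by step and compares them with the standard library. I am assuming the sample versions (dividing by n − 1), since the later entries refer to the sample standard deviation; the data is made up.

```python
from math import sqrt
from statistics import mean, stdev, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]

m = mean(data)

# Squared distance of every value from the mean
squared_deviations = [(x - m) ** 2 for x in data]

# Sample variance: divide by n - 1 (assumption: sample, not population)
sample_variance = sum(squared_deviations) / (len(data) - 1)

# The SD is the square root of the variance, in the same units as the data
sample_sd = sqrt(sample_variance)

# The library functions should agree with the manual calculation
print(sample_variance, variance(data))
print(sample_sd, stdev(data))
```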


 

How the Mean and the SD work together

Start at the average and walk a few SDs on either side: in that interval, we’ve picked up the bulk of the distribution.
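
One way to see this on actual numbers is to count how much of the data falls within k SDs of the mean. The helper name `share_within` and the data set are made up for illustration.

```python
from statistics import mean, stdev

def share_within(values, k):
    """Fraction of the values lying within k sample SDs of the mean."""
    m, s = mean(values), stdev(values)
    inside = [x for x in values if m - k * s <= x <= m + k * s]
    return len(inside) / len(values)

# Hypothetical measurements
heights = [150, 155, 158, 160, 162, 163, 165, 167, 170, 172, 175, 190]

shares = [share_within(heights, k) for k in (1, 2, 3)]

for k, share in zip((1, 2, 3), shares):
    print(f"within {k} SD: {share:.0%}")
```

The share can only grow as k increases, and for this data set every value lies within 3 SDs of the mean.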

Normal Curve
It is an approximation to many distributions of data, and many distributions of probabilities.
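
For a distribution that is exactly normal, the share within 1, 2 and 3 SDs of the mean can be computed directly. A small sketch with `statistics.NormalDist` (Python 3.8+); the helper name `probability_within` is mine.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, SD 1
standard_normal = NormalDist(mu=0, sigma=1)

def probability_within(k):
    """Probability that a normal value lies within k SDs of its mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(probability_within(k), 4))
```

This gives roughly 68%, 95% and 99.7%. Real data only follows these figures as closely as the normal curve approximates it.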

Coefficient of Variation
The coefficient of variation measures the spread of a set of data as a proportion of its mean. It is often expressed as a percentage.
It is the ratio of the sample standard deviation to the sample mean
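
A minimal sketch of that ratio. The function name and the data are made up, and the check for a zero mean is my addition, since the ratio is undefined in that case.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation divided by the sample mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("coefficient of variation is undefined when the mean is 0")
    return stdev(values) / m

data = [10, 12, 14]  # mean 12, sample SD 2

cv = coefficient_of_variation(data)

print(f"{cv:.1%}")  # expressed as a percentage
```

Because it is a ratio, the CV has no units, which makes it handy for comparing the spread of data sets measured on different scales.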

5-Number Summary
It consists of 5 values: the most extreme values in the data set (maximum and minimum values), the lower and upper quartiles, and the median.
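
The five values can be collected with a short helper. This sketch assumes the default quartile method of `statistics.quantiles` (Python 3.8+); other quartile conventions may give slightly different Q1 and Q3. The function name and data are made up.

```python
from statistics import median, quantiles

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    data = sorted(values)
    q1, _, q3 = quantiles(data, n=4)  # middle cut point is the median
    return min(data), q1, median(data), q3, max(data)

summary = five_number_summary([7, 1, 5, 3, 2, 6, 4])

print(summary)
```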

Z-score
Subtract the sample mean from the value and divide the result by the sample standard deviation: z = (value − mean) / SD.

It has the following properties:

  • z is positive when the value is greater than the mean
  • z is negative when the value is less than the mean
  • z is the number of standard deviations between the value and the mean.
  • z is zero when the value equals the mean
  • z has no theoretical upper or lower bound apart from that caused naturally by the range that values can take.
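
The calculation and the sign properties listed above can be checked with a small sketch. The function name `z_scores` and the data are made up.

```python
from statistics import mean, stdev

def z_scores(values):
    """z = (value - sample mean) / sample standard deviation, for every value."""
    m, s = mean(values), stdev(values)
    return [(x - m) / s for x in values]

data = [10, 12, 14]  # mean 12, sample SD 2

z = z_scores(data)

# 10 is one SD below the mean, 12 equals the mean, 14 is one SD above
print(z)
```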

 

Inferential Statistics : Making decisions based on data. E.g. if I see a difference between 2 groups / variables, could this be due to chance? Can we generalize the findings?
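
One simple way to ask "could this be due to chance?" is a permutation test: shuffle the group labels many times and see how often a difference at least as large as the observed one shows up. This is a sketch of that one technique, not necessarily the method used in the course; the function name, the groups and the number of rounds are all made up.

```python
import random

def permutation_p_value(group_a, group_b, rounds=2000, seed=0):
    """Share of random relabelings whose mean difference is at least the observed one."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n_a, n_b = len(group_a), len(group_b)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)
    pooled = list(group_a) + list(group_b)
    at_least_as_large = 0
    for _ in range(rounds):
        rng.shuffle(pooled)  # pretend the group labels are arbitrary
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed:
            at_least_as_large += 1
    return at_least_as_large / rounds

# Hypothetical groups that are clearly separated
p_separated = permutation_p_value([1, 2, 3, 4, 5], [101, 102, 103, 104, 105])

# Hypothetical groups with identical values
p_same = permutation_p_value([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])

print(p_separated, p_same)
```

A small value means chance alone rarely produces a difference this large; a value near 1 means the observed difference is entirely unremarkable.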


Disclaimer : These are my study notes – online – instead of on paper so that others can benefit. In the process I have used some pictures / content from other original authors. All sources / original content publishers are listed below and they deserve credit for their work. No copyright violation intended.

References for these notes :

The study material for the MOOC “Making sense of data” at Coursera.org

http://cmapskm.ihmc.us/rid=1052458963987_1845442706_8642/Descriptive%20statistics.cmap

http://www.stats.gla.ac.uk/steps/glossary/presenting_data.html

http://www.gla.ac.uk/sums/users/lhornibrook/Sensor_Comparisons/zscores2.html
