Tech Notes

My notes on Statistics, Big Data, Cloud Computing, Cyber Security

Sampling Distribution

Sampling distribution is the concept which helps connect Probability to Statistics.


Theoretical world -> entire population.

Real world -> Sample

The population has some parameters, which are its features (and which are not known).

The science of statistics is used to derive a statistic from the sample, which is an estimate of those parameters.


(PS. A Probability Distribution is the same as a Distribution of Probabilities.)

E.g. flipping coins

Sampling variability

Sampling variability refers to the different values which a given function of the data (a statistic) takes when it is computed for two or more samples drawn from the same population. It is the variability we expect to see when estimating population parameters from real-world samples.
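To make sampling variability concrete, here is a minimal sketch that draws several samples of coin flips from the same population and computes the proportion of heads in each. The fair coin (p = 0.5), the sample size of 20 and the helper name sample_proportion are assumptions chosen for illustration, not values from the course material.

```python
import random

def sample_proportion(n, p, rng):
    """Simulate n coin flips with heads-probability p; return the share of heads (p-hat)."""
    heads = sum(1 for _ in range(n) if rng.random() < p)
    return heads / n

rng = random.Random(0)   # fixed seed so the run is repeatable
p_true = 0.5             # assumed theoretical-world parameter (a fair coin)
n_flips = 20             # assumed sample size

# Ten samples from the same population give ten (usually different) estimates.
estimates = [sample_proportion(n_flips, p_true, rng) for _ in range(10)]
print(estimates)
```

Each number printed is the same statistic computed on a different sample; the spread between them is the sampling variability.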

Example


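The value of the estimator p-hat (the proportion of times that something happens in a real-world sample) is nearly equal to p, the theoretical probability that it happens. The expected value of p-hat is exactly p.

'Nearly equal' because p-hat varies from one real-world sample to the next (sampling variability).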

For large n, p-hat approximately follows a normal distribution (Central Limit Theorem). Its mean is p, the probability in the theoretical world, and its variance is p(1-p)/n, so its SD is sqrt(p(1-p)/n).
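The mean and variance of p-hat can be checked with a small simulation. This is only a sketch: the values p = 0.3, n = 100 and 5000 repetitions are assumptions picked for the demo, and p_hat is a hypothetical helper name.

```python
import random
import statistics

def p_hat(n, p, rng):
    """Proportion of successes in one sample of n independent trials."""
    return sum(rng.random() < p for _ in range(n)) / n

rng = random.Random(1)        # fixed seed so the run is repeatable
p, n, reps = 0.3, 100, 5000   # assumed demo values

# Build the sampling distribution of p-hat by repeating the sampling many times.
p_hats = [p_hat(n, p, rng) for _ in range(reps)]

sim_mean = statistics.mean(p_hats)
sim_var = statistics.pvariance(p_hats)
theory_var = p * (1 - p) / n  # variance predicted by the theoretical world

print("mean of p-hat:", round(sim_mean, 4), "theory:", p)
print("variance of p-hat:", round(sim_var, 5), "theory:", round(theory_var, 5))
```

The simulated mean and variance should land close to p and p(1-p)/n, but not exactly on them, because the simulation is itself a finite sample.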


As n increases, the variability (and hence the SD) of the estimator decreases.
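The effect of n can be seen directly from the formula SD = sqrt(p(1-p)/n). A short sketch, assuming p = 0.5 and a few arbitrary sample sizes (sd_p_hat is a hypothetical helper name):

```python
import math

def sd_p_hat(p, n):
    """Standard deviation of p-hat for samples of size n."""
    return math.sqrt(p * (1 - p) / n)

p = 0.5  # assumed demo value
sds = {n: sd_p_hat(p, n) for n in (25, 100, 400, 1600)}

for n, sd in sds.items():
    print(n, round(sd, 4))
```

Because n sits under a square root, quadrupling the sample size only halves the SD.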


Estimator of the mean

Similar to p-hat, the expected value of X-bar (the sample mean) equals mu, the expected value of the distribution in the theoretical world.

That means X-bar is an unbiased estimator: the expected value (mean) of its sampling distribution is equal to the value of the parameter being estimated, so the distribution of the means of our samples is centred on the population mean.

If we take many samples of size n from a normal model and compute the mean of each sample, the mean of these means is the same as the mean of the theoretical model, the SD of the distribution of the means = theoretical SD / sqrt(n), and the distribution of the means is itself normal.

Even with a skewed model, the mean and SD results above hold, and for large n the distribution of the means is still approximately normal (Central Limit Theorem).
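A quick way to check this for a skewed model is to simulate it. The sketch below assumes an exponential model with rate 1 (mean 1, SD 1, strongly right-skewed), samples of size 30 and 4000 repetitions; all of these are illustrative choices, not values from the course.

```python
import math
import random
import statistics

rng = random.Random(2)   # fixed seed so the run is repeatable
n, reps = 30, 4000       # assumed sample size and number of samples
model_mean, model_sd = 1.0, 1.0  # mean and SD of an Exponential(rate=1) model

# Take many samples of size n from the skewed model and record each sample's mean.
sample_means = [
    sum(rng.expovariate(1.0) for _ in range(n)) / n
    for _ in range(reps)
]

mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.pstdev(sample_means)
theory_sd = model_sd / math.sqrt(n)

print("mean of means:", round(mean_of_means, 3), "theory:", model_mean)
print("SD of means:", round(sd_of_means, 3), "theory:", round(theory_sd, 3))
```

The mean of the sample means should sit near the model mean, and their SD near theoretical SD / sqrt(n), even though the individual observations are far from normal.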

Sampling (with replacement and without)

Sampling can be done in a couple of different ways. One question that arises when sampling is, “After we select an object and record the measurement of the attribute we’re studying, what do we do with the object?” There are two options: we can put the object back into the pool of objects that we are sampling from (sampling with replacement), or we can choose not to put it back (sampling without replacement).
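The two options can be sketched with Python's standard random module. The pool of ten numbered objects and the sample size of 5 are assumptions for the demo.

```python
import random

rng = random.Random(3)       # fixed seed so the run is repeatable
pool = list(range(1, 11))    # assumed pool: objects numbered 1 to 10

# With replacement: each object goes back into the pool, so it can be picked again.
with_replacement = rng.choices(pool, k=5)

# Without replacement: a picked object is not returned, so no object repeats.
without_replacement = rng.sample(pool, 5)

print("with replacement:   ", with_replacement)
print("without replacement:", without_replacement)
```

With replacement, every draw is made from the same unchanged pool, so the draws are independent. Without replacement, the pool shrinks after each draw.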

Disclaimer: These are my study notes, kept online instead of on paper so that others can benefit. In the process I’ve used some pictures / content from other original authors. All sources / original content publishers are listed below and they deserve credit for their work. No copyright violation intended.

References for these notes:

The study material for the MOOC “Making sense of data” at Coursera.org

http://www.stats.gla.ac.uk/steps/glossary/sampling.html#samplingvar
