Tech Notes

My notes on Statistics, Big Data, Cloud Computing, Cyber Security


Hypothesis test for Proportions – 1 sample, 2 sample

Example 1 (Test for proportions)

Statement
Population – XYZ Intl claims that 45% of people in country ABC support banning cigarettes.
Sample (real world) – 200 people are asked whether they support banning cigarettes.
49% say yes. Is there enough evidence to reject the claim?

  • Parameter statement – To test the claim
  • Hypothesis
    • Null Hypothesis – H0 – Proportion of people supporting p=0.45
    • Alternative Hypothesis – H1 – Proportion of people supporting p != 0.45
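  • Assumption – The sample proportion is approximately normally distributed (parametric). This is reasonable here because n*p0 = 90 and n*(1-p0) = 110 are both well above 10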
  • Choose Test

Two tailed, Z-test, Significance level =.05

  • Calculations
pbar=0.49
p0=0.45
n=200
z = (pbar-p0)/sqrt(p0*(1-p0)/n)
z
[1] 1.13707

The critical values at .05 significance level are

alpha = .05
z.half.alpha = qnorm(1-alpha/2)
c(-z.half.alpha, z.half.alpha)
[1] -1.959964  1.959964


The test statistic 1.13707 lies between the critical values -1.9600 and 1.9600.


pvalue2sided=2*pnorm(-abs(z))
pvalue2sided

[1] 0.2555088
  • Decision

The p-value 0.2555 is greater than .05. Hence, at .05 significance level, we fail to reject the null hypothesis: the sample does not provide enough evidence against the 45% claim.
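
The calculation above can be wrapped in a small helper so the same steps apply to other one-sample problems. This is a minimal sketch, not a library routine: the function name one_sample_prop_z and its arguments are made up for illustration, and the threshold of 10 in the normal-approximation check is a common rule of thumb rather than something stated in these notes.

```r
# Hypothetical helper (not from any package): one-sample z-test for a proportion.
one_sample_prop_z <- function(pbar, p0, n,
                              alternative = c("two.sided", "less", "greater"),
                              alpha = 0.05) {
  alternative <- match.arg(alternative)
  # Rule-of-thumb check for the normal approximation (assumed threshold: 10)
  if (n * p0 < 10 || n * (1 - p0) < 10) {
    warning("normal approximation may be poor: n*p0 or n*(1-p0) is below 10")
  }
  # Test statistic: standard error uses p0, the proportion under H0
  z <- (pbar - p0) / sqrt(p0 * (1 - p0) / n)
  p_value <- switch(alternative,
    two.sided = 2 * pnorm(-abs(z)),
    less      = pnorm(z),
    greater   = pnorm(z, lower.tail = FALSE)
  )
  list(z = z, p_value = p_value, reject_h0 = p_value < alpha)
}

# Example 1: claimed 45%, observed 49% of 200, two-tailed
ex1 <- one_sample_prop_z(pbar = 0.49, p0 = 0.45, n = 200)
print(ex1)
```

Passing alternative = "less" or "greater" switches the same function to a one-tailed test.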

Example 2 (Test for proportions)

  • Statement

Population – XYZ Intl claims that less than 44% of people in country ABC support banning cigarettes.
Sample (real world) – 1046 people are asked whether they support banning cigarettes.
42% say yes. Is there enough evidence to support the claim?

  • Parameter statement – To test the claim
  • Hypothesis
    • Null Hypothesis – H0 – Proportion of people supporting p=0.44
    • Alternative Hypothesis – H1 – Proportion of people supporting p < 0.44
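  • Assumption – The sample proportion is approximately normally distributed (parametric). This is reasonable here because n*p0 and n*(1-p0) are both well above 10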
  • Choose Test

One tailed, Z-test, Significance level =.05

pbar=0.42
p0=0.44
n=1046
z = (pbar-p0)/sqrt(p0*(1-p0)/n)
z
[1] -1.303093
alpha = .05
# lower-tailed test: a single critical value in the left tail
z.alpha = qnorm(alpha)
z.alpha
[1] -1.644854
# p-value is the area to the left of z
pvalue1sided=pnorm(z)
pvalue1sided
[1] 0.09627147
  • Decision

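The p-value 0.0963 is greater than .05. Hence, at .05 significance level, we fail to reject the null hypothesis: the sample does not provide enough evidence that support is below 44%.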
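
As a cross-check, base R's prop.test can run the same lower-tailed test from counts. This is a sketch under one assumption: that 42% of 1046 corresponds to 439 "yes" answers. Because that count is rounded, the result differs slightly from the hand calculation with pbar = 0.42. The continuity correction is turned off to match the plain z-test.

```r
n <- 1046
x <- round(0.42 * n)   # assumed count of "yes" answers: 439

# One-sample test of H0: p = 0.44 against H1: p < 0.44
res <- prop.test(x, n, p = 0.44, alternative = "less", correct = FALSE)

# For one sample, X-squared is the square of the z statistic
z_from_test <- -sqrt(unname(res$statistic))  # negative because x/n < 0.44
z_from_test
res$p.value
res$p.value < 0.05     # TRUE would mean rejecting H0 at the .05 level
```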

Example 3 (Testing differences between proportions aka comparing proportions)

  • Statement

200 random adult females and 250 random adult males were asked if they shop online. 30% of the females and 38% of the males said yes. At alpha = 0.1, test the claim that there is a difference between the proportion of females and the proportion of males who shop online.

  • Parameter statement – To test the claim
  • Hypothesis
    • Null Hypothesis – H0 – Proportion of females = proportion of males, same as => proportion of females – proportion of males = 0
    • Alternative Hypothesis – H1 – Proportion of females != proportion of males, same as => proportion of females – proportion of males != 0
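  • Assumption – Both sample proportions are approximately normally distributed (parametric). Each group has well over 10 "yes" and 10 "no" answers, so this is reasonable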
  • Choose Test

Two sample, Z-test, alpha =0.1

Use the online calculator at http://www.socscistatistics.com/tests/ztest/Default2.aspx to calculate Z and P Values

The Z-score is -1.7746 and the p-value is 0.07672, which is less than .1. Hence, at .1 significance level, we reject the null hypothesis: there is evidence that the proportions of female and male online shoppers differ.
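
The same two-sample z-test can also be computed directly in R instead of using the online calculator. This is a minimal sketch using the pooled-proportion formula; the variable names are chosen here for illustration, and the p-value may differ slightly from the calculator's depending on how z is rounded.

```r
n_f <- 200; n_m <- 250      # sample sizes: females, males
p_f <- 0.30; p_m <- 0.38    # sample proportions who shop online
alpha <- 0.10

# Pooled proportion under H0 (both groups share one proportion)
p_pooled <- (n_f * p_f + n_m * p_m) / (n_f + n_m)
se <- sqrt(p_pooled * (1 - p_pooled) * (1 / n_f + 1 / n_m))

z <- (p_f - p_m) / se
p_value <- 2 * pnorm(-abs(z))   # two-tailed
z
p_value
p_value < alpha                 # TRUE means reject H0 at the .1 level
```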

Example 4 (Independent samples – 2 sample)

Poll1 – June 2011, n1 = 1050, phat1 = 57%
Poll2 – Sep 2011, n2 = 1046, phat2 = 42%

Claim – support changed between the two polls.

  • Hypothesis
    • H0 – support did not change: p1 - p2 = 0 (p1, p2 are the population proportions estimated by phat1, phat2)
    • H1 – support changed: p1 - p2 != 0
  • Calculation
n1 = 1050
n2 = 1046
phat1=0.57
phat2=0.42
# number of successes
x1=round(n1*phat1,0)
x1
[1] 598
x2=round(n2*phat2,0)
x2
[1] 439
prop.test(c(x1,x2), c(n1,n2), alternative='two.sided', correct=FALSE)
2-sample test for equality of proportions without continuity
 correction
data: c(x1, x2) out of c(n1, n2)
X-squared = 47.058, df = 1, p-value = 6.892e-12
alternative hypothesis: two.sided
95 percent confidence interval:
 0.1075049 0.1921546
sample estimates:
 prop 1 prop 2 
0.5695238 0.4196941
  • Decision
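The 95% confidence interval for the difference in support (poll 1 minus poll 2) is 10.8 to 19.2 percentage points, which does not include 0. The p-value is very small, which is strong evidence to reject the null hypothesis: support changed between the polls.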

PS – prop.test reports X-squared, which for this test (without continuity correction) is the square of the z-score, not the z-score itself. To calculate the z-score:

phat_pooled = (n1*phat1 + n2*phat2)/(n1+n2)

z=(phat1-phat2)/sqrt(phat_pooled * (1-phat_pooled)*(1/n1 + 1/n2))
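
The relationship between the z-score and the X-squared value reported by prop.test can be checked numerically. This sketch uses the counts 598 and 439 passed to prop.test instead of the rounded percentages, so both statistics come from exactly the same data.

```r
n1 <- 1050; n2 <- 1046
x1 <- 598;  x2 <- 439            # counts used in the prop.test call
phat1 <- x1 / n1
phat2 <- x2 / n2

# Pooled two-sample z statistic
phat_pooled <- (x1 + x2) / (n1 + n2)
z <- (phat1 - phat2) / sqrt(phat_pooled * (1 - phat_pooled) * (1 / n1 + 1 / n2))

res <- prop.test(c(x1, x2), c(n1, n2), alternative = "two.sided", correct = FALSE)
z
z^2                              # should match X-squared
unname(res$statistic)
```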

Disclaimer: These are my study notes – online – instead of on paper so that others can benefit. In the process I’ve used some pictures / content from other original authors. All sources / original content publishers are listed below and they deserve credit for their work. No copyright violation intended.

References for these notes:

The study material for the MOOC “Making sense of data” at Coursera.org

Hypothesis Test for Proportions – YouTube

http://www.youtube.com/watch?v=h2zyqRyoCfs

Hypothesis Testing

It is based on the idea that we can tell things about the population based on a sample taken from it.

5 Steps

  1. Hypothesis
  2. Significance
  3. Sample
  4. P-Value
  5. Decide

Inferential Statistics is based on the premise that you cannot prove something to be true, but you can disprove something by finding an exception.

You decide what you want to find evidence for (H1 – there is an effect), i.e. the alternative hypothesis, then set up the null hypothesis (H0 – there is no effect) and look for evidence against it.

Hypothesis testing is a statistical method for checking whether the factor we are studying has any effect on what we observe.

In other words, this helps us decide if

  • We should believe that the relationship we found in our sample is the same as the relationship we would find if we tested the population
  • OR We should believe that the relationship we found in our sample is a coincidence due to sampling error
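
One way to see what "a coincidence due to sampling error" means is to simulate it. The sketch below borrows the numbers from a one-sample proportion problem (a claimed 45%, with 98 of 200 people polled saying yes) and draws many samples from a population where H0 is true. The seed and the number of simulations are arbitrary choices made for illustration.

```r
set.seed(1)              # arbitrary seed, only for reproducibility
n     <- 200
p0    <- 0.45            # proportion assumed under H0
obs   <- 98              # observed "yes" count (49% of 200)
expct <- 90              # expected count under H0 (45% of 200)

# Draw many samples from a population where H0 is true
sim_counts <- rbinom(100000, size = n, prob = p0)

# Share of simulated samples at least as far from 90 as the observed 98
sim_p_value <- mean(abs(sim_counts - expct) >= abs(obs - expct))
sim_p_value
```

If this share is large, samples like the observed one turn up often even when H0 holds, so the result is easily explained by sampling error alone.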
