Tech Notes

My notes on Statistics, Big Data, Cloud Computing, Cyber Security

Hypothesis test for Proportions – 1 sample, 2 sample

Example 1 (Test for proportions)

Statement
Population – XYZ Intl claims that 45% of people in country ABC support banning cigarettes.
Sample (real world) – 200 people are asked whether they support banning cigarettes.
49% say yes. Is this result consistent with the claim?

  • Parameter statement – To test the claim
  • Hypothesis
    • Null Hypothesis – H0 – Proportion of people supporting p=0.45
    • Alternative Hypothesis – H1 – Proportion of people supporting p != 0.45
  • Assumption – The sample is random and large enough that the sample proportion is approximately normally distributed (parametric)
  • Choose Test

Two tailed, Z-test, Significance level =.05
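Before relying on the Z-test, it can help to check that the normal approximation is reasonable for this sample size. The following is a minimal sketch in R, assuming the common rule of thumb that the expected numbers of successes and failures under H0 should both be at least 10. The threshold and the variable names `expected` and `normal_ok` are assumptions for illustration, not something taken from the course material.

```r
# Values from Example 1
p0 <- 0.45   # proportion claimed under H0
n  <- 200    # sample size

# Expected successes and failures if H0 is true
expected <- c(successes = n * p0, failures = n * (1 - p0))
expected

# Assumed rule of thumb: both expected counts should be at least 10
normal_ok <- all(expected >= 10)
normal_ok
```

If `normal_ok` were FALSE, an exact test such as `binom.test` would probably be the safer choice.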

  • Calculations
pbar=0.49
p0=0.45
n=200
z = (pbar-p0)/sqrt(p0*(1-p0)/n)
z
[1] 1.13707

The critical values at .05 significance level are

alpha = .05
z.half.alpha = qnorm(1-alpha/2)
c(-z.half.alpha, z.half.alpha)
[1] -1.959964  1.959964

The test statistic 1.13707 lies between the critical values -1.9600 and 1.9600.


pvalue2sided=2*pnorm(-abs(z))
pvalue2sided

[1] 0.2555088
  • Decision

Hence, at the .05 significance level, we do not have enough evidence to reject the null hypothesis: the p-value 0.2555 is greater than .05.
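The steps of Example 1 can be wrapped in a small reusable function. This is only a sketch: `prop_z_test` is a hypothetical helper name, not a function from base R or from the course material, and it assumes the normal approximation holds.

```r
# Hypothetical helper: one-sample z-test for a proportion
prop_z_test <- function(pbar, p0, n,
                        alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)
  # Standard error is computed under H0, i.e. with p0 rather than pbar
  z <- (pbar - p0) / sqrt(p0 * (1 - p0) / n)
  p_value <- switch(alternative,
    two.sided = 2 * pnorm(-abs(z)),            # both tails
    less      = pnorm(z),                      # lower tail only
    greater   = pnorm(z, lower.tail = FALSE)   # upper tail only
  )
  list(z = z, p.value = p_value)
}

res <- prop_z_test(pbar = 0.49, p0 = 0.45, n = 200)
res$z         # same test statistic as computed above
res$p.value   # same two-sided p-value as computed above
```

Passing `alternative = "less"` or `"greater"` switches to a one-tailed p-value without changing the test statistic.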

Example 2 (Test for proportions)

  • Statement

Population – XYZ Intl claims that fewer than 44% of people in country ABC support banning cigarettes.
Sample (real world) – 1046 people are asked whether they support banning cigarettes.
42% say yes. Is there enough evidence to support the claim?

  • Parameter statement – To test the claim
  • Hypothesis
    • Null Hypothesis – H0 – Proportion of people supporting p=0.44
    • Alternative Hypothesis – H1 – Proportion of people supporting p < 0.44
  • Assumption – The sample is random and large enough that the sample proportion is approximately normally distributed (parametric)
  • Choose Test

One tailed, Z-test, Significance level =.05

pbar=0.42
p0=0.44
n=1046
z = (pbar-p0)/sqrt(p0*(1-p0)/n)
z
[1] -1.303093
alpha = .05
# one-tailed (left) test: all of alpha sits in the lower tail
z.alpha = qnorm(alpha)
z.alpha
[1] -1.644854
# left-tail p-value
pvalue1sided=pnorm(z)
pvalue1sided
[1] 0.09627147
  • Decision

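Hence, at the .05 significance level, we do not have enough evidence to reject the null hypothesis: the p-value 0.0963 is greater than .05, so the sample does not support the claim that fewer than 44% support the ban.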
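The same one-tailed test can be cross-checked with R's built-in `prop.test`. This is a sketch under one assumption: the notes give only the percentage, so the number of "yes" answers `x` is reconstructed by rounding n * pbar. Because x/n is then not exactly 0.42, the p-value comes out close to, but not identical to, the hand-computed one.

```r
n    <- 1046
pbar <- 0.42
p0   <- 0.44

# Assumption: success count reconstructed from the reported percentage
x <- round(n * pbar)

# correct = FALSE turns off the continuity correction so the result
# is comparable to the plain z-test; "less" matches H1: p < 0.44
res <- prop.test(x, n, p = p0, alternative = "less", correct = FALSE)

res$p.value           # one-sided p-value
res$p.value < 0.05    # FALSE means H0 is not rejected at the .05 level
```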

Example 3 (Testing differences between proportions aka comparing proportions)

  • Statement

200 randomly selected adult females and 250 randomly selected adult males were asked if they shop online. 30% of the females and 38% of the males said yes. At alpha = 0.1, test the claim that the proportion of females who shop online differs from the proportion of males who shop online.

  • Parameter statement – To test the claim
  • Hypothesis
    • Null Hypothesis – H0 – Proportion of females = proportion of males, i.e. proportion of females – proportion of males = 0
    • Alternative Hypothesis – H1 – Proportion of females != proportion of males, i.e. proportion of females – proportion of males != 0
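  • Assumption – The two samples are random, independent, and large enough that the difference in sample proportions is approximately normally distributed (parametric)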
  • Choose Test

Two sample, Z-test, alpha =0.1

Use the online calculator at http://www.socscistatistics.com/tests/ztest/Default2.aspx to calculate Z and P Values

The Z-Score is -1.7746. The p-value is 0.07672. Since 0.07672 is less than 0.1, at the .1 significance level we have enough evidence to reject the null hypothesis.
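The calculator's result can also be reproduced in R with the pooled two-sample z-test. This is a minimal sketch; the variable names (`n_f`, `p_f`, and so on) are chosen here for illustration, and the p-value may differ from the calculator's in the later decimals depending on how the calculator rounds z.

```r
# Values from Example 3
n_f <- 200;  p_f <- 0.30   # females: sample size, sample proportion
n_m <- 250;  p_m <- 0.38   # males: sample size, sample proportion

# Pooled proportion under H0 (both groups share one true proportion)
p_pooled <- (n_f * p_f + n_m * p_m) / (n_f + n_m)

# Standard error of the difference under H0
se <- sqrt(p_pooled * (1 - p_pooled) * (1 / n_f + 1 / n_m))

z <- (p_f - p_m) / se
p_value <- 2 * pnorm(-abs(z))   # two-tailed

alpha <- 0.1
reject_h0 <- p_value < alpha
c(z = z, p.value = p_value)
reject_h0
```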

Example 4 (Independent samples – 2 sample)

Poll1 – June 2011, n1 = 1050, phat1 = 57%
Poll2 – Sep 2011, n2 = 1046, phat2 = 42%

Claim – support changed between the two polls.

  • Hypothesis
    • H0 – support did not change: p1 - p2 = 0 (p1, p2 are the true proportions that phat1, phat2 estimate)
    • H1 – support changed: p1 - p2 != 0
  • Calculation
n1 = 1050
n2 = 1046
phat1=0.57
phat2=0.42
# number of successes
x1=round(n1*phat1,0)
x1
[1] 598
x2=round(n2*phat2,0)
x2
[1] 439
prop.test(c(x1,x2), c(n1,n2), alternative='two.sided', correct=F)
2-sample test for equality of proportions without continuity
 correction
data: c(x1, x2) out of c(n1, n2)
X-squared = 47.058, df = 1, p-value = 6.892e-12
alternative hypothesis: two.sided
95 percent confidence interval:
 0.1075049 0.1921546
sample estimates:
 prop 1 prop 2 
0.5695238 0.4196941
  • Decision
The 95% confidence interval puts the true drop in support (p1 - p2) between 10.8 and 19.2 percentage points. The p-value is very small, which is strong evidence to reject the null hypothesis.
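The confidence interval reported by prop.test can be reproduced by hand. The sketch below assumes the interval is the usual normal-approximation (Wald) interval with an unpooled standard error, which is what `prop.test` appears to use for two groups when `correct=F`; the names `se_unpooled`, `ci` and `ref` are illustrative.

```r
n1 <- 1050; n2 <- 1046
x1 <- 598;  x2 <- 439        # success counts from the calculation above

p1 <- x1 / n1
p2 <- x2 / n2

# Unpooled standard error: each group keeps its own proportion
se_unpooled <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

z_crit <- qnorm(0.975)       # 95% two-sided critical value
ci <- (p1 - p2) + c(-1, 1) * z_crit * se_unpooled
ci

# Reference interval from prop.test for comparison
ref <- prop.test(c(x1, x2), c(n1, n2), correct = FALSE)$conf.int
as.numeric(ref)
```

Note that the interval uses the unpooled standard error, while the hypothesis test uses the pooled one, because only the test assumes the two proportions are equal.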

PS – prop.test reports X-squared, which for this test is the square of the z-score rather than the z-score itself. To calculate the z-score directly:

phat_pooled = (n1*phat1 + n2*phat2)/(n1+n2)

z=(phat1-phat2)/sqrt(phat_pooled * (1-phat_pooled)*(1/n1 + 1/n2))
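To see how this z-score relates to the X-squared value from prop.test, the two can be compared directly. In this sketch the proportions are taken from the rounded counts x1 and x2 (an assumption made here so that both calculations use exactly the same data); with the rounded percentages 0.57 and 0.42 the z-score comes out slightly different.

```r
n1 <- 1050; n2 <- 1046
x1 <- 598;  x2 <- 439

# Proportions derived from the counts, not from the rounded percentages
phat1 <- x1 / n1
phat2 <- x2 / n2
phat_pooled <- (x1 + x2) / (n1 + n2)

z <- (phat1 - phat2) / sqrt(phat_pooled * (1 - phat_pooled) * (1 / n1 + 1 / n2))

# X-squared from prop.test without continuity correction
chisq <- unname(prop.test(c(x1, x2), c(n1, n2), correct = FALSE)$statistic)

z
all.equal(z^2, chisq)   # TRUE when z squared equals X-squared
```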

Disclaimer : These are my study notes – online – instead of on paper so that others can benefit. In the process I’ve used some pictures / content from other original authors. All sources / original content publishers are listed below and they deserve credit for their work. No copyright violation intended.

References for these notes:

The study material for the MOOC “Making sense of data” at Coursera.org

Hypothesis Test for Proportions – YouTube

http://www.youtube.com/watch?v=h2zyqRyoCfs
