Tech Notes

My notes on Statistics, Big Data, Cloud Computing, Cyber Security


Hypothesis Testing for Means, Matched Pairs, Independent Samples

Example 1 – Hypothesis test for small sample means.

  • Statement

Claim: the mean amount of waste recycled per day is more than 1 pound per person (across the population).
Sample: 12 people, found to recycle an average of 1.46 pounds per day with a sample SD of 0.58. Alpha = 0.05.

  • Parameter statement – test the claim
  • Hypothesis
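    • H0 – The population mean is less than or equal to 1 pound per day (mu ≤ 1)
    • H1 – The population mean is greater than 1 pound per day (mu > 1)
  • Assumption – The daily amounts are approximately normally distributed in the population (parametric test)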

Why does the claim go under H1 and not under H0? That's because H0 always contains the "equal to". The claim "more than 1" contains no equality, so it can only be H1.

  • Choose test

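Right-tailed, because H1 is "greater than" (equivalently, H0 is "less than or equal to"). A t-test, because the population SD is unknown and is estimated from a small sample (n < 30).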

  • Calculation
xbar = 1.46 # sample mean
mu0 = 1 # hypothesized value
s = 0.58 # sample standard deviation
n = 12 # sample size
t = (xbar - mu0)/(s/sqrt(n)) # test statistic
t
[1] 2.747391
pval = pt(t, df = n - 1, lower.tail = FALSE) # area in the upper tail
pval # upper-tail p-value
[1] 0.009489493
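The calculation above can be wrapped in a small helper so the same steps work for any one-sample test where only the sample mean, SD and size are known. This is a minimal sketch; the function name `t_test_summary` and its `alternative` argument are my own (hypothetical), not a base R function.

```r
# Hypothetical helper (not part of base R): one-sample t-test
# computed from summary statistics instead of raw data.
t_test_summary <- function(xbar, mu0, s, n,
                           alternative = c("greater", "less", "two.sided")) {
  alternative <- match.arg(alternative)
  se <- s / sqrt(n)          # standard error of the sample mean
  t  <- (xbar - mu0) / se    # test statistic
  df <- n - 1                # degrees of freedom
  p  <- switch(alternative,
               greater   = pt(t, df, lower.tail = FALSE),       # right tail
               less      = pt(t, df),                           # left tail
               two.sided = 2 * pt(abs(t), df, lower.tail = FALSE))
  list(t = t, df = df, p.value = p)
}

# Example 1: right-tailed test of H0: mu <= 1
res <- t_test_summary(xbar = 1.46, mu0 = 1, s = 0.58, n = 12,
                      alternative = "greater")
res$t        # 2.747391
res$p.value  # 0.009489493
```

The `alternative` argument picks which tail the p-value comes from, mirroring the choice of a right-tailed test in the "Choose test" step.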

  • Decision

At alpha = 0.05, since the p-value (0.0095) is less than alpha, we have strong evidence to reject the null hypothesis. The data support the claim that the mean is more than 1 pound per day.
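The same decision can be reached with the critical-value approach: for a right-tailed test, reject H0 when t exceeds qt(1 - alpha, df). The sketch below also computes a one-sided 95% lower confidence bound for the mean, added here for illustration; rejecting H0 (mu ≤ 1) at alpha = 0.05 is equivalent to that bound lying above 1.

```r
alpha <- 0.05
xbar <- 1.46; mu0 <- 1; s <- 0.58; n <- 12
t_stat <- (xbar - mu0) / (s / sqrt(n))

# Right-tail critical value with n - 1 degrees of freedom (about 1.796)
t_crit <- qt(1 - alpha, df = n - 1)
reject <- t_stat > t_crit   # TRUE: t falls in the rejection region

# One-sided lower confidence bound for mu; above mu0 exactly when H0 is rejected
lower_bound <- xbar - t_crit * s / sqrt(n)
lower_bound  # about 1.16
```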

Example 2 – Hypothesis testing for matched pairs

  • Statement

Using the built-in data set immer (from the MASS package in R), which records barley yields from the same fields in 1931 (Y1) and 1932 (Y2) at several locations. The claim is that the yields in the two years are the same.

  • Parameter statement – test the claim
  • Hypothesis
    • H0 – The yields are the same. That is, mu(Y1 - Y2) = 0
    • H1 – The yields are different. That is, mu(Y1 - Y2) != 0
  • Choose test – paired t-test (two-tailed, because H1 is "not equal to")
  • Calculation
library(MASS)
head(immer)
t.test(immer$Y1, immer$Y2, paired=TRUE)

Paired t-test

data: immer$Y1 and immer$Y2
t = 3.324, df = 29, p-value = 0.002413
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
6.121954 25.704713
sample estimates:
mean of the differences
15.91333

  • Decision

Assuming an alpha level of 0.05, since the p-value (0.0024) is less than alpha, we have enough evidence to reject the null hypothesis that the yields are the same.
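Rather than reading the printed output, the object returned by t.test() (class "htest") can be inspected directly, which makes the decision rule explicit. A short sketch, assuming the MASS package is installed (it ships with standard R distributions as a recommended package):

```r
library(MASS)  # provides the immer data set

res <- t.test(immer$Y1, immer$Y2, paired = TRUE)  # two-tailed by default
alpha <- 0.05

res$statistic   # t = 3.324
res$parameter   # df = 29
res$p.value     # 0.002413
res$conf.int    # 95% CI for the mean difference: 6.12 to 25.70

# Two equivalent decision rules for a two-tailed test at level alpha
reject_by_p  <- res$p.value < alpha
reject_by_ci <- res$conf.int[1] > 0 || res$conf.int[2] < 0  # CI excludes 0
```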

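Alternative 1: the detailed way of doing the calculation

library(MASS)
head(immer)
new.immer <- transform(immer, new.col = Y1 - Y2) # per-row difference, 1931 minus 1932
mean(new.immer$new.col)
[1] 15.91333
sd(new.immer$new.col)
[1] 26.2218
xbar = 15.91 # sample mean of the differences (rounded)
mu0 = 0 # hypothesized value
s = 26.2218 # sample standard deviation of the differences
n = 30 # sample size (number of pairs)
t = (xbar - mu0)/(s/sqrt(n))
t
[1] 3.323291
The small gap from t = 3.324 in the t.test() output comes from rounding the mean to 15.91.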
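A paired t-test is a one-sample t-test on the per-pair differences, which is why the manual calculation above works. The sketch below checks this on the immer data and repeats the manual formula without rounding, so the statistic matches t.test() exactly.

```r
library(MASS)

d <- immer$Y1 - immer$Y2                   # per-pair differences (1931 minus 1932)
one_sample <- t.test(d, mu = 0)            # one-sample t-test on the differences
paired <- t.test(immer$Y1, immer$Y2, paired = TRUE)

# Manual statistic using the unrounded mean and SD of the differences
t_manual <- (mean(d) - 0) / (sd(d) / sqrt(length(d)))

c(manual = t_manual,
  one_sample = unname(one_sample$statistic),
  paired = unname(paired$statistic))       # all three agree
```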

Alternative 2: a variant of Example 2. Suppose the dataset itself is not given, but the mean and SD of each variable are, along with the correlation coefficient r between Y1 and Y2. How do we proceed? The mean of the differences is the difference of the means, and the SD of the differences can be rebuilt from the two SDs and r.

Y1 mean = 109.04, Y2 mean = 93.13

SD of Y1 = 28.67, SD of Y2 = 24.27

r = 0.52

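sd1 <- 28.67 # SD of Y1
sd2 <- 24.27 # SD of Y2
r <- 0.52 # correlation between Y1 and Y2
xbar <- 109.04 - 93.13 # mean of the differences
s <- sqrt(sd1^2 + sd2^2 - 2*r*sd1*sd2) # SD of the differences
mu0 = 0 # hypothesized value
n = 30 # sample size (number of pairs)
t = (xbar - mu0)/(s/sqrt(n))
t

This gives t ≈ 3.32, in line with the paired t-test (3.324); the small gap comes from rounding in the summary statistics.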
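The formula for the SD of the differences is the sample version of Var(Y1 - Y2) = Var(Y1) + Var(Y2) - 2 Cov(Y1, Y2), with Cov(Y1, Y2) = r * SD(Y1) * SD(Y2). It holds exactly for sample SDs and the sample correlation, which the sketch below confirms on the immer data using unrounded inputs.

```r
library(MASS)

sd1 <- sd(immer$Y1)
sd2 <- sd(immer$Y2)
r   <- cor(immer$Y1, immer$Y2)     # sample correlation, about 0.52

# SD of the differences rebuilt from the summary statistics only
s_from_summary <- sqrt(sd1^2 + sd2^2 - 2 * r * sd1 * sd2)

# SD of the differences computed directly from the paired data
s_direct <- sd(immer$Y1 - immer$Y2)

c(from_summary = s_from_summary, direct = s_direct)  # both about 26.22
```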

Example 3 – Hypothesis testing for independent samples

Using the data from Example 2, suppose instead that Y1 and Y2 came from independent samples. The only change is to leave out paired=TRUE; by default R then runs Welch's two-sample t-test, which does not assume equal variances.

library(MASS)
head(immer)
t.test(immer$Y1, immer$Y2)
Welch Two Sample t-test

data: immer$Y1 and immer$Y2
t = 2.32, df = 56.463, p-value = 0.02398
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
2.17493 29.65174
sample estimates:
mean of x mean of y
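109.04667 93.13333

  • Decision

At alpha = 0.05 the p-value (0.024) is still below alpha, so H0 is rejected, but the evidence is weaker than in the paired test (t = 2.32 vs 3.32). Treating the two years as independent ignores the positive correlation between them (r = 0.52), which inflates the standard error of the difference.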
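For reference, the Welch statistic can be computed by hand: the standard error adds the two squared standard errors without pooling the variances, and the degrees of freedom come from the Welch-Satterthwaite approximation, which is why df is not a whole number. A sketch that reproduces t.test() on the immer data; the pooled-variance test (var.equal = TRUE) is shown for comparison.

```r
library(MASS)

x <- immer$Y1
y <- immer$Y2
v1 <- var(x) / length(x)   # squared standard error of mean(x)
v2 <- var(y) / length(y)   # squared standard error of mean(y)

t_welch  <- (mean(x) - mean(y)) / sqrt(v1 + v2)
# Welch-Satterthwaite approximation for the degrees of freedom
df_welch <- (v1 + v2)^2 / (v1^2 / (length(x) - 1) + v2^2 / (length(y) - 1))
p_welch  <- 2 * pt(abs(t_welch), df_welch, lower.tail = FALSE)  # two-tailed

builtin <- t.test(x, y)                    # Welch test (R's default)
pooled  <- t.test(x, y, var.equal = TRUE)  # classic pooled-variance test

c(t = t_welch, df = df_welch, p = p_welch)
```

With equal sample sizes (30 each here) the pooled and Welch t statistics are identical; only the degrees of freedom differ (58 for the pooled test).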

Disclaimer: These are my study notes, kept online instead of on paper so that others can benefit. In the process I have used some pictures / content from other original authors. All sources / original content publishers are listed below and they deserve credit for their work. No copyright violation intended.

References for these notes:

The study material for the MOOC “Making sense of data” at Coursera.org

http://www.youtube.com/watch?v=jfUhKHX5S0E

http://www.r-tutor.com/elementary-statistics/hypothesis-testing/upper-tail-test-population-mean-unknown-variance