Statistical Inference¶

\(\alpha = \pi^2\)

In [6]:

suppressPackageStartupMessages(library(tidyverse))
suppressPackageStartupMessages(library(pwr))

Warning message:
“package ‘dplyr’ was built under R version 3.4.1”

In [122]:

options(repr.plot.width=4, repr.plot.height=3)

Coin toss experiment¶

In [106]:

n <- 12
outcomes <- c("H", "T")
tosses <- sample(outcomes, n, replace=TRUE)

In [107]:

tosses

'H'
'T'
'H'
'T'
'H'
'H'
'T'
'T'
'H'
'T'
'H'
'T'

In [108]:

table(tosses)

tosses
H T
6 6

In [109]:

binom.test(table(tosses))

Exact binomial test

data:  table(tosses)
number of successes = 6, number of trials = 12, p-value = 1
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
 0.2109446 0.7890554
sample estimates:
probability of success
                   0.5

Note on using `dylyr`¶

Here, using dplyr is more complex because of the need to convert to and from a data.matrix. We will stick with basic R but an example is shown below using dplyr for those who are curious.

In [110]:

tbl <- data.frame(tosses) %>% count(tosses) %>% select(n)
binom.test(as.matrix(tbl))

Exact binomial test

data:  as.matrix(tbl)
number of successes = 6, number of trials = 12, p-value = 1
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
 0.2109446 0.7890554
sample estimates:
probability of success
                   0.5

Exercise 1

Repeat the coin toss experiment with 10 tosses and evaluate if the results are statistically significant.

In [ ]:

Exercise 2

Repeat the coin toss experiment with 12 tosses, but set the probability of HEADS to be 0.3 and evaluate if the results are statistically significant.

In [ ]:

Exercise 3

Create a data sample with 2 HEADS and 10 TAILS and evaluate if the results are statistically significant.

In [ ]:

A two-sample model¶

Simulation¶

In [113]:

mu1 <- 0
mu2 <- 2
sigma <- 5
n <- 12

In [114]:

x1 <- rnorm(n, mu1, sigma)
x2 <- rnorm(n, mu2, sigma)

In [115]:

t.test(x=x1, y=x2)

Welch Two Sample t-test

data:  x1 and x2
t = -0.38018, df = 21.96, p-value = 0.7075
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -5.320465  3.672143
sample estimates:
mean of x mean of y
 1.036991  1.861152

Sample size calculation¶

In [116]:

d <- abs(mu1 - mu2)/sigma

In [117]:

alpha <- 0.05
power <- 0.8

In [118]:

pwr.t.test(d=d, sig.level=alpha, power=power)

Two-sample t test power calculation

              n = 99.08032
              d = 0.4
      sig.level = 0.05
          power = 0.8
    alternative = two.sided

NOTE: n is number in *each* group

Exercise

Suppose that company XYZ Dairies sells milk in glass bottles
The company claims that the net content of each bottle is 1 gallon
Mr. Smith, owner of the ABC Supermarket, suspects he, and ultimately his customers, are being swindled by XYZ
Let μ denote the mean net content (in gallons) of the population of XYZ Dairies milk bottles
The company claims μ = 1
Mr. Smith hypothesizes that μ < 1
Mr. Smith has to give benefit of the doubt to company XYZ’s claim (i.e., μ = 1)
The purpose of the experiment is to ascertain if there is sufficient evidence to the contrary (i.e., show μ \(\ne\) 1)
The null hypothesis is formulated as H0 : μ = 1
The alternative is formulated as H1 : μ\(\ne\) 1
Mr. Smith has no interest in gathering evidence for showing that XYZ overfills its bottles (i.e., μ > 1)

Exercise 4

Suppose the true value of \(\mu\) is 0.9. Simulate 20 samples each from the null distribution and the true distribution. Evaluate if the samples are statistically significant using a one-sided two sample T test.

In [ ]:

Exercise 5

Suppose the true value of \(\mu\) is 0.9. How many samples would be needed in each group to have a power of 0.9 at an alpha of 0.1 with a one-sided t-test?

Distribution of p-values under null distribution¶

In [119]:

mu1 <- 0
mu2 <- 0
sigma <- 5
n <- 12
r <- 1000

In [120]:

ps <- numeric(r)
for (rep in 1:r) {
    x1 <- rnorm(n, mu1, sigma)
    x2 <- rnorm(n, mu2, sigma)
    ps[rep] <- t.test(x=x1, y=x2)$p.value
}

In [123]:

hist(ps)

../../_images/Computation_Wk3_Day2_PM_01-Stat-Inference_33_0.png

In [129]:

par(pty="s")
qqplot(ps[1:100], runif(length(ps[1:100])), ylab="Uniform", xlab="P-values")

../../_images/Computation_Wk3_Day2_PM_01-Stat-Inference_34_0.png

Exercise 6

Plot the histogram of p-values and the QQ-plot when mu2 = 2.

In [ ]:

Statistical Inference¶

Coin toss experiment¶

Note on using `dylyr`¶

A two-sample model¶

Simulation¶

Sample size calculation¶

Distribution of p-values under null distribution¶

Page contents

Previous page

Next page

This Page

Statistical Inference¶

Coin toss experiment¶

Note on using dylyr¶

A two-sample model¶

Simulation¶

Sample size calculation¶

Distribution of p-values under null distribution¶

Note on using `dylyr`¶