Statistics review 4: Sample size calculations
R code accompanying the paper
Key learning points
- Studies must be adequately powered to achieve their aims, and appropriate sample size calculations should be carried out at the design stage of any study.
- Estimation of the expected size of effect can be difficult and should, wherever possible, be based on existing evidence and clinical expertise. It is important that any estimates be large enough to be clinically important while also remaining plausible.
- Many apparently null studies may be under-powered rather than genuinely demonstrating no difference between groups; absence of evidence is not evidence of absence.
suppressPackageStartupMessages(library(tidyverse))
options(repr.plot.width=4, repr.plot.height=3)
Power
Power is the probability of correctly identifying a difference between the two groups in the study sample when one genuinely exists in the populations from which the samples were drawn.
Problem with small sample sizes
Suppose that a drug lowers blood pressure by 10 mm Hg on average, and that the standard deviation of the drug effect is also 10 mm Hg. We will run repeated simulated experiments, each drawing a sample of \(n\) observed changes, to see what sample size is needed to achieve a power of 0.9.
n.expts <- 1000   # simulated experiments per sample size
alpha <- 0.05     # significance threshold
ns <- seq(3, 25, by=1)   # sample sizes to try
data <- matrix(NA, nrow=length(ns), ncol=2,
               dimnames=list(NULL, c("n", "power")))
for (i in seq_along(ns)) {
    n <- ns[i]
    success <- 0
    for (expt in 1:n.expts) {
        # simulate n observed changes in blood pressure
        effect <- rnorm(n, mean=10, sd=10)
        # one-sample t-test of the null hypothesis of no change
        p <- t.test(effect)$p.value
        if (p < alpha) {
            success <- success + 1
        }
    }
    data[i, 1] <- n
    # estimated power = proportion of experiments that detect the effect
    data[i, 2] <- success/n.expts
}
as.data.frame(data)
| n | power |
|---|---|
| 3 | 0.172 |
| 4 | 0.309 |
| 5 | 0.397 |
| 6 | 0.510 |
| 7 | 0.585 |
| 8 | 0.689 |
| 9 | 0.769 |
| 10 | 0.788 |
| 11 | 0.842 |
| 12 | 0.883 |
| 13 | 0.911 |
| 14 | 0.940 |
| 15 | 0.948 |
| 16 | 0.968 |
| 17 | 0.972 |
| 18 | 0.974 |
| 19 | 0.978 |
| 20 | 0.989 |
| 21 | 0.993 |
| 22 | 0.996 |
| 23 | 0.996 |
| 24 | 0.995 |
| 25 | 0.998 |
Interpretation of simulation results
It appears that a sample size of about 13 gives a power of 0.9 for this effect size (exact values will vary slightly between simulation runs).
Factors that affect sample size calculations
The following factors affect the required sample size:
- p value threshold (a 0.01 threshold requires a larger sample size than a 0.05 threshold)
- power (0.9 power requires a larger sample size than 0.8 power)
- effect size (small effect sizes require a larger sample size than large effect sizes)
Using a library to calculate power
One of the most convenient R packages for simple sample size
calculations is pwr. We will use it to reproduce many of the
calculations done in the paper.
library(pwr)
Check simulation results
d <- 10/10 # effect size d is the mean change divided by its standard deviation
pwr.t.test(d = d, sig.level = 0.05, power = 0.9, type = "one.sample")
One-sample t test power calculation
n = 12.58546
d = 1
sig.level = 0.05
power = 0.9
alternative = two.sided
Small p-value thresholds need larger sample sizes
pwr.t.test(d = 1, sig.level = 0.01, power = 0.9, type = "one.sample")
One-sample t test power calculation
n = 18.30346
d = 1
sig.level = 0.01
power = 0.9
alternative = two.sided
pwr.t.test(d = 1, sig.level = 0.1, power = 0.9, type = "one.sample")
One-sample t test power calculation
n = 10.08101
d = 1
sig.level = 0.1
power = 0.9
alternative = two.sided
Higher power needs larger sample sizes
pwr.t.test(d = 1, sig.level = 0.05, power = 0.8, type = "one.sample")
One-sample t test power calculation
n = 9.93785
d = 1
sig.level = 0.05
power = 0.8
alternative = two.sided
pwr.t.test(d = 1, sig.level = 0.05, power = 0.9, type = "one.sample")
One-sample t test power calculation
n = 12.58546
d = 1
sig.level = 0.05
power = 0.9
alternative = two.sided
Small effect sizes need larger sample sizes
pwr.t.test(d = 0.1, sig.level = 0.05, power = 0.9, type = "one.sample")
One-sample t test power calculation
n = 1052.665
d = 0.1
sig.level = 0.05
power = 0.9
alternative = two.sided
pwr.t.test(d = 1.0, sig.level = 0.05, power = 0.9, type = "one.sample")
One-sample t test power calculation
n = 12.58546
d = 1
sig.level = 0.05
power = 0.9
alternative = two.sided
Note that an effect size of 1.0 is conventionally considered large. The before and after distributions look like this when d = 1.
x <- seq(-3, 4, length.out = 100)
y1 <- dnorm(x, 0, 1)   # "before" distribution
y2 <- dnorm(x, 1, 1)   # "after" distribution, shifted by d = 1
df <- data.frame(x=x, y1=y1, y2=y2)
ggplot(df, aes(x=x, y=y1)) +
    geom_line() +
    geom_line(aes(y = y2), color="red")
Sample size calculation for a difference in means (equal sized groups)
pwr.t.test(d = 1.0, sig.level = 0.05, power = 0.9, type = "two.sample")
Two-sample t test power calculation
n = 22.02109
d = 1
sig.level = 0.05
power = 0.9
alternative = two.sided
NOTE: n is number in each group
Sample size calculation for a difference in means (different sized groups)
Suppose one of the treatment groups is fixed at 20 subjects.
pwr.t2n.test(d = 1.0, n1 = 20, sig.level = 0.05, power = 0.9)
t test power calculation
n1 = 20
n2 = 24.47031
d = 1
sig.level = 0.05
power = 0.9
alternative = two.sided
Sample size calculation for a difference in proportions (equal sized groups)
p1 <- 0.4
p2 <- 0.6
# pooled proportion; note that mean(p1, p2) would silently return p1,
# because the second argument of mean() is trim
p <- mean(c(p1, p2))
# standardised difference, as in the paper; pwr::ES.h(p1, p2) gives
# Cohen's arcsine-transformed h, which differs slightly
h <- abs(p1 - p2)/sqrt(p*(1-p))
pwr.2p.test(h = h, sig.level = 0.05, power = 0.9)
Difference of proportion power calculation for binomial distribution (arcsine transformation)
h = 0.4
n = 131.3427
sig.level = 0.05
power = 0.9
alternative = two.sided
NOTE: same sample sizes
Sample size calculation for a difference in proportions (different sized groups)
p1 <- 0.4
p2 <- 0.6
# pooled proportion; mean(p1, p2) would silently return p1
p <- mean(c(p1, p2))
h <- abs(p1 - p2)/sqrt(p*(1-p))
pwr.2p2n.test(h = h, n1 = 100, sig.level = 0.05, power = 0.9)
difference of proportion power calculation for binomial distribution (arcsine transformation)
h = 0.4
n1 = 100
n2 = 191.3021
sig.level = 0.05
power = 0.9
alternative = two.sided
NOTE: different sample sizes
Loss, dropout, etc.
Obvious adjustments must be made if loss to follow-up or dropout is expected. For example, if 10% of the patients in the study are expected to be lost to follow up, we need to enrol \(n/0.9\) patients.
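Continuing the two-sample example above (d = 1, power = 0.9, roughly 22 subjects per group), a minimal sketch of this adjustment, assuming a 10% dropout rate (the variable names here are illustrative):

```r
library(pwr)

# sample size per group before allowing for dropout (two-sample design above)
n_per_group <- pwr.t.test(d = 1, sig.level = 0.05, power = 0.9,
                          type = "two.sample")$n

dropout <- 0.10                                     # assumed 10% loss to follow-up
n_enrolled <- ceiling(n_per_group / (1 - dropout))  # inflate enrolment to compensate
n_enrolled                                          # 25 per group
```

Rounding up with ceiling() is conventional, since a fractional subject cannot be enrolled.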
Exercises
Suppose we are interested in whether a particular diet results in a change in weight. We enroll subjects into two groups - group A follows the special diet, and group B follows their regular diet (control). Suppose also that the standard deviation of weight change is the same in both groups at 25 lbs.
1. If we want to detect an effect of the diet of 10 lbs or more with 0.9 power and a 0.05 significance level, how many subjects do we need in each group (assume equal sized groups)?
2. If 20% of the patients are expected to drop out before the study is completed, how many subjects would need to be enrolled for the same study design as in Ex. 1?
3. Make a graph where the x-axis shows the sample size from 10 to 100, the y-axis shows the power from 0 to 1, and plot separate curves for effect sizes of 10, 25 and 40 lbs.
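One possible approach to Exercise 3 (a sketch, not the only answer): standardise each effect in lbs by the assumed 25 lb standard deviation, evaluate pwr.t.test over a grid of sample sizes, and plot one curve per effect size. The variable names are illustrative.

```r
library(pwr)
library(ggplot2)

# power for each combination of sample size and effect size (in lbs);
# d = effect / 25 converts lbs to a standardised effect size
grid <- expand.grid(n = 10:100, effect = c(10, 25, 40))
grid$power <- mapply(function(n, effect) {
    pwr.t.test(n = n, d = effect / 25, sig.level = 0.05,
               type = "two.sample")$power
}, grid$n, grid$effect)

ggplot(grid, aes(x = n, y = power, color = factor(effect))) +
    geom_line() +
    labs(x = "Sample size per group", y = "Power", color = "Effect (lbs)")
```

mapply() is used rather than passing vectors to pwr.t.test directly, since the pwr functions expect scalar arguments.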