Statistics review 3: Hypothesis testing and P values
=====================================================

R code accompanying `paper `__

Key learning points
-------------------

-  A P value is the probability that an observed effect is simply due to
   chance; it therefore provides a measure of the strength of an
   association.
-  P values are affected both by the magnitude of the effect and by the
   size of the study from which they are derived, and should therefore
   be interpreted with caution.
-  Subdividing P values into *significant* and *non-significant* is poor
   statistical practice and should be avoided.

.. code:: r

    suppressPackageStartupMessages(library(tidyverse))

.. code:: r

    options(repr.plot.width=4, repr.plot.height=3)

.. code:: r

    # Shade the upper tail of a standard normal density between `lower` and `upper`
    # and annotate it with the corresponding one-sided p-value
    plot_area <- function(lower, upper) {
        x <- seq(-3, 3, length.out = 100)
        y <- dnorm(x, 0, 1)
        df <- data.frame(x=x, y=y)
        area <- round(100 * (1 - pnorm(lower)), 2)
        ggplot(df, aes(x=x, y=y)) +
        geom_line() +
        stat_function(fun = dnorm, xlim = c(lower, upper),
                      geom = "area", fill='salmon') +
        scale_x_continuous(breaks=c(lower, upper)) +
        annotate("text", label = paste(area, "%", " (p=", area/100, ")", sep=""),
                 size=3, x = 3, y = 0.3) +
        xlab("") +
        ylab("") +
        theme(axis.text.y = element_blank(),
              axis.ticks.y = element_blank())
    }

.. code:: r

    # Shade both tails of a standard normal density and annotate with the
    # corresponding two-sided p-value
    plot_area_two_sided <- function(lower, upper) {
        x <- seq(-3, 3, length.out = 100)
        y <- dnorm(x, 0, 1)
        df <- data.frame(x=x, y=y)
        area <- 2 * round(100 * (1 - pnorm(lower)), 2)
        ggplot(df, aes(x=x, y=y)) +
        geom_line() +
        stat_function(fun = dnorm, xlim = c(lower, upper),
                      geom = "area", fill='salmon') +
        stat_function(fun = dnorm, xlim = c(-upper, -lower),
                      geom = "area", fill='salmon') +
        scale_x_continuous(breaks=c(lower, upper)) +
        annotate("text", label = paste(area, "%", " (p=", area/100, ")", sep=""),
                 size=3, x = 3, y = 0.3) +
        xlab("") +
        ylab("") +
        theme(axis.text.y = element_blank(),
              axis.ticks.y = element_blank())
    }

What is a p-value?
------------------

A p-value is the probability of observing a value as extreme as, or more
extreme than, the observed value by chance if the null hypothesis is
true. For example, if the null hypothesis is that the difference in
means between two treatment groups is normally distributed with mean 0
and standard deviation 1, then the probability of observing a difference
greater than 2 is the area shown in pink.

.. code:: r

    plot_area(2, 5)

.. image:: SR03_Hypothesis_testing_and_P_values_files/SR03_Hypothesis_testing_and_P_values_6_1.png

One and two sided tests
~~~~~~~~~~~~~~~~~~~~~~~

Often we are agnostic about the direction of the difference, and so
calculate the p-value as the area in both tails. Note that the p-value
is larger for a two-sided test.

.. code:: r

    plot_area_two_sided(2, 5)

.. image:: SR03_Hypothesis_testing_and_P_values_files/SR03_Hypothesis_testing_and_P_values_9_1.png

What we need to calculate a p-value
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-  Null and alternative hypotheses
-  Distribution of values if the null hypothesis is true

For some common comparison tests, the null distribution is known and the
p-value can be calculated easily. In other cases, we do not know the
null distribution in *closed form* and must use computationally
intensive methods to approximate it, as in the sketch below.
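One widely used computational approach is the permutation test: the
group labels are shuffled many times, the test statistic is recomputed
for each shuffle to build an approximate null distribution, and the
p-value is the proportion of shuffled statistics at least as extreme as
the observed one. The sketch below uses simulated data; the group sizes,
the true shift of 0.5, and the 10,000 permutations are arbitrary
illustrative choices.

.. code:: r

    # Minimal permutation-test sketch on simulated data: approximate the null
    # distribution of the difference in means by shuffling group membership
    set.seed(123)
    g1 <- rnorm(20, 0, 1)    # simulated group 1
    g2 <- rnorm(20, 0.5, 1)  # simulated group 2 with a true shift of 0.5
    observed <- mean(g1) - mean(g2)

    pooled <- c(g1, g2)
    perm_diffs <- replicate(10000, {
        shuffled <- sample(pooled)
        mean(shuffled[1:20]) - mean(shuffled[21:40])
    })

    # Two-sided p-value: proportion of permuted differences at least as
    # extreme as the observed difference
    mean(abs(perm_diffs) >= abs(observed))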
Effect of sample size
---------------------

Suppose there is some small difference in a normally distributed
variable between two groups. We use simulation to see how the p-value
varies with sample size.

N=10
~~~~

.. code:: r

    delta <- 0.2
    n <- 10
    x1 <- rnorm(n, 0, 1)
    x2 <- rnorm(n, 0+delta, 1)
    t.test(x1, x2)

.. parsed-literal::

        Welch Two Sample t-test

    data:  x1 and x2
    t = 0.17497, df = 17.374, p-value = 0.8631
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     -0.8340201  0.9851291
    sample estimates:
     mean of x  mean of y 
    0.16433555 0.08878103 

N=100
~~~~~

.. code:: r

    delta <- 0.2
    n <- 100
    x1 <- rnorm(n, 0, 1)
    x2 <- rnorm(n, 0+delta, 1)
    t.test(x1, x2)

.. parsed-literal::

        Welch Two Sample t-test

    data:  x1 and x2
    t = -1.7555, df = 195.48, p-value = 0.08075
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     -0.52871630  0.03073659
    sample estimates:
      mean of x   mean of y 
    -0.03132281  0.21766704 

N=500
~~~~~

.. code:: r

    delta <- 0.2
    n <- 500
    x1 <- rnorm(n, 0, 1)
    x2 <- rnorm(n, 0+delta, 1)
    t.test(x1, x2)

.. parsed-literal::

        Welch Two Sample t-test

    data:  x1 and x2
    t = -2.6812, df = 996.95, p-value = 0.007456
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     -0.28920279 -0.04477245
    sample estimates:
     mean of x  mean of y 
    0.01777934 0.18476696 

Confidence intervals
~~~~~~~~~~~~~~~~~~~~

Plotting the estimated difference in means and its 95% confidence
interval at each sample size shows the intervals narrowing as the
sample size grows.

.. code:: r

    label <- paste0("N=", c(10, 100, 500))
    mean <- c(0.3454198-0.4756790, 0.06476682-0.28044059, 0.01410214-0.14486843)
    lower <- c(-1.1144221, -0.49197251, -0.25071807)
    upper <- c(0.8539038, 0.06062498, -0.01081449)
    df <- data.frame(label, mean, lower, upper)

    # reverse the factor level ordering so labels read top to bottom after coord_flip()
    df$label <- factor(df$label, levels=rev(df$label))

    library(ggplot2)
    fp <- ggplot(data=df, aes(x=label, y=mean, ymin=lower, ymax=upper)) +
        geom_pointrange() +
        geom_hline(yintercept=0, lty=2) +  # dashed line at 0, the null value for a difference in means
        coord_flip() +                     # flip coordinates (puts labels on y axis)
        xlab("Label") + ylab("Mean (95% CI)") +
        theme_bw()                         # use a white background
    print(fp)

.. image:: SR03_Hypothesis_testing_and_P_values_files/SR03_Hypothesis_testing_and_P_values_19_0.png

Exercises
---------

Suppose the null distribution is a normal distribution with mean 1 and
standard deviation 2. You observe a value of 2.

**1**. Calculate the one-sided p-value.

**2**. Calculate the two-sided p-value.

**3**. Calculate the 95% confidence interval for the observed two-sided
p-value.

**4**. Make a plot just like the ones shown above for the two-sided
test.
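For reference, the shaded areas in the figures above can be computed
directly with ``pnorm``. The minimal sketch below assumes the standard
normal null used in those figures and an observed value of 2; the same
calls, with ``mean`` and ``sd`` changed to match the null distribution,
are a natural starting point for the exercises.

.. code:: r

    # One-sided p-value for observing 2 or more under a N(0, 1) null
    # (the upper-tail area shaded in the first figure)
    1 - pnorm(2, mean = 0, sd = 1)

    # Two-sided p-value: add the matching area in the lower tail,
    # as in the second figure
    2 * (1 - pnorm(2, mean = 0, sd = 1))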