Statistics review 9: One-way analysis of variance ================================================= R code accompanying `paper `__ Key learning points ------------------- - How to test differences between more than two groups or treatments - Multiple comparison procedures - Orthogonal contrasts - Pairwise comparisons .. code:: r suppressPackageStartupMessages(library(tidyverse)) options(repr.plot.width=4, repr.plot.height=3) Multiple comparisons -------------------- For 4 groups, we can perform 6 pairwise comparisons. This inflates the error control from the expected 5% to 26%. .. code:: r s = 0 for (n in 1:100) { df <- data.frame(x1=rnorm(10), x2=rnorm(10), x3=rnorm(10), x4=rnorm(10)) for (i in 1:4) { for (j in 1:4) { if (i < j) { if (t.test(df[,i], df[,j])$p.val < 0.05) { s = s+1 } } } } } s/100.0 .. raw:: html 0.25 One-way analysis of variance ---------------------------- .. code:: r t1 <- c(10, 12, 14) t2 <- c(19, 20, 21) t3 <- c(14, 16, 18) df <- data.frame(t1=t1, t2=t2, t3=t3) df .. raw:: html
t1t2t3
101914
122016
142118
Manual calculation ~~~~~~~~~~~~~~~~~~ .. code:: r grand.mean <- mean(unlist(df)) grand.mean .. raw:: html 16 .. code:: r treatment.mean <- df %>% summarize_each('mean') treatment.mean .. raw:: html
t1t2t3
122016
.. code:: r df1 <- df %>% gather(treatment) %>% mutate(explained.variance=as.numeric(rep(treatment.mean, each=3)) - grand.mean) %>% mutate(residual=value - as.numeric(rep(treatment.mean, each=3))) %>% mutate(deviation=value - grand.mean) df1 .. raw:: html
treatmentvalueexplained.varianceresidualdeviation
t110-4-2-6
t112-4 0-4
t114-4 2-2
t219 4-1 3
t220 4 0 4
t221 4 1 5
t314 0-2-2
t316 0 0 0
t318 0 2 2
.. code:: r between.ss <- sum(df1$explained.variance^2) between.df <- length(treatment.mean) - 1 between.ms <- between.ss/between.df within.ss <- sum(df1$residual^2) within.df <- dim(df1)[1] - 1 - between.df within.ms <- within.ss/within.df .. code:: r between.ms .. raw:: html 48 .. code:: r within.ms .. raw:: html 3 .. code:: r F <- between.ms/within.ms F .. raw:: html 16 .. code:: r round(1 - pf(F, df1=between.df, df2=within.df), 4) .. raw:: html 0.0039 Using built-in test ~~~~~~~~~~~~~~~~~~~ .. code:: r df2 <- df %>% gather(treatment, value) df2$treatment <- as.factor(df2$treatment) .. code:: r df2 .. raw:: html
treatmentvalue
t110
t112
t114
t219
t220
t221
t314
t316
t318
.. code:: r ggplot(df2, aes(x=treatment, y=value)) + geom_jitter(width=0.1) .. image:: SR09_One-way_analysis_of_variance_files/SR09_One-way_analysis_of_variance_19_1.png .. code:: r fit <- aov(value ~ treatment, data=df2) summary(fit) .. parsed-literal:: Df Sum Sq Mean Sq F value Pr(>F) treatment 2 96 48 16 0.00394 ** Residuals 6 18 3 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Multiple comparison procedures ------------------------------ .. code:: r pairwise.t.test(df2$value, df2$treatment, p.adjust.method = "none") .. parsed-literal:: Pairwise comparisons using t tests with pooled SD data: df2$value and df2$treatment t1 t2 t2 0.0013 - t3 0.0300 0.0300 P value adjustment method: none .. code:: r pairwise.t.test(df2$value, df2$treatment, p.adjust.method = "bonferroni") .. parsed-literal:: Pairwise comparisons using t tests with pooled SD data: df2$value and df2$treatment t1 t2 t2 0.0039 - t3 0.0901 0.0901 P value adjustment method: bonferroni Duncan multiple range test ~~~~~~~~~~~~~~~~~~~~~~~~~~ We will not use this test as it does not properly control for Type 1 error. We suggest you use TukeyHSD instead. Tukey's method ~~~~~~~~~~~~~~ .. code:: r TukeyHSD(fit) .. parsed-literal:: Tukey multiple comparisons of means 95% family-wise confidence level Fit: aov(formula = value ~ treatment, data = df2) $treatment diff lwr upr p adj t2-t1 8 3.6608047 12.3391953 0.0031658 t3-t1 4 -0.3391953 8.3391953 0.0673680 t3-t2 -4 -8.3391953 0.3391953 0.0673680 Contrasts --------- In the regular one-way ANOVA, comparisons are made between treatments and the global mean. Other comparisons are possible, for example, between the mean of treatment 1 and the mean of treatments 2, 3 and 4. These alternative comparisons are known as contrasts. We show how to get different contrasts in R here. .. code:: r levels(df2$treatment) .. raw:: html
  1. 't1'
  2. 't2'
  3. 't3'
Standard ANOVA ~~~~~~~~~~~~~~ .. code:: r Df Sum Sq Mean Sq F value Pr(>F) treatment 2 96 48 16 0.00394 ** Residuals 6 18 3 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Zero-sum contrasts ~~~~~~~~~~~~~~~~~~ .. code:: r mat <- contr.sum(3) .. code:: r mat .. raw:: html
1 1 0
2 0 1
3-1-1
.. code:: r contrasts(df2$treatment) <- mat .. code:: r m <- aov(value ~ treatment, df2) .. code:: r summary(m) .. parsed-literal:: Df Sum Sq Mean Sq F value Pr(>F) treatment 2 96 48 16 0.00394 ** Residuals 6 18 3 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 .. code:: r summary.aov(m, split=list(treatment=list("T2 vs T1 and T3"=1, "T1 vs T2 and T3" = 2))) .. parsed-literal:: Df Sum Sq Mean Sq F value Pr(>F) treatment 2 96 48 16 0.00394 ** treatment: T2 vs T1 and T3 1 24 24 8 0.03002 * treatment: T1 vs T2 and T3 1 72 72 24 0.00271 ** Residuals 6 18 3 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 .. code:: r t.test(df$t2, df$t3) .. parsed-literal:: Welch Two Sample t-test data: df$t2 and df$t3 t = 3.0984, df = 2.9412, p-value = 0.05479 alternative hypothesis: true difference in means is not equal to 0 95 percent confidence interval: -0.1553995 8.1553995 sample estimates: mean of x mean of y 20 16 Polynomial contrasts ~~~~~~~~~~~~~~~~~~~~ If the categorical variable is ordinal, we can use polynomial contrasts to look for a trend. .. code:: r a1 <- c(24.4, 23.0, 25.4, 24.8, 23.6, 25.0, 23.4, 22.5, 21.7, 26.2) a2 <- c(25.8, 25.6, 28.2, 22.6, 22.0, 23.8, 27.3, 22.8, 25.4, 26.1) a3 <- c(26.1, 27.7, 21.8, 23.9, 27.7, 22.6, 26.0, 27.4, 26.6, 28.2) pressure <- data.frame(a1=a1, a2=a2, a3=a3) pressure .. raw:: html
a1a2a3
24.425.826.1
23.025.627.7
25.428.221.8
24.822.623.9
23.622.027.7
25.023.822.6
23.427.326.0
22.522.827.4
21.725.426.6
26.226.128.2
.. code:: r df3 <- pressure %>% gather(age.group, value) df3$age.group <- as.factor(df3$age.group) ggplot(df3, aes(x=age.group, y=value)) + geom_boxplot() .. image:: SR09_One-way_analysis_of_variance_files/SR09_One-way_analysis_of_variance_42_1.png Regular ANOVA ~~~~~~~~~~~~~ .. code:: r model <- aov(value ~ age.group, data=df3) summary(model) .. parsed-literal:: Df Sum Sq Mean Sq F value Pr(>F) age.group 2 16.22 8.112 2.132 0.138 Residuals 27 102.74 3.805 Using polynomial contrasts to test for linear and quadratic trends ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. code:: r c1 <- c(-1, 0, 1) c2 <- c(0.5, -1, 0.5) mat <- cbind(c1,c2) contrasts(df3$age.group) <- mat model1 <- aov(value ~ age.group, data = df3) summary.aov(model1, split=list(age.group=list("Linear"=1, "Quadratic" = 2))) .. parsed-literal:: Df Sum Sq Mean Sq F value Pr(>F) age.group 2 16.22 8.112 2.132 0.1382 age.group: Linear 1 16.20 16.200 4.257 0.0488 * age.group: Quadratic 1 0.02 0.024 0.006 0.9373 Residuals 27 102.74 3.805 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 We can also generate the contrast matrix automatically ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. code:: r mat <- contr.poly(3) contrasts(df3$age.group) <- mat model1 <- aov(value ~ age.group, data = df3) summary.aov(model1, split=list(age.group=list("Linear"=1, "Quadratic" = 2))) .. parsed-literal:: Df Sum Sq Mean Sq F value Pr(>F) age.group 2 16.22 8.112 2.132 0.1382 age.group: Linear 1 16.20 16.200 4.257 0.0488 * age.group: Quadratic 1 0.02 0.024 0.006 0.9373 Residuals 27 102.74 3.805 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Conclusion ~~~~~~~~~~ There is evidence for a linear trend in values across age groups. Exercise -------- .. code:: r ggplot(chickwts, aes(x=feed, y=weight)) + geom_boxplot() .. image:: SR09_One-way_analysis_of_variance_files/SR09_One-way_analysis_of_variance_51_1.png **1**. Perform a one-way analysis of variance for the effect of diet on chick weights in the ``chickwts`` data set. **2**. Find which pairwise comparisons are statistically significant using Tukey's correction.