{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Multiple Testing" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We'll begin by recreating the examples from this morning's lecture:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Coin Toss Experiments" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
    \n", "\t
  1. 'T'
  2. \n", "\t
  3. 'T'
  4. \n", "\t
  5. 'T'
  6. \n", "\t
  7. 'T'
  8. \n", "\t
  9. 'T'
  10. \n", "\t
  11. 'T'
  12. \n", "\t
  13. 'T'
  14. \n", "\t
  15. 'H'
  16. \n", "\t
  17. 'T'
  18. \n", "\t
  19. 'T'
  20. \n", "
\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 'T'\n", "\\item 'T'\n", "\\item 'T'\n", "\\item 'T'\n", "\\item 'T'\n", "\\item 'T'\n", "\\item 'T'\n", "\\item 'H'\n", "\\item 'T'\n", "\\item 'T'\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 'T'\n", "2. 'T'\n", "3. 'T'\n", "4. 'T'\n", "5. 'T'\n", "6. 'T'\n", "7. 'T'\n", "8. 'H'\n", "9. 'T'\n", "10. 'T'\n", "\n", "\n" ], "text/plain": [ " [1] \"T\" \"T\" \"T\" \"T\" \"T\" \"T\" \"T\" \"H\" \"T\" \"T\"" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "set.seed(231)\n", "x=sample(c(\"H\",\"T\"),10,replace=TRUE,prob=c(1/3,2/3)) # Create a sample of size 10, with probability of \"H\" = 1/3\n", " # and probability of \"T\" = 2/3. Clearly, this is a biased coin.\n", "x" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "\n", "\tExact binomial test\n", "\n", "data: sum(x == \"T\") and length(x)\n", "number of successes = 9, number of trials = 10, p-value = 0.02148\n", "alternative hypothesis: true probability of success is not equal to 0.5\n", "95 percent confidence interval:\n", " 0.5549839 0.9974714\n", "sample estimates:\n", "probability of success \n", " 0.9 \n" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Test for biasedness\n", "\n", "binom.test(sum(x=='T'), n=length(x), p = 0.5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So, our test concludes the coin is biased. Now we toss two coins:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
    \n", "\t
  1. 'T'
  2. \n", "\t
  3. 'T'
  4. \n", "\t
  5. 'T'
  6. \n", "\t
  7. 'T'
  8. \n", "\t
  9. 'T'
  10. \n", "\t
  11. 'T'
  12. \n", "\t
  13. 'T'
  14. \n", "\t
  15. 'H'
  16. \n", "\t
  17. 'T'
  18. \n", "\t
  19. 'T'
  20. \n", "
\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 'T'\n", "\\item 'T'\n", "\\item 'T'\n", "\\item 'T'\n", "\\item 'T'\n", "\\item 'T'\n", "\\item 'T'\n", "\\item 'H'\n", "\\item 'T'\n", "\\item 'T'\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 'T'\n", "2. 'T'\n", "3. 'T'\n", "4. 'T'\n", "5. 'T'\n", "6. 'T'\n", "7. 'T'\n", "8. 'H'\n", "9. 'T'\n", "10. 'T'\n", "\n", "\n" ], "text/plain": [ " [1] \"T\" \"T\" \"T\" \"T\" \"T\" \"T\" \"T\" \"H\" \"T\" \"T\"" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/html": [ "
    \n", "\t
  1. 'T'
  2. \n", "\t
  3. 'T'
  4. \n", "\t
  5. 'T'
  6. \n", "\t
  7. 'T'
  8. \n", "\t
  9. 'T'
  10. \n", "\t
  11. 'T'
  12. \n", "\t
  13. 'H'
  14. \n", "\t
  15. 'T'
  16. \n", "\t
  17. 'T'
  18. \n", "\t
  19. 'T'
  20. \n", "
\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 'T'\n", "\\item 'T'\n", "\\item 'T'\n", "\\item 'T'\n", "\\item 'T'\n", "\\item 'T'\n", "\\item 'H'\n", "\\item 'T'\n", "\\item 'T'\n", "\\item 'T'\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 'T'\n", "2. 'T'\n", "3. 'T'\n", "4. 'T'\n", "5. 'T'\n", "6. 'T'\n", "7. 'H'\n", "8. 'T'\n", "9. 'T'\n", "10. 'T'\n", "\n", "\n" ], "text/plain": [ " [1] \"T\" \"T\" \"T\" \"T\" \"T\" \"T\" \"H\" \"T\" \"T\" \"T\"" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "set.seed(231)\n", "x1=sample(c(\"H\",\"T\"),10,replace=TRUE,prob=c(1/3,2/3))\n", "x2=sample(c(\"H\",\"T\"),10,replace=TRUE,prob=c(1/3,2/3))\n", "x1\n", "x2" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "\n", "\tExact binomial test\n", "\n", "data: sum(x1 == \"T\") and length(x)\n", "number of successes = 9, number of trials = 10, p-value = 0.02148\n", "alternative hypothesis: true probability of success is not equal to 0.5\n", "95 percent confidence interval:\n", " 0.5549839 0.9974714\n", "sample estimates:\n", "probability of success \n", " 0.9 \n" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Test for bias\n", "binom.test(sum(x1=='T'), n=length(x), p = 0.5)\n", "\n" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "\n", "\tExact binomial test\n", "\n", "data: sum(x2 == \"T\") and length(x)\n", "number of successes = 9, number of trials = 10, p-value = 0.02148\n", "alternative hypothesis: true probability of success is not equal to 0.5\n", "95 percent confidence interval:\n", " 0.5549839 0.9974714\n", "sample estimates:\n", "probability of success \n", " 0.9 \n" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "binom.test(sum(x2=='T'), n=length(x), p = 0.5)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", "
binom.test {stats}R Documentation
\n", "\n", "

Exact Binomial Test

\n", "\n", "

Description

\n", "\n", "

Performs an exact test of a simple null hypothesis about the\n", "probability of success in a Bernoulli experiment.\n", "

\n", "\n", "\n", "

Usage

\n", "\n", "
\n",
       "binom.test(x, n, p = 0.5,\n",
       "           alternative = c(\"two.sided\", \"less\", \"greater\"),\n",
       "           conf.level = 0.95)\n",
       "
\n", "\n", "\n", "

Arguments

\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
x\n", "

number of successes, or a vector of length 2 giving the\n", "numbers of successes and failures, respectively.

\n", "
n\n", "

number of trials; ignored if x has length 2.

\n", "
p\n", "

hypothesized probability of success.

\n", "
alternative\n", "

indicates the alternative hypothesis and must be\n", "one of \"two.sided\", \"greater\" or \"less\".\n", "You can specify just the initial letter.

\n", "
conf.level\n", "

confidence level for the returned confidence\n", "interval.

\n", "
\n", "\n", "\n", "

Details

\n", "\n", "

Confidence intervals are obtained by a procedure first given in\n", "Clopper and Pearson (1934). This guarantees that the confidence level\n", "is at least conf.level, but in general does not give the\n", "shortest-length confidence intervals.\n", "

\n", "\n", "\n", "

Value

\n", "\n", "

A list with class \"htest\" containing the following components:\n", "

\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
statistic\n", "

the number of successes.

\n", "
parameter\n", "

the number of trials.

\n", "
p.value\n", "

the p-value of the test.

\n", "
conf.int\n", "

a confidence interval for the probability of success.

\n", "
estimate\n", "

the estimated probability of success.

\n", "
null.value\n", "

the probability of success under the null,\n", "p.

\n", "
alternative\n", "

a character string describing the alternative\n", "hypothesis.

\n", "
method\n", "

the character string \"Exact binomial test\".

\n", "
data.name\n", "

a character string giving the names of the data.

\n", "
\n", "\n", "\n", "

References

\n", "\n", "

Clopper, C. J. & Pearson, E. S. (1934).\n", "The use of confidence or fiducial limits illustrated in the case of\n", "the binomial.\n", "Biometrika, 26, 404–413.\n", "

\n", "

William J. Conover (1971),\n", "Practical nonparametric statistics.\n", "New York: John Wiley & Sons.\n", "Pages 97–104.\n", "

\n", "

Myles Hollander & Douglas A. Wolfe (1973),\n", "Nonparametric Statistical Methods.\n", "New York: John Wiley & Sons.\n", "Pages 15–22.\n", "

\n", "\n", "\n", "

See Also

\n", "\n", "

prop.test for a general (approximate) test for equal or\n", "given proportions.\n", "

\n", "\n", "\n", "

Examples

\n", "\n", "
\n",
       "## Conover (1971), p. 97f.\n",
       "## Under (the assumption of) simple Mendelian inheritance, a cross\n",
       "##  between plants of two particular genotypes produces progeny 1/4 of\n",
       "##  which are \"dwarf\" and 3/4 of which are \"giant\", respectively.\n",
       "##  In an experiment to determine if this assumption is reasonable, a\n",
       "##  cross results in progeny having 243 dwarf and 682 giant plants.\n",
       "##  If \"giant\" is taken as success, the null hypothesis is that p =\n",
       "##  3/4 and the alternative that p != 3/4.\n",
       "binom.test(c(682, 243), p = 3/4)\n",
       "binom.test(682, 682 + 243, p = 3/4)   # The same.\n",
       "## => Data are in agreement with the null hypothesis.\n",
       "
\n", "\n", "
[Package stats version 3.2.3 ]
" ], "text/latex": [ "\\inputencoding{utf8}\n", "\\HeaderA{binom.test}{Exact Binomial Test}{binom.test}\n", "\\keyword{htest}{binom.test}\n", "%\n", "\\begin{Description}\\relax\n", "Performs an exact test of a simple null hypothesis about the\n", "probability of success in a Bernoulli experiment.\n", "\\end{Description}\n", "%\n", "\\begin{Usage}\n", "\\begin{verbatim}\n", "binom.test(x, n, p = 0.5,\n", " alternative = c(\"two.sided\", \"less\", \"greater\"),\n", " conf.level = 0.95)\n", "\\end{verbatim}\n", "\\end{Usage}\n", "%\n", "\\begin{Arguments}\n", "\\begin{ldescription}\n", "\\item[\\code{x}] number of successes, or a vector of length 2 giving the\n", "numbers of successes and failures, respectively.\n", "\\item[\\code{n}] number of trials; ignored if \\code{x} has length 2.\n", "\\item[\\code{p}] hypothesized probability of success.\n", "\\item[\\code{alternative}] indicates the alternative hypothesis and must be\n", "one of \\code{\"two.sided\"}, \\code{\"greater\"} or \\code{\"less\"}.\n", "You can specify just the initial letter.\n", "\\item[\\code{conf.level}] confidence level for the returned confidence\n", "interval.\n", "\\end{ldescription}\n", "\\end{Arguments}\n", "%\n", "\\begin{Details}\\relax\n", "Confidence intervals are obtained by a procedure first given in\n", "Clopper and Pearson (1934). This guarantees that the confidence level\n", "is at least \\code{conf.level}, but in general does not give the\n", "shortest-length confidence intervals.\n", "\\end{Details}\n", "%\n", "\\begin{Value}\n", "A list with class \\code{\"htest\"} containing the following components:\n", "\\begin{ldescription}\n", "\\item[\\code{statistic}] the number of successes.\n", "\\item[\\code{parameter}] the number of trials.\n", "\\item[\\code{p.value}] the p-value of the test.\n", "\\item[\\code{conf.int}] a confidence interval for the probability of success.\n", "\\item[\\code{estimate}] the estimated probability of success.\n", "\\item[\\code{null.value}] the probability of success under the null,\n", "\\code{p}.\n", "\\item[\\code{alternative}] a character string describing the alternative\n", "hypothesis.\n", "\\item[\\code{method}] the character string \\code{\"Exact binomial test\"}.\n", "\\item[\\code{data.name}] a character string giving the names of the data.\n", "\\end{ldescription}\n", "\\end{Value}\n", "%\n", "\\begin{References}\\relax\n", "Clopper, C. J. \\& Pearson, E. S. (1934).\n", "The use of confidence or fiducial limits illustrated in the case of\n", "the binomial.\n", "\\emph{Biometrika}, \\bold{26}, 404--413.\n", "\n", "William J. Conover (1971),\n", "\\emph{Practical nonparametric statistics}.\n", "New York: John Wiley \\& Sons.\n", "Pages 97--104.\n", "\n", "Myles Hollander \\& Douglas A. Wolfe (1973),\n", "\\emph{Nonparametric Statistical Methods.}\n", "New York: John Wiley \\& Sons.\n", "Pages 15--22.\n", "\\end{References}\n", "%\n", "\\begin{SeeAlso}\\relax\n", "\\code{\\LinkA{prop.test}{prop.test}} for a general (approximate) test for equal or\n", "given proportions.\n", "\\end{SeeAlso}\n", "%\n", "\\begin{Examples}\n", "\\begin{ExampleCode}\n", "## Conover (1971), p. 97f.\n", "## Under (the assumption of) simple Mendelian inheritance, a cross\n", "## between plants of two particular genotypes produces progeny 1/4 of\n", "## which are \"dwarf\" and 3/4 of which are \"giant\", respectively.\n", "## In an experiment to determine if this assumption is reasonable, a\n", "## cross results in progeny having 243 dwarf and 682 giant plants.\n", "## If \"giant\" is taken as success, the null hypothesis is that p =\n", "## 3/4 and the alternative that p != 3/4.\n", "binom.test(c(682, 243), p = 3/4)\n", "binom.test(682, 682 + 243, p = 3/4) # The same.\n", "## => Data are in agreement with the null hypothesis.\n", "\\end{ExampleCode}\n", "\\end{Examples}" ], "text/plain": [ "binom.test package:stats R Documentation\n", "\n", "_\bE_\bx_\ba_\bc_\bt _\bB_\bi_\bn_\bo_\bm_\bi_\ba_\bl _\bT_\be_\bs_\bt\n", "\n", "_\bD_\be_\bs_\bc_\br_\bi_\bp_\bt_\bi_\bo_\bn:\n", "\n", " Performs an exact test of a simple null hypothesis about the\n", " probability of success in a Bernoulli experiment.\n", "\n", "_\bU_\bs_\ba_\bg_\be:\n", "\n", " binom.test(x, n, p = 0.5,\n", " alternative = c(\"two.sided\", \"less\", \"greater\"),\n", " conf.level = 0.95)\n", " \n", "_\bA_\br_\bg_\bu_\bm_\be_\bn_\bt_\bs:\n", "\n", " x: number of successes, or a vector of length 2 giving the\n", " numbers of successes and failures, respectively.\n", "\n", " n: number of trials; ignored if ‘x’ has length 2.\n", "\n", " p: hypothesized probability of success.\n", "\n", "alternative: indicates the alternative hypothesis and must be one of\n", " ‘\"two.sided\"’, ‘\"greater\"’ or ‘\"less\"’. You can specify just\n", " the initial letter.\n", "\n", "conf.level: confidence level for the returned confidence interval.\n", "\n", "_\bD_\be_\bt_\ba_\bi_\bl_\bs:\n", "\n", " Confidence intervals are obtained by a procedure first given in\n", " Clopper and Pearson (1934). This guarantees that the confidence\n", " level is at least ‘conf.level’, but in general does not give the\n", " shortest-length confidence intervals.\n", "\n", "_\bV_\ba_\bl_\bu_\be:\n", "\n", " A list with class ‘\"htest\"’ containing the following components:\n", "\n", "statistic: the number of successes.\n", "\n", "parameter: the number of trials.\n", "\n", " p.value: the p-value of the test.\n", "\n", "conf.int: a confidence interval for the probability of success.\n", "\n", "estimate: the estimated probability of success.\n", "\n", "null.value: the probability of success under the null, ‘p’.\n", "\n", "alternative: a character string describing the alternative hypothesis.\n", "\n", " method: the character string ‘\"Exact binomial test\"’.\n", "\n", "data.name: a character string giving the names of the data.\n", "\n", "_\bR_\be_\bf_\be_\br_\be_\bn_\bc_\be_\bs:\n", "\n", " Clopper, C. J. & Pearson, E. S. (1934). The use of confidence or\n", " fiducial limits illustrated in the case of the binomial.\n", " _Biometrika_, *26*, 404-413.\n", "\n", " William J. Conover (1971), _Practical nonparametric statistics_.\n", " New York: John Wiley & Sons. Pages 97-104.\n", "\n", " Myles Hollander & Douglas A. Wolfe (1973), _Nonparametric\n", " Statistical Methods._ New York: John Wiley & Sons. Pages 15-22.\n", "\n", "_\bS_\be_\be _\bA_\bl_\bs_\bo:\n", "\n", " ‘prop.test’ for a general (approximate) test for equal or given\n", " proportions.\n", "\n", "_\bE_\bx_\ba_\bm_\bp_\bl_\be_\bs:\n", "\n", " ## Conover (1971), p. 97f.\n", " ## Under (the assumption of) simple Mendelian inheritance, a cross\n", " ## between plants of two particular genotypes produces progeny 1/4 of\n", " ## which are \"dwarf\" and 3/4 of which are \"giant\", respectively.\n", " ## In an experiment to determine if this assumption is reasonable, a\n", " ## cross results in progeny having 243 dwarf and 682 giant plants.\n", " ## If \"giant\" is taken as success, the null hypothesis is that p =\n", " ## 3/4 and the alternative that p != 3/4.\n", " binom.test(c(682, 243), p = 3/4)\n", " binom.test(682, 682 + 243, p = 3/4) # The same.\n", " ## => Data are in agreement with the null hypothesis.\n", " " ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "?binom.test\n" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] 0.34375\n", "[1] 1\n", "[1] 0.109375\n", "[1] 1\n", "[1] 1\n", "[1] 0.109375\n", "[1] 0.34375\n", "[1] 0.34375\n", "[1] 1\n", "[1] 0.34375\n", "[1] 0.7539063\n", "[1] 0.7539063\n", "[1] 0.34375\n", "[1] 1\n", "[1] 0.02148438\n", "[1] 0.34375\n", "[1] 0.34375\n", "[1] 0.7539063\n", "[1] 0.7539063\n", "[1] 0.7539063\n", "[1] 0.7539063\n", "[1] 0.34375\n", "[1] 0.7539063\n", "[1] 0.7539063\n", "[1] 1\n", "[1] 0.34375\n", "[1] 0.7539063\n", "[1] 0.7539063\n", "[1] 0.34375\n", "[1] 0.7539063\n", "[1] 1\n", "[1] 0.34375\n", "[1] 0.34375\n", "[1] 1\n", "[1] 1\n", "[1] 0.7539063\n", "[1] 0.7539063\n", "[1] 0.109375\n", "[1] 1\n", "[1] 0.34375\n", "[1] 0.7539063\n", "[1] 1\n", "[1] 0.34375\n", "[1] 1\n", "[1] 0.34375\n", "[1] 0.34375\n", "[1] 0.7539063\n", "[1] 0.109375\n", "[1] 1\n", "[1] 0.109375\n", "[1] 1\n", "[1] 1\n", "[1] 1\n", "[1] 0.7539063\n", "[1] 0.34375\n", "[1] 1\n", "[1] 0.109375\n", "[1] 0.7539063\n", "[1] 0.7539063\n", "[1] 1\n", "[1] 0.7539063\n", "[1] 0.109375\n", "[1] 0.34375\n", "[1] 0.7539063\n", "[1] 1\n", "[1] 0.7539063\n", "[1] 0.34375\n", "[1] 0.109375\n", "[1] 0.34375\n", "[1] 0.34375\n", "[1] 0.109375\n", "[1] 0.7539063\n", "[1] 0.7539063\n", "[1] 0.7539063\n", "[1] 0.7539063\n", "[1] 1\n", "[1] 1\n", "[1] 0.34375\n", "[1] 0.7539063\n", "[1] 1\n", "[1] 1\n", "[1] 1\n", "[1] 1\n", "[1] 0.7539063\n", "[1] 0.34375\n", "[1] 0.02148438\n", "[1] 0.109375\n", "[1] 0.34375\n", "[1] 0.7539063\n", "[1] 0.34375\n", "[1] 0.7539063\n", "[1] 0.02148438\n", "[1] 0.34375\n", "[1] 0.34375\n", "[1] 1\n", "[1] 1\n", "[1] 1\n", "[1] 0.34375\n", "[1] 0.7539063\n", "[1] 0.7539063\n" ] }, { "data": { "text/html": [ "3" ], "text/latex": [ "3" ], "text/markdown": [ "3" ], "text/plain": [ "[1] 3" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Suppose we toss 100 fair coins\n", "count.reject=0\n", "for (i in 1:100){\n", " x2=sample(c(\"H\",\"T\"),10,replace=TRUE,prob=c(1/2,1/2))\n", " result=binom.test(sum(x2=='T'), n=length(x2), p = 0.5)\n", " print(result$p.value)\n", " if (result$p.value<.05) {count.reject=count.reject+1}\n", " }\n", "count.reject" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can consider the number of tosses to be the 'sample size' and the number of hypotheses tested to be the number of coins tossed. Here, our sample size is small compared to the number of hypotheses tested. In genome data, the sample size is the number of patients and the number of hypotheses tested is the number of genes (or SNPs) analyzed. If we set a significance level of $0.05$, we are saying that we expect to find a false positive 5% of the time. So in our coin toss, we would expect to find the coin to be fair 5 times out of 100." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# What happens if we increase the number of tosses?\n" ] } ], "metadata": { "kernelspec": { "display_name": "R", "language": "R", "name": "ir" }, "language_info": { "codemirror_mode": "r", "file_extension": ".r", "mimetype": "text/x-r-source", "name": "R", "pygments_lexer": "r", "version": "3.2.3" } }, "nbformat": 4, "nbformat_minor": 0 }