{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Using `dplyr` for data manipulation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Description**\n", "\n", "```\n", "dplyr provides a flexible grammar of data manipulation. It’s the next iteration of plyr, focused on tools for working with data frames (hence the d in the name).\n", "```\n", "\n", "The [`dplyr` docs](https://cran.r-project.org/web/packages/dplyr/dplyr.pdf) describe a rich collection of data manipulation verbs. However, most common tasks can be accomplished with just the 6 verbs that we will cover in this session:\n", "\n", "```\n", "select\n", "filter\n", "mutate\n", "arrange\n", "summarize\n", "group_by\n", "```\n", "\n", "We will also see how to construct data manipulation \"sentences\" by chaining these verbs together with the pipe operator `%>%`." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "── Attaching packages ─────────────────────────────────────── tidyverse 1.2.1 ──\n", "✔ ggplot2 2.2.1 ✔ purrr 0.2.5\n", "✔ tibble 1.4.2 ✔ dplyr 0.7.5\n", "✔ tidyr 0.8.1 ✔ stringr 1.3.1\n", "✔ readr 1.1.1 ✔ forcats 0.3.0\n", "── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──\n", "✖ dplyr::filter() masks stats::filter()\n", "✖ dplyr::lag() masks stats::lag()\n" ] } ], "source": [ "library(tidyverse)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "path='../josh/info/2018_pilot_metadata.tsv'" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Parsed with column specification:\n", "cols(\n", " Label = col_character(),\n", " RNA_sample_num = col_integer(),\n", " Media = col_character(),\n", " Strain = col_character(),\n", " Replicate = col_integer(),\n", " experiment_person = col_character(),\n", " libprep_person = col_character(),\n", " enrichment_method = col_character(),\n", " RIN = col_double(),\n", " concentration_fold_difference = col_double(),\n", " `i7 index` = col_character(),\n", " `i5 index` = col_character(),\n", " `i5 primer` = col_character(),\n", " `i7 primer` = col_character(),\n", " `library#` = col_integer()\n", ")\n" ] } ], "source": [ "df <- read_tsv(path)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 'Label'\n", "\\item 'RNA\\_sample\\_num'\n", "\\item 'Media'\n", "\\item 'Strain'\n", "\\item 'Replicate'\n", "\\item 'experiment\\_person'\n", "\\item 'libprep\\_person'\n", "\\item 'enrichment\\_method'\n", "\\item 'RIN'\n", "\\item 'concentration\\_fold\\_difference'\n", "\\item 'i7 index'\n", "\\item 'i5 index'\n", "\\item 'i5 primer'\n", "\\item 'i7 primer'\n", "\\item 'library\\#'\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 'Label'\n", "2. 'RNA_sample_num'\n", "3. 'Media'\n", "4. 'Strain'\n", "5. 'Replicate'\n", "6. 'experiment_person'\n", "7. 'libprep_person'\n", "8. 'enrichment_method'\n", "9. 'RIN'\n", "10. 'concentration_fold_difference'\n", "11. 'i7 index'\n", "12. 'i5 index'\n", "13. 'i5 primer'\n", "14. 'i7 primer'\n", "15. 'library#'\n", "\n", "\n" ], "text/plain": [ " [1] \"Label\" \"RNA_sample_num\" \n", " [3] \"Media\" \"Strain\" \n", " [5] \"Replicate\" \"experiment_person\" \n", " [7] \"libprep_person\" \"enrichment_method\" \n", " [9] \"RIN\" \"concentration_fold_difference\"\n", "[11] \"i7 index\" \"i5 index\" \n", "[13] \"i5 primer\" \"i7 primer\" \n", "[15] \"library#\" " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "names(df)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Fix names to be consistent\n", "\n", "Note that some names use spaces between words while others use underscores. Let's fix this by replacing each run of whitespace with an underscore." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "names(df) <- str_replace_all(names(df), c('[:space:]+' = '_'))" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 'Label'\n", "\\item 'RNA\\_sample\\_num'\n", "\\item 'Media'\n", "\\item 'Strain'\n", "\\item 'Replicate'\n", "\\item 'experiment\\_person'\n", "\\item 'libprep\\_person'\n", "\\item 'enrichment\\_method'\n", "\\item 'RIN'\n", "\\item 'concentration\\_fold\\_difference'\n", "\\item 'i7\\_index'\n", "\\item 'i5\\_index'\n", "\\item 'i5\\_primer'\n", "\\item 'i7\\_primer'\n", "\\item 'library\\#'\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 'Label'\n", "2. 'RNA_sample_num'\n", "3. 'Media'\n", "4. 'Strain'\n", "5. 'Replicate'\n", "6. 'experiment_person'\n", "7. 'libprep_person'\n", "8. 'enrichment_method'\n", "9. 'RIN'\n", "10. 'concentration_fold_difference'\n", "11. 'i7_index'\n", "12. 'i5_index'\n", "13. 'i5_primer'\n", "14. 'i7_primer'\n", "15. 'library#'\n", "\n", "\n" ], "text/plain": [ " [1] \"Label\" \"RNA_sample_num\" \n", " [3] \"Media\" \"Strain\" \n", " [5] \"Replicate\" \"experiment_person\" \n", " [7] \"libprep_person\" \"enrichment_method\" \n", " [9] \"RIN\" \"concentration_fold_difference\"\n", "[11] \"i7_index\" \"i5_index\" \n", "[13] \"i5_primer\" \"i7_primer\" \n", "[15] \"library#\" " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "names(df)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 51\n", "\\item 15\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 51\n", "2. 15\n", "\n", "\n" ], "text/plain": [ "[1] 51 15" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "dim(df)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note: drop some columns (`Label`, `experiment_person` and `libprep_person`) so that the table fits in the browser." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "df <- df[, c(2:5, 8:15)]" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 51\n", "\\item 12\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 51\n", "2. 12\n", "\n", "\n" ], "text/plain": [ "[1] 51 12" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "dim(df)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\n", "
\n" ], "text/latex": [ "\\begin{tabular}{r|llllllllllll}\n", " RNA\\_sample\\_num & Media & Strain & Replicate & enrichment\\_method & RIN & concentration\\_fold\\_difference & i7\\_index & i5\\_index & i5\\_primer & i7\\_primer & library\\#\\\\\n", "\\hline\n", "\t 27 & YPD & H99 & 9 & RZ & 10.0 & 3.57 & GAATTCGT & TCAGAGCC & i504 & i706 & 46 \\\\\n", "\t 26 & YPD & H99 & 8 & MA & 10.0 & 2.76 & ATTACTCG & GTCAGTAC & i508 & i701 & 8 \\\\\n", "\t 36 & YPD & mar1d & 12 & MA & 9.7 & 3.70 & CGCTCATT & ACGTCCTG & i507 & i703 & 24 \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "RNA_sample_num | Media | Strain | Replicate | enrichment_method | RIN | concentration_fold_difference | i7_index | i5_index | i5_primer | i7_primer | library# | \n", "|---|---|---|\n", "| 27 | YPD | H99 | 9 | RZ | 10.0 | 3.57 | GAATTCGT | TCAGAGCC | i504 | i706 | 46 | \n", "| 26 | YPD | H99 | 8 | MA | 10.0 | 2.76 | ATTACTCG | GTCAGTAC | i508 | i701 | 8 | \n", "| 36 | YPD | mar1d | 12 | MA | 9.7 | 3.70 | CGCTCATT | ACGTCCTG | i507 | i703 | 24 | \n", "\n", "\n" ], "text/plain": [ " RNA_sample_num Media Strain Replicate enrichment_method RIN \n", "1 27 YPD H99 9 RZ 10.0\n", "2 26 YPD H99 8 MA 10.0\n", "3 36 YPD mar1d 12 MA 9.7\n", " concentration_fold_difference i7_index i5_index i5_primer i7_primer library#\n", "1 3.57 GAATTCGT TCAGAGCC i504 i706 46 \n", "2 2.76 ATTACTCG GTCAGTAC i508 i701 8 \n", "3 3.70 CGCTCATT ACGTCCTG i507 i703 24 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "sample_n(df, 3)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ " RNA_sample_num Media Strain Replicate \n", " Min. : 1.00 Length:51 Length:51 Min. : 1.000 \n", " 1st Qu.: 9.50 Class :character Class :character 1st Qu.: 3.000 \n", " Median :16.00 Mode :character Mode :character Median : 4.000 \n", " Mean :19.55 Mean : 5.431 \n", " 3rd Qu.:27.00 3rd Qu.: 8.000 \n", " Max. :47.00 Max. 
:12.000 \n", " enrichment_method RIN concentration_fold_difference\n", " Length:51 Min. : 6.200 Min. :1.340 \n", " Class :character 1st Qu.: 9.900 1st Qu.:2.010 \n", " Mode :character Median :10.000 Median :2.850 \n", " Mean : 9.776 Mean :2.892 \n", " 3rd Qu.:10.000 3rd Qu.:3.640 \n", " Max. :10.000 Max. :5.530 \n", " i7_index i5_index i5_primer i7_primer \n", " Length:51 Length:51 Length:51 Length:51 \n", " Class :character Class :character Class :character Class :character \n", " Mode :character Mode :character Mode :character Mode :character \n", " \n", " \n", " \n", " library# \n", " Min. : 1.0 \n", " 1st Qu.:13.5 \n", " Median :26.0 \n", " Mean :26.0 \n", " 3rd Qu.:38.5 \n", " Max. :51.0 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "summary(df)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Fixing fake numeric columns\n", "\n", "Note that `RNA_sample_num`, `Replicate` and `library#` are really discrete categories (`factor`s) rather than numbers. Because `#` is not allowed in a syntactic R name (it starts a comment), we need to wrap `library#` in backticks. (Alternatively, we could rename the column to something like `library_num`.) " ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 'RNA\\_sample\\_num'\n", "\\item 'Replicate'\n", "\\item 'library\\#'\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 'RNA_sample_num'\n", "2. 'Replicate'\n", "3. 'library#'\n", "\n", "\n" ], "text/plain": [ "[1] \"RNA_sample_num\" \"Replicate\" \"library#\" " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df %>% mutate(\n", " RNA_sample_num=factor(RNA_sample_num), \n", " Replicate=factor(Replicate),\n", " `library#`=factor(`library#`)\n", ") %>% \n", "select_if(is.factor) %>%\n", "names" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "After checking that the transformation worked, we can save the transformed data.frame." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "df <- df %>% mutate(\n", " RNA_sample_num=factor(RNA_sample_num), \n", " Replicate=factor(Replicate),\n", " `library#`=factor(`library#`)\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 0. Pipe" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\n", "
\n" ], "text/latex": [ "\\begin{tabular}{r|llllllllllll}\n", " RNA\\_sample\\_num & Media & Strain & Replicate & enrichment\\_method & RIN & concentration\\_fold\\_difference & i7\\_index & i5\\_index & i5\\_primer & i7\\_primer & library\\#\\\\\n", "\\hline\n", "\t 26 & YPD & H99 & 8 & MA & 10 & 2.76 & ATTACTCG & GTCAGTAC & i508 & i701 & 8 \\\\\n", "\t 2 & YPD & H99 & 2 & RZ & 10 & 1.34 & TCCGGAGA & AGGCTATA & i501 & i702 & 9 \\\\\n", "\t 9 & YPD & mar1d & 3 & RZ & 10 & 2.23 & TCCGGAGA & GCCTCTAT & i502 & i702 & 10 \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "RNA_sample_num | Media | Strain | Replicate | enrichment_method | RIN | concentration_fold_difference | i7_index | i5_index | i5_primer | i7_primer | library# | \n", "|---|---|---|\n", "| 26 | YPD | H99 | 8 | MA | 10 | 2.76 | ATTACTCG | GTCAGTAC | i508 | i701 | 8 | \n", "| 2 | YPD | H99 | 2 | RZ | 10 | 1.34 | TCCGGAGA | AGGCTATA | i501 | i702 | 9 | \n", "| 9 | YPD | mar1d | 3 | RZ | 10 | 2.23 | TCCGGAGA | GCCTCTAT | i502 | i702 | 10 | \n", "\n", "\n" ], "text/plain": [ " RNA_sample_num Media Strain Replicate enrichment_method RIN\n", "1 26 YPD H99 8 MA 10 \n", "2 2 YPD H99 2 RZ 10 \n", "3 9 YPD mar1d 3 RZ 10 \n", " concentration_fold_difference i7_index i5_index i5_primer i7_primer library#\n", "1 2.76 ATTACTCG GTCAGTAC i508 i701 8 \n", "2 1.34 TCCGGAGA AGGCTATA i501 i702 9 \n", "3 2.23 TCCGGAGA GCCTCTAT i502 i702 10 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df %>% head(10) %>% tail(3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that the above result can also be achieved with `slice`" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\n", "
\n" ], "text/latex": [ "\\begin{tabular}{r|llllllllllll}\n", " RNA\\_sample\\_num & Media & Strain & Replicate & enrichment\\_method & RIN & concentration\\_fold\\_difference & i7\\_index & i5\\_index & i5\\_primer & i7\\_primer & library\\#\\\\\n", "\\hline\n", "\t 26 & YPD & H99 & 8 & MA & 10 & 2.76 & ATTACTCG & GTCAGTAC & i508 & i701 & 8 \\\\\n", "\t 2 & YPD & H99 & 2 & RZ & 10 & 1.34 & TCCGGAGA & AGGCTATA & i501 & i702 & 9 \\\\\n", "\t 9 & YPD & mar1d & 3 & RZ & 10 & 2.23 & TCCGGAGA & GCCTCTAT & i502 & i702 & 10 \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "RNA_sample_num | Media | Strain | Replicate | enrichment_method | RIN | concentration_fold_difference | i7_index | i5_index | i5_primer | i7_primer | library# | \n", "|---|---|---|\n", "| 26 | YPD | H99 | 8 | MA | 10 | 2.76 | ATTACTCG | GTCAGTAC | i508 | i701 | 8 | \n", "| 2 | YPD | H99 | 2 | RZ | 10 | 1.34 | TCCGGAGA | AGGCTATA | i501 | i702 | 9 | \n", "| 9 | YPD | mar1d | 3 | RZ | 10 | 2.23 | TCCGGAGA | GCCTCTAT | i502 | i702 | 10 | \n", "\n", "\n" ], "text/plain": [ " RNA_sample_num Media Strain Replicate enrichment_method RIN\n", "1 26 YPD H99 8 MA 10 \n", "2 2 YPD H99 2 RZ 10 \n", "3 9 YPD mar1d 3 RZ 10 \n", " concentration_fold_difference i7_index i5_index i5_primer i7_primer library#\n", "1 2.76 ATTACTCG GTCAGTAC i508 i701 8 \n", "2 1.34 TCCGGAGA AGGCTATA i501 i702 9 \n", "3 2.23 TCCGGAGA GCCTCTAT i502 i702 10 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df %>% slice(8:10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Select columns" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\n", "
\n" ], "text/latex": [ "\\begin{tabular}{r|lll}\n", " RNA\\_sample\\_num & Media & Strain\\\\\n", "\\hline\n", "\t 2 & YPD & H99 \\\\\n", "\t 9 & YPD & mar1d\\\\\n", "\t 10 & YPD & mar1d\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "RNA_sample_num | Media | Strain | \n", "|---|---|---|\n", "| 2 | YPD | H99 | \n", "| 9 | YPD | mar1d | \n", "| 10 | YPD | mar1d | \n", "\n", "\n" ], "text/plain": [ " RNA_sample_num Media Strain\n", "1 2 YPD H99 \n", "2 9 YPD mar1d \n", "3 10 YPD mar1d " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df %>% select(1:3) %>% head(3)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\n", "
\n" ], "text/latex": [ "\\begin{tabular}{r|lll}\n", " RNA\\_sample\\_num & Strain & enrichment\\_method\\\\\n", "\\hline\n", "\t 2 & H99 & MA \\\\\n", "\t 9 & mar1d & MA \\\\\n", "\t 10 & mar1d & MA \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "RNA_sample_num | Strain | enrichment_method | \n", "|---|---|---|\n", "| 2 | H99 | MA | \n", "| 9 | mar1d | MA | \n", "| 10 | mar1d | MA | \n", "\n", "\n" ], "text/plain": [ " RNA_sample_num Strain enrichment_method\n", "1 2 H99 MA \n", "2 9 mar1d MA \n", "3 10 mar1d MA " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df %>% select(c(1,3,5)) %>% head(3)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\n", "
\n" ], "text/latex": [ "\\begin{tabular}{r|ll}\n", " RNA\\_sample\\_num & Media\\\\\n", "\\hline\n", "\t 2 & YPD\\\\\n", "\t 9 & YPD\\\\\n", "\t 10 & YPD\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "RNA_sample_num | Media | \n", "|---|---|---|\n", "| 2 | YPD | \n", "| 9 | YPD | \n", "| 10 | YPD | \n", "\n", "\n" ], "text/plain": [ " RNA_sample_num Media\n", "1 2 YPD \n", "2 9 YPD \n", "3 10 YPD " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df %>% select(c('RNA_sample_num', 'Media')) %>% head(3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Dropping columns" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\n", "
\n" ], "text/latex": [ "\\begin{tabular}{r|ll}\n", " i7\\_primer & library\\#\\\\\n", "\\hline\n", "\t i701 & 1 \\\\\n", "\t i701 & 2 \\\\\n", "\t i701 & 3 \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "i7_primer | library# | \n", "|---|---|---|\n", "| i701 | 1 | \n", "| i701 | 2 | \n", "| i701 | 3 | \n", "\n", "\n" ], "text/plain": [ " i7_primer library#\n", "1 i701 1 \n", "2 i701 2 \n", "3 i701 3 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df %>% select(-((1:10))) %>% head(3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Selecting using string operations" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/html": [], "text/latex": [], "text/markdown": [], "text/plain": [ " \n", "1\n", "2\n", "3" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df %>% select(ends_with('person')) %>% head(3)" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\n", "
\n" ], "text/latex": [ "\\begin{tabular}{r|llll}\n", " i7\\_index & i5\\_index & i5\\_primer & i7\\_primer\\\\\n", "\\hline\n", "\t ATTACTCG & AGGCTATA & i501 & i701 \\\\\n", "\t ATTACTCG & GCCTCTAT & i502 & i701 \\\\\n", "\t ATTACTCG & AGGATAGG & i503 & i701 \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "i7_index | i5_index | i5_primer | i7_primer | \n", "|---|---|---|\n", "| ATTACTCG | AGGCTATA | i501 | i701 | \n", "| ATTACTCG | GCCTCTAT | i502 | i701 | \n", "| ATTACTCG | AGGATAGG | i503 | i701 | \n", "\n", "\n" ], "text/plain": [ " i7_index i5_index i5_primer i7_primer\n", "1 ATTACTCG AGGCTATA i501 i701 \n", "2 ATTACTCG GCCTCTAT i502 i701 \n", "3 ATTACTCG AGGATAGG i503 i701 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df %>% select(starts_with('i')) %>% head(3)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\n", "
\n" ], "text/latex": [ "\\begin{tabular}{r|ll}\n", " i5\\_primer & i7\\_primer\\\\\n", "\\hline\n", "\t i501 & i701\\\\\n", "\t i502 & i701\\\\\n", "\t i503 & i701\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "i5_primer | i7_primer | \n", "|---|---|---|\n", "| i501 | i701 | \n", "| i502 | i701 | \n", "| i503 | i701 | \n", "\n", "\n" ], "text/plain": [ " i5_primer i7_primer\n", "1 i501 i701 \n", "2 i502 i701 \n", "3 i503 i701 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df %>% select(contains('primer')) %>% head(3)" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\n", "
\n" ], "text/latex": [ "\\begin{tabular}{r|lllllll}\n", " RNA\\_sample\\_num & enrichment\\_method & concentration\\_fold\\_difference & i7\\_index & i5\\_index & i5\\_primer & i7\\_primer\\\\\n", "\\hline\n", "\t 2 & MA & 1.34 & ATTACTCG & AGGCTATA & i501 & i701 \\\\\n", "\t 9 & MA & 2.23 & ATTACTCG & GCCTCTAT & i502 & i701 \\\\\n", "\t 10 & MA & 4.37 & ATTACTCG & AGGATAGG & i503 & i701 \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "RNA_sample_num | enrichment_method | concentration_fold_difference | i7_index | i5_index | i5_primer | i7_primer | \n", "|---|---|---|\n", "| 2 | MA | 1.34 | ATTACTCG | AGGCTATA | i501 | i701 | \n", "| 9 | MA | 2.23 | ATTACTCG | GCCTCTAT | i502 | i701 | \n", "| 10 | MA | 4.37 | ATTACTCG | AGGATAGG | i503 | i701 | \n", "\n", "\n" ], "text/plain": [ " RNA_sample_num enrichment_method concentration_fold_difference i7_index\n", "1 2 MA 1.34 ATTACTCG\n", "2 9 MA 2.23 ATTACTCG\n", "3 10 MA 4.37 ATTACTCG\n", " i5_index i5_primer i7_primer\n", "1 AGGCTATA i501 i701 \n", "2 GCCTCTAT i502 i701 \n", "3 AGGATAGG i503 i701 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df %>% select(matches('.*_.*')) %>% head(3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Renaming columns with select" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\n", "
\n" ], "text/latex": [ "\\begin{tabular}{r|ll}\n", " method & fold.change\\\\\n", "\\hline\n", "\t MA & 1.34\\\\\n", "\t MA & 2.23\\\\\n", "\t MA & 4.37\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "method | fold.change | \n", "|---|---|---|\n", "| MA | 1.34 | \n", "| MA | 2.23 | \n", "| MA | 4.37 | \n", "\n", "\n" ], "text/plain": [ " method fold.change\n", "1 MA 1.34 \n", "2 MA 2.23 \n", "3 MA 4.37 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df %>% \n", "select(c('method' = 'enrichment_method', \n", " 'fold.change' = 'concentration_fold_difference')) %>% \n", "head(3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Scoped variants" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\n", "
\n" ], "text/latex": [ "\\begin{tabular}{r|ll}\n", " RIN & concentration\\_fold\\_difference\\\\\n", "\\hline\n", "\t 10.0 & 1.34\\\\\n", "\t 10.0 & 2.23\\\\\n", "\t 9.9 & 4.37\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "RIN | concentration_fold_difference | \n", "|---|---|---|\n", "| 10.0 | 1.34 | \n", "| 10.0 | 2.23 | \n", "| 9.9 | 4.37 | \n", "\n", "\n" ], "text/plain": [ " RIN concentration_fold_difference\n", "1 10.0 1.34 \n", "2 10.0 2.23 \n", "3 9.9 4.37 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df %>% select_if(is.numeric) %>% head(3)" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\n", "
\n" ], "text/latex": [ "\\begin{tabular}{r|ll}\n", " ENRICHMENT\\_METHOD & CONCENTRATION\\_FOLD\\_DIFFERENCE\\\\\n", "\\hline\n", "\t MA & 1.34\\\\\n", "\t MA & 2.23\\\\\n", "\t MA & 4.37\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "ENRICHMENT_METHOD | CONCENTRATION_FOLD_DIFFERENCE | \n", "|---|---|---|\n", "| MA | 1.34 | \n", "| MA | 2.23 | \n", "| MA | 4.37 | \n", "\n", "\n" ], "text/plain": [ " ENRICHMENT_METHOD CONCENTRATION_FOLD_DIFFERENCE\n", "1 MA 1.34 \n", "2 MA 2.23 \n", "3 MA 4.37 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df %>% \n", "select_at(c('enrichment_method', \n", " 'concentration_fold_difference'), toupper) %>%\n", "head(3)" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\n", "
\n" ], "text/latex": [ "\\begin{tabular}{r|llllllllllll}\n", " rna\\_sample\\_num & media & strain & replicate & enrichment\\_method & rin & concentration\\_fold\\_difference & i7\\_index & i5\\_index & i5\\_primer & i7\\_primer & library\\#\\\\\n", "\\hline\n", "\t 2 & YPD & H99 & 2 & MA & 10.0 & 1.34 & ATTACTCG & AGGCTATA & i501 & i701 & 1 \\\\\n", "\t 9 & YPD & mar1d & 3 & MA & 10.0 & 2.23 & ATTACTCG & GCCTCTAT & i502 & i701 & 2 \\\\\n", "\t 10 & YPD & mar1d & 4 & MA & 9.9 & 4.37 & ATTACTCG & AGGATAGG & i503 & i701 & 3 \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "rna_sample_num | media | strain | replicate | enrichment_method | rin | concentration_fold_difference | i7_index | i5_index | i5_primer | i7_primer | library# | \n", "|---|---|---|\n", "| 2 | YPD | H99 | 2 | MA | 10.0 | 1.34 | ATTACTCG | AGGCTATA | i501 | i701 | 1 | \n", "| 9 | YPD | mar1d | 3 | MA | 10.0 | 2.23 | ATTACTCG | GCCTCTAT | i502 | i701 | 2 | \n", "| 10 | YPD | mar1d | 4 | MA | 9.9 | 4.37 | ATTACTCG | AGGATAGG | i503 | i701 | 3 | \n", "\n", "\n" ], "text/plain": [ " rna_sample_num media strain replicate enrichment_method rin \n", "1 2 YPD H99 2 MA 10.0\n", "2 9 YPD mar1d 3 MA 10.0\n", "3 10 YPD mar1d 4 MA 9.9\n", " concentration_fold_difference i7_index i5_index i5_primer i7_primer library#\n", "1 1.34 ATTACTCG AGGCTATA i501 i701 1 \n", "2 2.23 ATTACTCG GCCTCTAT i502 i701 2 \n", "3 4.37 ATTACTCG AGGATAGG i503 i701 3 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df %>% select_all(tolower) %>% head(3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Filter rows" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 'YPD'\n", "\\item 'TC'\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 'YPD'\n", "2. 'TC'\n", "\n", "\n" ], "text/plain": [ "[1] \"YPD\" \"TC\" " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "unique(df$Media)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Equality and inequality conditions" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\n", "
\n" ], "text/latex": [ "\\begin{tabular}{r|llllllllllll}\n", " RNA\\_sample\\_num & Media & Strain & Replicate & enrichment\\_method & RIN & concentration\\_fold\\_difference & i7\\_index & i5\\_index & i5\\_primer & i7\\_primer & library\\#\\\\\n", "\\hline\n", "\t 14 & TC & H99 & 2 & MA & 10.0 & 1.57 & ATTACTCG & TCAGAGCC & i504 & i701 & 4 \\\\\n", "\t 15 & TC & H99 & 3 & MA & 9.9 & 2.85 & ATTACTCG & CTTCGCCT & i505 & i701 & 5 \\\\\n", "\t 21 & TC & mar1d & 3 & MA & 10.0 & 1.81 & ATTACTCG & TAAGATTA & i506 & i701 & 6 \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "RNA_sample_num | Media | Strain | Replicate | enrichment_method | RIN | concentration_fold_difference | i7_index | i5_index | i5_primer | i7_primer | library# | \n", "|---|---|---|\n", "| 14 | TC | H99 | 2 | MA | 10.0 | 1.57 | ATTACTCG | TCAGAGCC | i504 | i701 | 4 | \n", "| 15 | TC | H99 | 3 | MA | 9.9 | 2.85 | ATTACTCG | CTTCGCCT | i505 | i701 | 5 | \n", "| 21 | TC | mar1d | 3 | MA | 10.0 | 1.81 | ATTACTCG | TAAGATTA | i506 | i701 | 6 | \n", "\n", "\n" ], "text/plain": [ " RNA_sample_num Media Strain Replicate enrichment_method RIN \n", "1 14 TC H99 2 MA 10.0\n", "2 15 TC H99 3 MA 9.9\n", "3 21 TC mar1d 3 MA 10.0\n", " concentration_fold_difference i7_index i5_index i5_primer i7_primer library#\n", "1 1.57 ATTACTCG TCAGAGCC i504 i701 4 \n", "2 2.85 ATTACTCG CTTCGCCT i505 i701 5 \n", "3 1.81 ATTACTCG TAAGATTA i506 i701 6 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df %>% filter(Media == 'TC') %>% head(3)" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\n", "
\n" ], "text/latex": [ "\\begin{tabular}{r|llllllllllll}\n", " RNA\\_sample\\_num & Media & Strain & Replicate & enrichment\\_method & RIN & concentration\\_fold\\_difference & i7\\_index & i5\\_index & i5\\_primer & i7\\_primer & library\\#\\\\\n", "\\hline\n", "\t 10 & YPD & mar1d & 4 & MA & 9.9 & 4.37 & ATTACTCG & AGGATAGG & i503 & i701 & 3 \\\\\n", "\t 10 & YPD & mar1d & 4 & RZ & 9.9 & 4.37 & TCCGGAGA & AGGATAGG & i503 & i702 & 11 \\\\\n", "\t 1 & YPD & H99 & 1 & MA & 10.0 & 3.64 & CGCTCATT & AGGCTATA & i501 & i703 & 18 \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "RNA_sample_num | Media | Strain | Replicate | enrichment_method | RIN | concentration_fold_difference | i7_index | i5_index | i5_primer | i7_primer | library# | \n", "|---|---|---|\n", "| 10 | YPD | mar1d | 4 | MA | 9.9 | 4.37 | ATTACTCG | AGGATAGG | i503 | i701 | 3 | \n", "| 10 | YPD | mar1d | 4 | RZ | 9.9 | 4.37 | TCCGGAGA | AGGATAGG | i503 | i702 | 11 | \n", "| 1 | YPD | H99 | 1 | MA | 10.0 | 3.64 | CGCTCATT | AGGCTATA | i501 | i703 | 18 | \n", "\n", "\n" ], "text/plain": [ " RNA_sample_num Media Strain Replicate enrichment_method RIN \n", "1 10 YPD mar1d 4 MA 9.9\n", "2 10 YPD mar1d 4 RZ 9.9\n", "3 1 YPD H99 1 MA 10.0\n", " concentration_fold_difference i7_index i5_index i5_primer i7_primer library#\n", "1 4.37 ATTACTCG AGGATAGG i503 i701 3 \n", "2 4.37 TCCGGAGA AGGATAGG i503 i702 11 \n", "3 3.64 CGCTCATT AGGCTATA i501 i703 18 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df %>% filter(concentration_fold_difference > 3) %>% head(3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Combining conditions" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\n", "
RNA_sample_num Media Strain Replicate enrichment_method RIN concentration_fold_difference i7_index i5_index i5_primer i7_primer library#
10 YPD mar1d 4 MA 9.9 4.37 ATTACTCG AGGATAGG i503 i701 3
10 YPD mar1d 4 RZ 9.9 4.37 TCCGGAGA AGGATAGG i503 i702 11
1 YPD H99 1 MA 10.0 3.64 CGCTCATT AGGCTATA i501 i703 18
\n" ], "text/latex": [ "\\begin{tabular}{r|llllllllllll}\n", " RNA\\_sample\\_num & Media & Strain & Replicate & enrichment\\_method & RIN & concentration\\_fold\\_difference & i7\\_index & i5\\_index & i5\\_primer & i7\\_primer & library\\#\\\\\n", "\\hline\n", "\t 10 & YPD & mar1d & 4 & MA & 9.9 & 4.37 & ATTACTCG & AGGATAGG & i503 & i701 & 3 \\\\\n", "\t 10 & YPD & mar1d & 4 & RZ & 9.9 & 4.37 & TCCGGAGA & AGGATAGG & i503 & i702 & 11 \\\\\n", "\t 1 & YPD & H99 & 1 & MA & 10.0 & 3.64 & CGCTCATT & AGGCTATA & i501 & i703 & 18 \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "RNA_sample_num | Media | Strain | Replicate | enrichment_method | RIN | concentration_fold_difference | i7_index | i5_index | i5_primer | i7_primer | library# | \n", "|---|---|---|\n", "| 10 | YPD | mar1d | 4 | MA | 9.9 | 4.37 | ATTACTCG | AGGATAGG | i503 | i701 | 3 | \n", "| 10 | YPD | mar1d | 4 | RZ | 9.9 | 4.37 | TCCGGAGA | AGGATAGG | i503 | i702 | 11 | \n", "| 1 | YPD | H99 | 1 | MA | 10.0 | 3.64 | CGCTCATT | AGGCTATA | i501 | i703 | 18 | \n", "\n", "\n" ], "text/plain": [ " RNA_sample_num Media Strain Replicate enrichment_method RIN \n", "1 10 YPD mar1d 4 MA 9.9\n", "2 10 YPD mar1d 4 RZ 9.9\n", "3 1 YPD H99 1 MA 10.0\n", " concentration_fold_difference i7_index i5_index i5_primer i7_primer library#\n", "1 4.37 ATTACTCG AGGATAGG i503 i701 3 \n", "2 4.37 TCCGGAGA AGGATAGG i503 i702 11 \n", "3 3.64 CGCTCATT AGGCTATA i501 i703 18 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df %>% \n", "filter(Media != 'TC', \n", " concentration_fold_difference > 3) %>%\n", "head(3)" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\n", "
RNA_sample_num Media Strain Replicate enrichment_method RIN concentration_fold_difference i7_index i5_index i5_primer i7_primer library#
10 YPD mar1d 4 MA 9.9 4.37 ATTACTCG AGGATAGG i503 i701 3
14 TC H99 2 MA 10.0 1.57 ATTACTCG TCAGAGCC i504 i701 4
15 TC H99 3 MA 9.9 2.85 ATTACTCG CTTCGCCT i505 i701 5
\n" ], "text/latex": [ "\\begin{tabular}{r|llllllllllll}\n", " RNA\\_sample\\_num & Media & Strain & Replicate & enrichment\\_method & RIN & concentration\\_fold\\_difference & i7\\_index & i5\\_index & i5\\_primer & i7\\_primer & library\\#\\\\\n", "\\hline\n", "\t 10 & YPD & mar1d & 4 & MA & 9.9 & 4.37 & ATTACTCG & AGGATAGG & i503 & i701 & 3 \\\\\n", "\t 14 & TC & H99 & 2 & MA & 10.0 & 1.57 & ATTACTCG & TCAGAGCC & i504 & i701 & 4 \\\\\n", "\t 15 & TC & H99 & 3 & MA & 9.9 & 2.85 & ATTACTCG & CTTCGCCT & i505 & i701 & 5 \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "RNA_sample_num | Media | Strain | Replicate | enrichment_method | RIN | concentration_fold_difference | i7_index | i5_index | i5_primer | i7_primer | library# | \n", "|---|---|---|\n", "| 10 | YPD | mar1d | 4 | MA | 9.9 | 4.37 | ATTACTCG | AGGATAGG | i503 | i701 | 3 | \n", "| 14 | TC | H99 | 2 | MA | 10.0 | 1.57 | ATTACTCG | TCAGAGCC | i504 | i701 | 4 | \n", "| 15 | TC | H99 | 3 | MA | 9.9 | 2.85 | ATTACTCG | CTTCGCCT | i505 | i701 | 5 | \n", "\n", "\n" ], "text/plain": [ " RNA_sample_num Media Strain Replicate enrichment_method RIN \n", "1 10 YPD mar1d 4 MA 9.9\n", "2 14 TC H99 2 MA 10.0\n", "3 15 TC H99 3 MA 9.9\n", " concentration_fold_difference i7_index i5_index i5_primer i7_primer library#\n", "1 4.37 ATTACTCG AGGATAGG i503 i701 3 \n", "2 1.57 ATTACTCG TCAGAGCC i504 i701 4 \n", "3 2.85 ATTACTCG CTTCGCCT i505 i701 5 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df %>% \n", "filter(Media == 'TC' |\n", " concentration_fold_difference > 3) %>%\n", "head(3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Filtering on string conditions" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\n", "
RNA_sample_num Media Strain Replicate enrichment_method RIN concentration_fold_difference i7_index i5_index i5_primer i7_primer library#
2 YPD H99 2 MA 10.0 1.34 ATTACTCG AGGCTATA i501 i701 1
9 YPD mar1d 3 MA 10.0 2.23 ATTACTCG GCCTCTAT i502 i701 2
10 YPD mar1d 4 MA 9.9 4.37 ATTACTCG AGGATAGG i503 i701 3
\n" ], "text/latex": [ "\\begin{tabular}{r|llllllllllll}\n", " RNA\\_sample\\_num & Media & Strain & Replicate & enrichment\\_method & RIN & concentration\\_fold\\_difference & i7\\_index & i5\\_index & i5\\_primer & i7\\_primer & library\\#\\\\\n", "\\hline\n", "\t 2 & YPD & H99 & 2 & MA & 10.0 & 1.34 & ATTACTCG & AGGCTATA & i501 & i701 & 1 \\\\\n", "\t 9 & YPD & mar1d & 3 & MA & 10.0 & 2.23 & ATTACTCG & GCCTCTAT & i502 & i701 & 2 \\\\\n", "\t 10 & YPD & mar1d & 4 & MA & 9.9 & 4.37 & ATTACTCG & AGGATAGG & i503 & i701 & 3 \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "RNA_sample_num | Media | Strain | Replicate | enrichment_method | RIN | concentration_fold_difference | i7_index | i5_index | i5_primer | i7_primer | library# | \n", "|---|---|---|\n", "| 2 | YPD | H99 | 2 | MA | 10.0 | 1.34 | ATTACTCG | AGGCTATA | i501 | i701 | 1 | \n", "| 9 | YPD | mar1d | 3 | MA | 10.0 | 2.23 | ATTACTCG | GCCTCTAT | i502 | i701 | 2 | \n", "| 10 | YPD | mar1d | 4 | MA | 9.9 | 4.37 | ATTACTCG | AGGATAGG | i503 | i701 | 3 | \n", "\n", "\n" ], "text/plain": [ " RNA_sample_num Media Strain Replicate enrichment_method RIN \n", "1 2 YPD H99 2 MA 10.0\n", "2 9 YPD mar1d 3 MA 10.0\n", "3 10 YPD mar1d 4 MA 9.9\n", " concentration_fold_difference i7_index i5_index i5_primer i7_primer library#\n", "1 1.34 ATTACTCG AGGCTATA i501 i701 1 \n", "2 2.23 ATTACTCG GCCTCTAT i502 i701 2 \n", "3 4.37 ATTACTCG AGGATAGG i503 i701 3 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df %>% filter(str_length(Media) == 3) %>% head(3)" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\n", "
RNA_sample_num Media Strain Replicate enrichment_method RIN concentration_fold_difference i7_index i5_index i5_primer i7_primer library#
2 YPD H99 2 MA 10.0 1.34 ATTACTCG AGGCTATA i501 i701 1
9 YPD mar1d 3 MA 10.0 2.23 ATTACTCG GCCTCTAT i502 i701 2
10 YPD mar1d 4 MA 9.9 4.37 ATTACTCG AGGATAGG i503 i701 3
\n" ], "text/latex": [ "\\begin{tabular}{r|llllllllllll}\n", " RNA\\_sample\\_num & Media & Strain & Replicate & enrichment\\_method & RIN & concentration\\_fold\\_difference & i7\\_index & i5\\_index & i5\\_primer & i7\\_primer & library\\#\\\\\n", "\\hline\n", "\t 2 & YPD & H99 & 2 & MA & 10.0 & 1.34 & ATTACTCG & AGGCTATA & i501 & i701 & 1 \\\\\n", "\t 9 & YPD & mar1d & 3 & MA & 10.0 & 2.23 & ATTACTCG & GCCTCTAT & i502 & i701 & 2 \\\\\n", "\t 10 & YPD & mar1d & 4 & MA & 9.9 & 4.37 & ATTACTCG & AGGATAGG & i503 & i701 & 3 \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "RNA_sample_num | Media | Strain | Replicate | enrichment_method | RIN | concentration_fold_difference | i7_index | i5_index | i5_primer | i7_primer | library# | \n", "|---|---|---|\n", "| 2 | YPD | H99 | 2 | MA | 10.0 | 1.34 | ATTACTCG | AGGCTATA | i501 | i701 | 1 | \n", "| 9 | YPD | mar1d | 3 | MA | 10.0 | 2.23 | ATTACTCG | GCCTCTAT | i502 | i701 | 2 | \n", "| 10 | YPD | mar1d | 4 | MA | 9.9 | 4.37 | ATTACTCG | AGGATAGG | i503 | i701 | 3 | \n", "\n", "\n" ], "text/plain": [ " RNA_sample_num Media Strain Replicate enrichment_method RIN \n", "1 2 YPD H99 2 MA 10.0\n", "2 9 YPD mar1d 3 MA 10.0\n", "3 10 YPD mar1d 4 MA 9.9\n", " concentration_fold_difference i7_index i5_index i5_primer i7_primer library#\n", "1 1.34 ATTACTCG AGGCTATA i501 i701 1 \n", "2 2.23 ATTACTCG GCCTCTAT i502 i701 2 \n", "3 4.37 ATTACTCG AGGATAGG i503 i701 3 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df %>% filter(str_detect(i7_index, '^A.+')) %>% head(3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Arrange in ascending or descending order" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\n", "
RNA_sample_num Media Strain Replicate enrichment_method RIN concentration_fold_difference i7_index i5_index i5_primer i7_primer library#
2 YPD H99 2 MA 10 1.34 ATTACTCG AGGCTATA i501 i701 1
2 YPD H99 2 RZ 10 1.34 TCCGGAGA AGGCTATA i501 i702 9
2 YPD H99 2 TOT 10 1.34 CTGAAGCT AGGCTATA i501 i707 17
\n" ], "text/latex": [ "\\begin{tabular}{r|llllllllllll}\n", " RNA\\_sample\\_num & Media & Strain & Replicate & enrichment\\_method & RIN & concentration\\_fold\\_difference & i7\\_index & i5\\_index & i5\\_primer & i7\\_primer & library\\#\\\\\n", "\\hline\n", "\t 2 & YPD & H99 & 2 & MA & 10 & 1.34 & ATTACTCG & AGGCTATA & i501 & i701 & 1 \\\\\n", "\t 2 & YPD & H99 & 2 & RZ & 10 & 1.34 & TCCGGAGA & AGGCTATA & i501 & i702 & 9 \\\\\n", "\t 2 & YPD & H99 & 2 & TOT & 10 & 1.34 & CTGAAGCT & AGGCTATA & i501 & i707 & 17 \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "RNA_sample_num | Media | Strain | Replicate | enrichment_method | RIN | concentration_fold_difference | i7_index | i5_index | i5_primer | i7_primer | library# | \n", "|---|---|---|\n", "| 2 | YPD | H99 | 2 | MA | 10 | 1.34 | ATTACTCG | AGGCTATA | i501 | i701 | 1 | \n", "| 2 | YPD | H99 | 2 | RZ | 10 | 1.34 | TCCGGAGA | AGGCTATA | i501 | i702 | 9 | \n", "| 2 | YPD | H99 | 2 | TOT | 10 | 1.34 | CTGAAGCT | AGGCTATA | i501 | i707 | 17 | \n", "\n", "\n" ], "text/plain": [ " RNA_sample_num Media Strain Replicate enrichment_method RIN\n", "1 2 YPD H99 2 MA 10 \n", "2 2 YPD H99 2 RZ 10 \n", "3 2 YPD H99 2 TOT 10 \n", " concentration_fold_difference i7_index i5_index i5_primer i7_primer library#\n", "1 1.34 ATTACTCG AGGCTATA i501 i701 1 \n", "2 1.34 TCCGGAGA AGGCTATA i501 i702 9 \n", "3 1.34 CTGAAGCT AGGCTATA i501 i707 17 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df %>% arrange(concentration_fold_difference) %>% head(3)" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\n", "
RNA_sample_num Media Strain Replicate enrichment_method RIN concentration_fold_difference i7_index i5_index i5_primer i7_primer library#
24 TC mar1d 6 MA 10.0 5.53 CGCTCATT TAAGATTA i506 i703 23
24 TC mar1d 6 RZ 10.0 5.53 GAGATTCC TAAGATTA i506 i704 31
23 TC mar1d 5 MA 9.9 4.47 CGCTCATT CTTCGCCT i505 i703 22
\n" ], "text/latex": [ "\\begin{tabular}{r|llllllllllll}\n", " RNA\\_sample\\_num & Media & Strain & Replicate & enrichment\\_method & RIN & concentration\\_fold\\_difference & i7\\_index & i5\\_index & i5\\_primer & i7\\_primer & library\\#\\\\\n", "\\hline\n", "\t 24 & TC & mar1d & 6 & MA & 10.0 & 5.53 & CGCTCATT & TAAGATTA & i506 & i703 & 23 \\\\\n", "\t 24 & TC & mar1d & 6 & RZ & 10.0 & 5.53 & GAGATTCC & TAAGATTA & i506 & i704 & 31 \\\\\n", "\t 23 & TC & mar1d & 5 & MA & 9.9 & 4.47 & CGCTCATT & CTTCGCCT & i505 & i703 & 22 \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "RNA_sample_num | Media | Strain | Replicate | enrichment_method | RIN | concentration_fold_difference | i7_index | i5_index | i5_primer | i7_primer | library# | \n", "|---|---|---|\n", "| 24 | TC | mar1d | 6 | MA | 10.0 | 5.53 | CGCTCATT | TAAGATTA | i506 | i703 | 23 | \n", "| 24 | TC | mar1d | 6 | RZ | 10.0 | 5.53 | GAGATTCC | TAAGATTA | i506 | i704 | 31 | \n", "| 23 | TC | mar1d | 5 | MA | 9.9 | 4.47 | CGCTCATT | CTTCGCCT | i505 | i703 | 22 | \n", "\n", "\n" ], "text/plain": [ " RNA_sample_num Media Strain Replicate enrichment_method RIN \n", "1 24 TC mar1d 6 MA 10.0\n", "2 24 TC mar1d 6 RZ 10.0\n", "3 23 TC mar1d 5 MA 9.9\n", " concentration_fold_difference i7_index i5_index i5_primer i7_primer library#\n", "1 5.53 CGCTCATT TAAGATTA i506 i703 23 \n", "2 5.53 GAGATTCC TAAGATTA i506 i704 31 \n", "3 4.47 CGCTCATT CTTCGCCT i505 i703 22 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df %>% arrange(desc(concentration_fold_difference)) %>% head(3)" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\n", "
RNA_sample_num Media Strain Replicate enrichment_method RIN concentration_fold_difference i7_index i5_index i5_primer i7_primer library#
1 YPD H99 1 MA 10 3.64 CGCTCATT AGGCTATA i501 i703 18
1 YPD H99 1 RZ 10 3.64 GAGATTCC AGGCTATA i501 i704 26
13 TC H99 1 MA 10 1.95 CGCTCATT TCAGAGCC i504 i703 21
\n" ], "text/latex": [ "\\begin{tabular}{r|llllllllllll}\n", " RNA\\_sample\\_num & Media & Strain & Replicate & enrichment\\_method & RIN & concentration\\_fold\\_difference & i7\\_index & i5\\_index & i5\\_primer & i7\\_primer & library\\#\\\\\n", "\\hline\n", "\t 1 & YPD & H99 & 1 & MA & 10 & 3.64 & CGCTCATT & AGGCTATA & i501 & i703 & 18 \\\\\n", "\t 1 & YPD & H99 & 1 & RZ & 10 & 3.64 & GAGATTCC & AGGCTATA & i501 & i704 & 26 \\\\\n", "\t 13 & TC & H99 & 1 & MA & 10 & 1.95 & CGCTCATT & TCAGAGCC & i504 & i703 & 21 \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "RNA_sample_num | Media | Strain | Replicate | enrichment_method | RIN | concentration_fold_difference | i7_index | i5_index | i5_primer | i7_primer | library# | \n", "|---|---|---|\n", "| 1 | YPD | H99 | 1 | MA | 10 | 3.64 | CGCTCATT | AGGCTATA | i501 | i703 | 18 | \n", "| 1 | YPD | H99 | 1 | RZ | 10 | 3.64 | GAGATTCC | AGGCTATA | i501 | i704 | 26 | \n", "| 13 | TC | H99 | 1 | MA | 10 | 1.95 | CGCTCATT | TCAGAGCC | i504 | i703 | 21 | \n", "\n", "\n" ], "text/plain": [ " RNA_sample_num Media Strain Replicate enrichment_method RIN\n", "1 1 YPD H99 1 MA 10 \n", "2 1 YPD H99 1 RZ 10 \n", "3 13 TC H99 1 MA 10 \n", " concentration_fold_difference i7_index i5_index i5_primer i7_primer library#\n", "1 3.64 CGCTCATT AGGCTATA i501 i703 18 \n", "2 3.64 GAGATTCC AGGCTATA i501 i704 26 \n", "3 1.95 CGCTCATT TCAGAGCC i504 i703 21 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df %>% \n", "arrange(Replicate, \n", " desc(concentration_fold_difference))%>% \n", "head(3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Using `top_n`" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
RNA_sample_num Media Strain Replicate enrichment_method RIN concentration_fold_difference i7_index i5_index i5_primer i7_primer library#
23 TC mar1d 5 MA 9.9 4.47 CGCTCATT CTTCGCCT i505 i703 22
24 TC mar1d 6 MA 10.0 5.53 CGCTCATT TAAGATTA i506 i703 23
23 TC mar1d 5 RZ 9.9 4.47 GAGATTCC CTTCGCCT i505 i704 30
24 TC mar1d 6 RZ 10.0 5.53 GAGATTCC TAAGATTA i506 i704 31
\n" ], "text/latex": [ "\\begin{tabular}{r|llllllllllll}\n", " RNA\\_sample\\_num & Media & Strain & Replicate & enrichment\\_method & RIN & concentration\\_fold\\_difference & i7\\_index & i5\\_index & i5\\_primer & i7\\_primer & library\\#\\\\\n", "\\hline\n", "\t 23 & TC & mar1d & 5 & MA & 9.9 & 4.47 & CGCTCATT & CTTCGCCT & i505 & i703 & 22 \\\\\n", "\t 24 & TC & mar1d & 6 & MA & 10.0 & 5.53 & CGCTCATT & TAAGATTA & i506 & i703 & 23 \\\\\n", "\t 23 & TC & mar1d & 5 & RZ & 9.9 & 4.47 & GAGATTCC & CTTCGCCT & i505 & i704 & 30 \\\\\n", "\t 24 & TC & mar1d & 6 & RZ & 10.0 & 5.53 & GAGATTCC & TAAGATTA & i506 & i704 & 31 \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "RNA_sample_num | Media | Strain | Replicate | enrichment_method | RIN | concentration_fold_difference | i7_index | i5_index | i5_primer | i7_primer | library# | \n", "|---|---|---|---|\n", "| 23 | TC | mar1d | 5 | MA | 9.9 | 4.47 | CGCTCATT | CTTCGCCT | i505 | i703 | 22 | \n", "| 24 | TC | mar1d | 6 | MA | 10.0 | 5.53 | CGCTCATT | TAAGATTA | i506 | i703 | 23 | \n", "| 23 | TC | mar1d | 5 | RZ | 9.9 | 4.47 | GAGATTCC | CTTCGCCT | i505 | i704 | 30 | \n", "| 24 | TC | mar1d | 6 | RZ | 10.0 | 5.53 | GAGATTCC | TAAGATTA | i506 | i704 | 31 | \n", "\n", "\n" ], "text/plain": [ " RNA_sample_num Media Strain Replicate enrichment_method RIN \n", "1 23 TC mar1d 5 MA 9.9\n", "2 24 TC mar1d 6 MA 10.0\n", "3 23 TC mar1d 5 RZ 9.9\n", "4 24 TC mar1d 6 RZ 10.0\n", " concentration_fold_difference i7_index i5_index i5_primer i7_primer library#\n", "1 4.47 CGCTCATT CTTCGCCT i505 i703 22 \n", "2 5.53 CGCTCATT TAAGATTA i506 i703 23 \n", "3 4.47 GAGATTCC CTTCGCCT i505 i704 30 \n", "4 5.53 GAGATTCC TAAGATTA i506 i704 31 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df %>% top_n(3, concentration_fold_difference)" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\n", "
RNA_sample_num Media Strain Replicate enrichment_method RIN concentration_fold_difference i7_index i5_index i5_primer i7_primer library#
2 YPD H99 2 MA 10 1.34 ATTACTCG AGGCTATA i501 i701 1
2 YPD H99 2 RZ 10 1.34 TCCGGAGA AGGCTATA i501 i702 9
2 YPD H99 2 TOT 10 1.34 CTGAAGCT AGGCTATA i501 i707 17
\n" ], "text/latex": [ "\\begin{tabular}{r|llllllllllll}\n", " RNA\\_sample\\_num & Media & Strain & Replicate & enrichment\\_method & RIN & concentration\\_fold\\_difference & i7\\_index & i5\\_index & i5\\_primer & i7\\_primer & library\\#\\\\\n", "\\hline\n", "\t 2 & YPD & H99 & 2 & MA & 10 & 1.34 & ATTACTCG & AGGCTATA & i501 & i701 & 1 \\\\\n", "\t 2 & YPD & H99 & 2 & RZ & 10 & 1.34 & TCCGGAGA & AGGCTATA & i501 & i702 & 9 \\\\\n", "\t 2 & YPD & H99 & 2 & TOT & 10 & 1.34 & CTGAAGCT & AGGCTATA & i501 & i707 & 17 \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "RNA_sample_num | Media | Strain | Replicate | enrichment_method | RIN | concentration_fold_difference | i7_index | i5_index | i5_primer | i7_primer | library# | \n", "|---|---|---|\n", "| 2 | YPD | H99 | 2 | MA | 10 | 1.34 | ATTACTCG | AGGCTATA | i501 | i701 | 1 | \n", "| 2 | YPD | H99 | 2 | RZ | 10 | 1.34 | TCCGGAGA | AGGCTATA | i501 | i702 | 9 | \n", "| 2 | YPD | H99 | 2 | TOT | 10 | 1.34 | CTGAAGCT | AGGCTATA | i501 | i707 | 17 | \n", "\n", "\n" ], "text/plain": [ " RNA_sample_num Media Strain Replicate enrichment_method RIN\n", "1 2 YPD H99 2 MA 10 \n", "2 2 YPD H99 2 RZ 10 \n", "3 2 YPD H99 2 TOT 10 \n", " concentration_fold_difference i7_index i5_index i5_primer i7_primer library#\n", "1 1.34 ATTACTCG AGGCTATA i501 i701 1 \n", "2 1.34 TCCGGAGA AGGCTATA i501 i702 9 \n", "3 1.34 CTGAAGCT AGGCTATA i501 i707 17 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df %>% top_n(3, desc(concentration_fold_difference))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. Mutate values" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\n", "
RNA_sample_num concentration_fold_difference concentration_difference
2 1.34 2.531513
9 2.23 4.691340
10 4.37 20.677645
\n" ], "text/latex": [ "\\begin{tabular}{r|lll}\n", " RNA\\_sample\\_num & concentration\\_fold\\_difference & concentration\\_difference\\\\\n", "\\hline\n", "\t 2 & 1.34 & 2.531513\\\\\n", "\t 9 & 2.23 & 4.691340\\\\\n", "\t 10 & 4.37 & 20.677645\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "RNA_sample_num | concentration_fold_difference | concentration_difference | \n", "|---|---|---|\n", "| 2 | 1.34 | 2.531513 | \n", "| 9 | 2.23 | 4.691340 | \n", "| 10 | 4.37 | 20.677645 | \n", "\n", "\n" ], "text/plain": [ " RNA_sample_num concentration_fold_difference concentration_difference\n", "1 2 1.34 2.531513 \n", "2 9 2.23 4.691340 \n", "3 10 4.37 20.677645 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df %>% \n", "select(RNA_sample_num, concentration_fold_difference) %>%\n", "mutate(concentration_difference=2^concentration_fold_difference) %>%\n", "head(3)" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\n", "
concentration_difference
2.531513
4.691340
20.677645
\n" ], "text/latex": [ "\\begin{tabular}{r|l}\n", " concentration\\_difference\\\\\n", "\\hline\n", "\t 2.531513\\\\\n", "\t 4.691340\\\\\n", "\t 20.677645\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "concentration_difference | \n", "|---|---|---|\n", "| 2.531513 | \n", "| 4.691340 | \n", "| 20.677645 | \n", "\n", "\n" ], "text/plain": [ " concentration_difference\n", "1 2.531513 \n", "2 4.691340 \n", "3 20.677645 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df %>%\n", "transmute(concentration_difference=2^concentration_fold_difference) %>% \n", "head(3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5. Summarize" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\n", "
avg_fold_change min_fold_change max_fold_change
2.891961 1.34 5.53
\n" ], "text/latex": [ "\\begin{tabular}{r|lll}\n", " avg\\_fold\\_change & min\\_fold\\_change & max\\_fold\\_change\\\\\n", "\\hline\n", "\t 2.891961 & 1.34 & 5.53 \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "avg_fold_change | min_fold_change | max_fold_change | \n", "|---|\n", "| 2.891961 | 1.34 | 5.53 | \n", "\n", "\n" ], "text/plain": [ " avg_fold_change min_fold_change max_fold_change\n", "1 2.891961 1.34 5.53 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df %>% summarize(avg_fold_change=mean(concentration_fold_difference),\n", " min_fold_change=min(concentration_fold_difference),\n", " max_fold_change=max(concentration_fold_difference))" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\n", "
RIN concentration_fold_difference
9.776471 2.891961
\n" ], "text/latex": [ "\\begin{tabular}{r|ll}\n", " RIN & concentration\\_fold\\_difference\\\\\n", "\\hline\n", "\t 9.776471 & 2.891961\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "RIN | concentration_fold_difference | \n", "|---|\n", "| 9.776471 | 2.891961 | \n", "\n", "\n" ], "text/plain": [ " RIN concentration_fold_difference\n", "1 9.776471 2.891961 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df %>% summarize_if(is.numeric, mean)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 6. Group by\n", "\n", "`summarize` is most useful when combined with `group_by`: the rows are split into groups and each summary is computed once per group." ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
Media Strain enrichment_method mean_fold_diff
TC H99 MA 2.576667
TC H99 RZ 2.576667
TC mar1d MA 3.463333
TC mar1d RZ 3.463333
YPD H99 MA 2.556667
YPD H99 RZ 2.556667
YPD H99 TOT 1.790000
YPD mar1d MA 3.246667
YPD mar1d RZ 3.246667
\n" ], "text/latex": [ "\\begin{tabular}{r|llll}\n", " Media & Strain & enrichment\\_method & mean\\_fold\\_diff\\\\\n", "\\hline\n", "\t TC & H99 & MA & 2.576667\\\\\n", "\t TC & H99 & RZ & 2.576667\\\\\n", "\t TC & mar1d & MA & 3.463333\\\\\n", "\t TC & mar1d & RZ & 3.463333\\\\\n", "\t YPD & H99 & MA & 2.556667\\\\\n", "\t YPD & H99 & RZ & 2.556667\\\\\n", "\t YPD & H99 & TOT & 1.790000\\\\\n", "\t YPD & mar1d & MA & 3.246667\\\\\n", "\t YPD & mar1d & RZ & 3.246667\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "Media | Strain | enrichment_method | mean_fold_diff | \n", "|---|---|---|---|---|---|---|---|---|\n", "| TC | H99 | MA | 2.576667 | \n", "| TC | H99 | RZ | 2.576667 | \n", "| TC | mar1d | MA | 3.463333 | \n", "| TC | mar1d | RZ | 3.463333 | \n", "| YPD | H99 | MA | 2.556667 | \n", "| YPD | H99 | RZ | 2.556667 | \n", "| YPD | H99 | TOT | 1.790000 | \n", "| YPD | mar1d | MA | 3.246667 | \n", "| YPD | mar1d | RZ | 3.246667 | \n", "\n", "\n" ], "text/plain": [ " Media Strain enrichment_method mean_fold_diff\n", "1 TC H99 MA 2.576667 \n", "2 TC H99 RZ 2.576667 \n", "3 TC mar1d MA 3.463333 \n", "4 TC mar1d RZ 3.463333 \n", "5 YPD H99 MA 2.556667 \n", "6 YPD H99 RZ 2.556667 \n", "7 YPD H99 TOT 1.790000 \n", "8 YPD mar1d MA 3.246667 \n", "9 YPD mar1d RZ 3.246667 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df %>% \n", "group_by(Media, Strain, enrichment_method) %>%\n", "summarize(mean_fold_diff=mean(concentration_fold_difference))" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
Media Strain enrichment_method RIN concentration_fold_difference
TC H99 MA 9.850000 2.576667
TC H99 RZ 9.850000 2.576667
TC mar1d MA 9.333333 3.463333
TC mar1d RZ 9.333333 3.463333
YPD H99 MA 10.000000 2.556667
YPD H99 RZ 10.000000 2.556667
YPD H99 TOT 10.000000 1.790000
YPD mar1d MA 9.866667 3.246667
YPD mar1d RZ 9.866667 3.246667
\n" ], "text/latex": [ "\\begin{tabular}{r|lllll}\n", " Media & Strain & enrichment\\_method & RIN & concentration\\_fold\\_difference\\\\\n", "\\hline\n", "\t TC & H99 & MA & 9.850000 & 2.576667 \\\\\n", "\t TC & H99 & RZ & 9.850000 & 2.576667 \\\\\n", "\t TC & mar1d & MA & 9.333333 & 3.463333 \\\\\n", "\t TC & mar1d & RZ & 9.333333 & 3.463333 \\\\\n", "\t YPD & H99 & MA & 10.000000 & 2.556667 \\\\\n", "\t YPD & H99 & RZ & 10.000000 & 2.556667 \\\\\n", "\t YPD & H99 & TOT & 10.000000 & 1.790000 \\\\\n", "\t YPD & mar1d & MA & 9.866667 & 3.246667 \\\\\n", "\t YPD & mar1d & RZ & 9.866667 & 3.246667 \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "Media | Strain | enrichment_method | RIN | concentration_fold_difference | \n", "|---|---|---|---|---|---|---|---|---|\n", "| TC | H99 | MA | 9.850000 | 2.576667 | \n", "| TC | H99 | RZ | 9.850000 | 2.576667 | \n", "| TC | mar1d | MA | 9.333333 | 3.463333 | \n", "| TC | mar1d | RZ | 9.333333 | 3.463333 | \n", "| YPD | H99 | MA | 10.000000 | 2.556667 | \n", "| YPD | H99 | RZ | 10.000000 | 2.556667 | \n", "| YPD | H99 | TOT | 10.000000 | 1.790000 | \n", "| YPD | mar1d | MA | 9.866667 | 3.246667 | \n", "| YPD | mar1d | RZ | 9.866667 | 3.246667 | \n", "\n", "\n" ], "text/plain": [ " Media Strain enrichment_method RIN concentration_fold_difference\n", "1 TC H99 MA 9.850000 2.576667 \n", "2 TC H99 RZ 9.850000 2.576667 \n", "3 TC mar1d MA 9.333333 3.463333 \n", "4 TC mar1d RZ 9.333333 3.463333 \n", "5 YPD H99 MA 10.000000 2.556667 \n", "6 YPD H99 RZ 10.000000 2.556667 \n", "7 YPD H99 TOT 10.000000 1.790000 \n", "8 YPD mar1d MA 9.866667 3.246667 \n", "9 YPD mar1d RZ 9.866667 3.246667 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df %>% \n", "group_by(Media, Strain, enrichment_method) %>%\n", "summarize_if(is.numeric, mean) " ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
Media Strain RIN_mean concentration_fold_difference_mean RIN_sd concentration_fold_difference_sd
TC H99 9.850000 2.576667 0.2611165 0.7157873
TC mar1d 9.333333 3.463333 1.4643232 1.3630803
YPD H99 10.000000 2.403333 0.0000000 0.8593824
YPD mar1d 9.866667 3.246667 0.1669694 0.7410230
\n" ], "text/latex": [ "\\begin{tabular}{r|llllll}\n", " Media & Strain & RIN\\_mean & concentration\\_fold\\_difference\\_mean & RIN\\_sd & concentration\\_fold\\_difference\\_sd\\\\\n", "\\hline\n", "\t TC & H99 & 9.850000 & 2.576667 & 0.2611165 & 0.7157873\\\\\n", "\t TC & mar1d & 9.333333 & 3.463333 & 1.4643232 & 1.3630803\\\\\n", "\t YPD & H99 & 10.000000 & 2.403333 & 0.0000000 & 0.8593824\\\\\n", "\t YPD & mar1d & 9.866667 & 3.246667 & 0.1669694 & 0.7410230\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "Media | Strain | RIN_mean | concentration_fold_difference_mean | RIN_sd | concentration_fold_difference_sd | \n", "|---|---|---|---|\n", "| TC | H99 | 9.850000 | 2.576667 | 0.2611165 | 0.7157873 | \n", "| TC | mar1d | 9.333333 | 3.463333 | 1.4643232 | 1.3630803 | \n", "| YPD | H99 | 10.000000 | 2.403333 | 0.0000000 | 0.8593824 | \n", "| YPD | mar1d | 9.866667 | 3.246667 | 0.1669694 | 0.7410230 | \n", "\n", "\n" ], "text/plain": [ " Media Strain RIN_mean concentration_fold_difference_mean RIN_sd \n", "1 TC H99 9.850000 2.576667 0.2611165\n", "2 TC mar1d 9.333333 3.463333 1.4643232\n", "3 YPD H99 10.000000 2.403333 0.0000000\n", "4 YPD mar1d 9.866667 3.246667 0.1669694\n", " concentration_fold_difference_sd\n", "1 0.7157873 \n", "2 1.3630803 \n", "3 0.8593824 \n", "4 0.7410230 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df %>% \n", "group_by(Media, Strain) %>%\n", "summarize_if(is.numeric, funs(mean, sd))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What experimental conditions produced the greatest mean fold change?" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\n", "
Media Strain enrichment_method mean_fold_diff
TC mar1d MA 3.463333
\n" ], "text/latex": [ "\\begin{tabular}{r|llll}\n", " Media & Strain & enrichment\\_method & mean\\_fold\\_diff\\\\\n", "\\hline\n", "\t TC & mar1d & MA & 3.463333\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "Media | Strain | enrichment_method | mean_fold_diff | \n", "|---|\n", "| TC | mar1d | MA | 3.463333 | \n", "\n", "\n" ], "text/plain": [ " Media Strain enrichment_method mean_fold_diff\n", "1 TC mar1d MA 3.463333 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df %>% \n", "group_by(Media, Strain, enrichment_method) %>%\n", "summarize(mean_fold_diff=mean(concentration_fold_difference)) %>%\n", "arrange(desc(mean_fold_diff)) %>%\n", "head(1)" ] } ], "metadata": { "kernelspec": { "display_name": "R", "language": "R", "name": "r" }, "language_info": { "codemirror_mode": "r", "file_extension": ".r", "mimetype": "text/x-r-source", "name": "R", "pygments_lexer": "r", "version": "3.4.4" } }, "nbformat": 4, "nbformat_minor": 2 }