{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Using `dplyr` for data manipulation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Description**\n",
"\n",
"```\n",
"dplyr provides a flexible grammar of data manipulation. It’s the next iteration of plyr, focused on tools for working with data frames (hence the d in the name).\n",
"```\n",
"\n",
"If you look at the [`dplyr` docs](https://cran.r-project.org/web/packages/dplyr/dplyr.pdf), you will find a rich collection of data manipulation verbs. However, most common tasks can be accomplished with just six verbs, which we cover in this session:\n",
"\n",
"```\n",
"select\n",
"filter\n",
"mutate\n",
"arrange\n",
"summarize\n",
"group_by\n",
"```\n",
"\n",
"We will also see how to build data manipulation \"sentences\" by chaining these verbs together with the pipe operator, `%>%`."
]
},
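{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a preview, here is a minimal sketch of such a sentence. It runs on a small made-up data frame: the `toy` table, its columns and its values are hypothetical and are not part of the pilot metadata used below. The pipe passes the result of each step as the first argument of the next one, so `x %>% f(y)` is the same as `f(x, y)`.\n",
"\n",
"```r\n",
"library(dplyr)\n",
"\n",
"# Hypothetical toy data, not the pilot metadata\n",
"toy <- data.frame(\n",
"  strain = c('H99', 'H99', 'mar1d', 'mar1d'),\n",
"  rin    = c(10.0, 9.8, 9.7, 6.2),\n",
"  fold   = c(3.6, 2.8, 3.7, 1.3),\n",
"  stringsAsFactors = FALSE\n",
")\n",
"\n",
"# Read the sentence left to right: keep good-quality samples, add a column,\n",
"# keep two columns, then summarize per strain and sort the summary\n",
"result <- toy %>%\n",
"  filter(rin >= 9) %>%\n",
"  mutate(log2_fold = log2(fold)) %>%\n",
"  select(strain, log2_fold) %>%\n",
"  group_by(strain) %>%\n",
"  summarize(mean_log2_fold = mean(log2_fold), n = n()) %>%\n",
"  arrange(desc(mean_log2_fold))\n",
"\n",
"result\n",
"\n",
"# The pipe is plain function application: both calls give the same rows\n",
"identical(toy %>% filter(rin >= 9), filter(toy, rin >= 9))\n",
"```"
]
},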
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"── Attaching packages ─────────────────────────────────────── tidyverse 1.2.1 ──\n",
"✔ ggplot2 2.2.1 ✔ purrr 0.2.5\n",
"✔ tibble 1.4.2 ✔ dplyr 0.7.5\n",
"✔ tidyr 0.8.1 ✔ stringr 1.3.1\n",
"✔ readr 1.1.1 ✔ forcats 0.3.0\n",
"── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──\n",
"✖ dplyr::filter() masks stats::filter()\n",
"✖ dplyr::lag() masks stats::lag()\n"
]
}
],
"source": [
"library(tidyverse)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"path='../josh/info/2018_pilot_metadata.tsv'"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Parsed with column specification:\n",
"cols(\n",
" Label = col_character(),\n",
" RNA_sample_num = col_integer(),\n",
" Media = col_character(),\n",
" Strain = col_character(),\n",
" Replicate = col_integer(),\n",
" experiment_person = col_character(),\n",
" libprep_person = col_character(),\n",
" enrichment_method = col_character(),\n",
" RIN = col_double(),\n",
" concentration_fold_difference = col_double(),\n",
" `i7 index` = col_character(),\n",
" `i5 index` = col_character(),\n",
" `i5 primer` = col_character(),\n",
" `i7 primer` = col_character(),\n",
" `library#` = col_integer()\n",
")\n"
]
}
],
"source": [
"df <- read_tsv(path)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\t- 'Label'\n",
"\t- 'RNA_sample_num'\n",
"\t- 'Media'\n",
"\t- 'Strain'\n",
"\t- 'Replicate'\n",
"\t- 'experiment_person'\n",
"\t- 'libprep_person'\n",
"\t- 'enrichment_method'\n",
"\t- 'RIN'\n",
"\t- 'concentration_fold_difference'\n",
"\t- 'i7 index'\n",
"\t- 'i5 index'\n",
"\t- 'i5 primer'\n",
"\t- 'i7 primer'\n",
"\t- 'library#'\n",
"\n"
],
"text/latex": [
"\\begin{enumerate*}\n",
"\\item 'Label'\n",
"\\item 'RNA\\_sample\\_num'\n",
"\\item 'Media'\n",
"\\item 'Strain'\n",
"\\item 'Replicate'\n",
"\\item 'experiment\\_person'\n",
"\\item 'libprep\\_person'\n",
"\\item 'enrichment\\_method'\n",
"\\item 'RIN'\n",
"\\item 'concentration\\_fold\\_difference'\n",
"\\item 'i7 index'\n",
"\\item 'i5 index'\n",
"\\item 'i5 primer'\n",
"\\item 'i7 primer'\n",
"\\item 'library\\#'\n",
"\\end{enumerate*}\n"
],
"text/markdown": [
"1. 'Label'\n",
"2. 'RNA_sample_num'\n",
"3. 'Media'\n",
"4. 'Strain'\n",
"5. 'Replicate'\n",
"6. 'experiment_person'\n",
"7. 'libprep_person'\n",
"8. 'enrichment_method'\n",
"9. 'RIN'\n",
"10. 'concentration_fold_difference'\n",
"11. 'i7 index'\n",
"12. 'i5 index'\n",
"13. 'i5 primer'\n",
"14. 'i7 primer'\n",
"15. 'library#'\n",
"\n",
"\n"
],
"text/plain": [
" [1] \"Label\" \"RNA_sample_num\" \n",
" [3] \"Media\" \"Strain\" \n",
" [5] \"Replicate\" \"experiment_person\" \n",
" [7] \"libprep_person\" \"enrichment_method\" \n",
" [9] \"RIN\" \"concentration_fold_difference\"\n",
"[11] \"i7 index\" \"i5 index\" \n",
"[13] \"i5 primer\" \"i7 primer\" \n",
"[15] \"library#\" "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"names(df)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Fix names to be consistent\n",
"\n",
"Note that some names use spaces between words while others use underscores. Let's replace the spaces with underscores so that every name follows the same convention."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"names(df) <- str_replace_all(names(df), c('[:space:]+' = '_'))"
]
},
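{
"cell_type": "markdown",
"metadata": {},
"source": [
"A named vector passed to `str_replace_all()` maps each pattern (the name) to its replacement (the value). The sketch below applies the same call to a few hypothetical column names. Any run of whitespace, even several spaces, becomes a single underscore. `#` is not whitespace, so it is left unchanged.\n",
"\n",
"```r\n",
"library(stringr)\n",
"\n",
"# Hypothetical names; the second one contains two spaces\n",
"old_names <- c('i7 index', 'i5  primer', 'library#', 'RIN')\n",
"\n",
"# pattern = replacement: '[:space:]+' matches one or more whitespace characters\n",
"new_names <- str_replace_all(old_names, c('[:space:]+' = '_'))\n",
"new_names\n",
"```"
]
},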
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\t- 'Label'\n",
"\t- 'RNA_sample_num'\n",
"\t- 'Media'\n",
"\t- 'Strain'\n",
"\t- 'Replicate'\n",
"\t- 'experiment_person'\n",
"\t- 'libprep_person'\n",
"\t- 'enrichment_method'\n",
"\t- 'RIN'\n",
"\t- 'concentration_fold_difference'\n",
"\t- 'i7_index'\n",
"\t- 'i5_index'\n",
"\t- 'i5_primer'\n",
"\t- 'i7_primer'\n",
"\t- 'library#'\n",
"\n"
],
"text/latex": [
"\\begin{enumerate*}\n",
"\\item 'Label'\n",
"\\item 'RNA\\_sample\\_num'\n",
"\\item 'Media'\n",
"\\item 'Strain'\n",
"\\item 'Replicate'\n",
"\\item 'experiment\\_person'\n",
"\\item 'libprep\\_person'\n",
"\\item 'enrichment\\_method'\n",
"\\item 'RIN'\n",
"\\item 'concentration\\_fold\\_difference'\n",
"\\item 'i7\\_index'\n",
"\\item 'i5\\_index'\n",
"\\item 'i5\\_primer'\n",
"\\item 'i7\\_primer'\n",
"\\item 'library\\#'\n",
"\\end{enumerate*}\n"
],
"text/markdown": [
"1. 'Label'\n",
"2. 'RNA_sample_num'\n",
"3. 'Media'\n",
"4. 'Strain'\n",
"5. 'Replicate'\n",
"6. 'experiment_person'\n",
"7. 'libprep_person'\n",
"8. 'enrichment_method'\n",
"9. 'RIN'\n",
"10. 'concentration_fold_difference'\n",
"11. 'i7_index'\n",
"12. 'i5_index'\n",
"13. 'i5_primer'\n",
"14. 'i7_primer'\n",
"15. 'library#'\n",
"\n",
"\n"
],
"text/plain": [
" [1] \"Label\" \"RNA_sample_num\" \n",
" [3] \"Media\" \"Strain\" \n",
" [5] \"Replicate\" \"experiment_person\" \n",
" [7] \"libprep_person\" \"enrichment_method\" \n",
" [9] \"RIN\" \"concentration_fold_difference\"\n",
"[11] \"i7_index\" \"i5_index\" \n",
"[13] \"i5_primer\" \"i7_primer\" \n",
"[15] \"library#\" "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"names(df)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\t- 51\n",
"\t- 15\n",
"\n"
],
"text/latex": [
"\\begin{enumerate*}\n",
"\\item 51\n",
"\\item 15\n",
"\\end{enumerate*}\n"
],
"text/markdown": [
"1. 51\n",
"2. 15\n",
"\n",
"\n"
],
"text/plain": [
"[1] 51 15"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"dim(df)"
]
},
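{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note: we drop the `Label`, `experiment_person` and `libprep_person` columns (columns 1, 6 and 7) so that the table fits in the browser. Because both `*_person` columns are removed here, `select(ends_with('person'))` in the *Select columns* section returns no columns."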
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"df <- df[, c(2:5, 8:15)]"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\t- 51\n",
"\t- 12\n",
"\n"
],
"text/latex": [
"\\begin{enumerate*}\n",
"\\item 51\n",
"\\item 12\n",
"\\end{enumerate*}\n"
],
"text/markdown": [
"1. 51\n",
"2. 12\n",
"\n",
"\n"
],
"text/plain": [
"[1] 51 12"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"dim(df)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"RNA_sample_num | Media | Strain | Replicate | enrichment_method | RIN | concentration_fold_difference | i7_index | i5_index | i5_primer | i7_primer | library# |\n",
"\n",
"\t27 | YPD | H99 | 9 | RZ | 10.0 | 3.57 | GAATTCGT | TCAGAGCC | i504 | i706 | 46 |\n",
"\t26 | YPD | H99 | 8 | MA | 10.0 | 2.76 | ATTACTCG | GTCAGTAC | i508 | i701 | 8 |\n",
"\t36 | YPD | mar1d | 12 | MA | 9.7 | 3.70 | CGCTCATT | ACGTCCTG | i507 | i703 | 24 |\n",
"\n",
"\n"
],
"text/latex": [
"\\begin{tabular}{r|llllllllllll}\n",
" RNA\\_sample\\_num & Media & Strain & Replicate & enrichment\\_method & RIN & concentration\\_fold\\_difference & i7\\_index & i5\\_index & i5\\_primer & i7\\_primer & library\\#\\\\\n",
"\\hline\n",
"\t 27 & YPD & H99 & 9 & RZ & 10.0 & 3.57 & GAATTCGT & TCAGAGCC & i504 & i706 & 46 \\\\\n",
"\t 26 & YPD & H99 & 8 & MA & 10.0 & 2.76 & ATTACTCG & GTCAGTAC & i508 & i701 & 8 \\\\\n",
"\t 36 & YPD & mar1d & 12 & MA & 9.7 & 3.70 & CGCTCATT & ACGTCCTG & i507 & i703 & 24 \\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"RNA_sample_num | Media | Strain | Replicate | enrichment_method | RIN | concentration_fold_difference | i7_index | i5_index | i5_primer | i7_primer | library# | \n",
"|---|---|---|\n",
"| 27 | YPD | H99 | 9 | RZ | 10.0 | 3.57 | GAATTCGT | TCAGAGCC | i504 | i706 | 46 | \n",
"| 26 | YPD | H99 | 8 | MA | 10.0 | 2.76 | ATTACTCG | GTCAGTAC | i508 | i701 | 8 | \n",
"| 36 | YPD | mar1d | 12 | MA | 9.7 | 3.70 | CGCTCATT | ACGTCCTG | i507 | i703 | 24 | \n",
"\n",
"\n"
],
"text/plain": [
" RNA_sample_num Media Strain Replicate enrichment_method RIN \n",
"1 27 YPD H99 9 RZ 10.0\n",
"2 26 YPD H99 8 MA 10.0\n",
"3 36 YPD mar1d 12 MA 9.7\n",
" concentration_fold_difference i7_index i5_index i5_primer i7_primer library#\n",
"1 3.57 GAATTCGT TCAGAGCC i504 i706 46 \n",
"2 2.76 ATTACTCG GTCAGTAC i508 i701 8 \n",
"3 3.70 CGCTCATT ACGTCCTG i507 i703 24 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sample_n(df, 3)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" RNA_sample_num Media Strain Replicate \n",
" Min. : 1.00 Length:51 Length:51 Min. : 1.000 \n",
" 1st Qu.: 9.50 Class :character Class :character 1st Qu.: 3.000 \n",
" Median :16.00 Mode :character Mode :character Median : 4.000 \n",
" Mean :19.55 Mean : 5.431 \n",
" 3rd Qu.:27.00 3rd Qu.: 8.000 \n",
" Max. :47.00 Max. :12.000 \n",
" enrichment_method RIN concentration_fold_difference\n",
" Length:51 Min. : 6.200 Min. :1.340 \n",
" Class :character 1st Qu.: 9.900 1st Qu.:2.010 \n",
" Mode :character Median :10.000 Median :2.850 \n",
" Mean : 9.776 Mean :2.892 \n",
" 3rd Qu.:10.000 3rd Qu.:3.640 \n",
" Max. :10.000 Max. :5.530 \n",
" i7_index i5_index i5_primer i7_primer \n",
" Length:51 Length:51 Length:51 Length:51 \n",
" Class :character Class :character Class :character Class :character \n",
" Mode :character Mode :character Mode :character Mode :character \n",
" \n",
" \n",
" \n",
" library# \n",
" Min. : 1.0 \n",
" 1st Qu.:13.5 \n",
" Median :26.0 \n",
" Mean :26.0 \n",
" 3rd Qu.:38.5 \n",
" Max. :51.0 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"summary(df)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Fixing fake numeric columns\n",
"\n",
"Note that `RNA_sample_num`, `Replicate` and `library#` are really discrete labels, so they should be `factor`s rather than numbers. Because `#` is not allowed in a syntactic R name, we need to wrap `library#` in backticks. (Alternatively, we could rename it to something like `library_num`.)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\t- 'RNA_sample_num'\n",
"\t- 'Replicate'\n",
"\t- 'library#'\n",
"\n"
],
"text/latex": [
"\\begin{enumerate*}\n",
"\\item 'RNA\\_sample\\_num'\n",
"\\item 'Replicate'\n",
"\\item 'library\\#'\n",
"\\end{enumerate*}\n"
],
"text/markdown": [
"1. 'RNA_sample_num'\n",
"2. 'Replicate'\n",
"3. 'library#'\n",
"\n",
"\n"
],
"text/plain": [
"[1] \"RNA_sample_num\" \"Replicate\" \"library#\" "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df %>% mutate(\n",
" RNA_sample_num=factor(RNA_sample_num), \n",
" Replicate=factor(Replicate),\n",
" `library#`=factor(`library#`)\n",
") %>% \n",
"select_if(is.factor) %>%\n",
"names"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"After checking that the transformation worked, we can overwrite `df` with the transformed data frame."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"df <- df %>% mutate(\n",
" RNA_sample_num=factor(RNA_sample_num), \n",
" Replicate=factor(Replicate),\n",
" `library#`=factor(`library#`)\n",
")"
]
},
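{
"cell_type": "markdown",
"metadata": {},
"source": [
"The alternative mentioned earlier is to rename `library#` to a syntactic name so that backticks are no longer needed. The sketch below shows one way to do that. It works on a small hypothetical stand-in for the metadata and assumes `library_num` as the new name. It also uses `mutate_at()` to apply `factor` to several columns in one call instead of repeating `factor()` for each column.\n",
"\n",
"```r\n",
"library(dplyr)\n",
"\n",
"# Hypothetical two-row stand-in; check.names = FALSE keeps the name 'library#'\n",
"meta <- data.frame(RNA_sample_num = c(2L, 9L),\n",
"                   Replicate      = c(2L, 3L),\n",
"                   `library#`     = c(1L, 2L),\n",
"                   check.names    = FALSE)\n",
"\n",
"meta <- meta %>%\n",
"  # new_name = old_name; backticks are only needed for the old name\n",
"  rename(library_num = `library#`) %>%\n",
"  # convert all three ID-like columns to factors at once\n",
"  mutate_at(vars(RNA_sample_num, Replicate, library_num), factor)\n",
"\n",
"str(meta)\n",
"```"
]
},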
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 0. Pipe"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"RNA_sample_num | Media | Strain | Replicate | enrichment_method | RIN | concentration_fold_difference | i7_index | i5_index | i5_primer | i7_primer | library# |\n",
"\n",
"\t26 | YPD | H99 | 8 | MA | 10 | 2.76 | ATTACTCG | GTCAGTAC | i508 | i701 | 8 |\n",
"\t2 | YPD | H99 | 2 | RZ | 10 | 1.34 | TCCGGAGA | AGGCTATA | i501 | i702 | 9 |\n",
"\t9 | YPD | mar1d | 3 | RZ | 10 | 2.23 | TCCGGAGA | GCCTCTAT | i502 | i702 | 10 |\n",
"\n",
"\n"
],
"text/latex": [
"\\begin{tabular}{r|llllllllllll}\n",
" RNA\\_sample\\_num & Media & Strain & Replicate & enrichment\\_method & RIN & concentration\\_fold\\_difference & i7\\_index & i5\\_index & i5\\_primer & i7\\_primer & library\\#\\\\\n",
"\\hline\n",
"\t 26 & YPD & H99 & 8 & MA & 10 & 2.76 & ATTACTCG & GTCAGTAC & i508 & i701 & 8 \\\\\n",
"\t 2 & YPD & H99 & 2 & RZ & 10 & 1.34 & TCCGGAGA & AGGCTATA & i501 & i702 & 9 \\\\\n",
"\t 9 & YPD & mar1d & 3 & RZ & 10 & 2.23 & TCCGGAGA & GCCTCTAT & i502 & i702 & 10 \\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"RNA_sample_num | Media | Strain | Replicate | enrichment_method | RIN | concentration_fold_difference | i7_index | i5_index | i5_primer | i7_primer | library# | \n",
"|---|---|---|\n",
"| 26 | YPD | H99 | 8 | MA | 10 | 2.76 | ATTACTCG | GTCAGTAC | i508 | i701 | 8 | \n",
"| 2 | YPD | H99 | 2 | RZ | 10 | 1.34 | TCCGGAGA | AGGCTATA | i501 | i702 | 9 | \n",
"| 9 | YPD | mar1d | 3 | RZ | 10 | 2.23 | TCCGGAGA | GCCTCTAT | i502 | i702 | 10 | \n",
"\n",
"\n"
],
"text/plain": [
" RNA_sample_num Media Strain Replicate enrichment_method RIN\n",
"1 26 YPD H99 8 MA 10 \n",
"2 2 YPD H99 2 RZ 10 \n",
"3 9 YPD mar1d 3 RZ 10 \n",
" concentration_fold_difference i7_index i5_index i5_primer i7_primer library#\n",
"1 2.76 ATTACTCG GTCAGTAC i508 i701 8 \n",
"2 1.34 TCCGGAGA AGGCTATA i501 i702 9 \n",
"3 2.23 TCCGGAGA GCCTCTAT i502 i702 10 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df %>% head(10) %>% tail(3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The same rows (8 to 10) can also be selected by position with `slice`:"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"RNA_sample_num | Media | Strain | Replicate | enrichment_method | RIN | concentration_fold_difference | i7_index | i5_index | i5_primer | i7_primer | library# |\n",
"\n",
"\t26 | YPD | H99 | 8 | MA | 10 | 2.76 | ATTACTCG | GTCAGTAC | i508 | i701 | 8 |\n",
"\t2 | YPD | H99 | 2 | RZ | 10 | 1.34 | TCCGGAGA | AGGCTATA | i501 | i702 | 9 |\n",
"\t9 | YPD | mar1d | 3 | RZ | 10 | 2.23 | TCCGGAGA | GCCTCTAT | i502 | i702 | 10 |\n",
"\n",
"\n"
],
"text/latex": [
"\\begin{tabular}{r|llllllllllll}\n",
" RNA\\_sample\\_num & Media & Strain & Replicate & enrichment\\_method & RIN & concentration\\_fold\\_difference & i7\\_index & i5\\_index & i5\\_primer & i7\\_primer & library\\#\\\\\n",
"\\hline\n",
"\t 26 & YPD & H99 & 8 & MA & 10 & 2.76 & ATTACTCG & GTCAGTAC & i508 & i701 & 8 \\\\\n",
"\t 2 & YPD & H99 & 2 & RZ & 10 & 1.34 & TCCGGAGA & AGGCTATA & i501 & i702 & 9 \\\\\n",
"\t 9 & YPD & mar1d & 3 & RZ & 10 & 2.23 & TCCGGAGA & GCCTCTAT & i502 & i702 & 10 \\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"RNA_sample_num | Media | Strain | Replicate | enrichment_method | RIN | concentration_fold_difference | i7_index | i5_index | i5_primer | i7_primer | library# | \n",
"|---|---|---|\n",
"| 26 | YPD | H99 | 8 | MA | 10 | 2.76 | ATTACTCG | GTCAGTAC | i508 | i701 | 8 | \n",
"| 2 | YPD | H99 | 2 | RZ | 10 | 1.34 | TCCGGAGA | AGGCTATA | i501 | i702 | 9 | \n",
"| 9 | YPD | mar1d | 3 | RZ | 10 | 2.23 | TCCGGAGA | GCCTCTAT | i502 | i702 | 10 | \n",
"\n",
"\n"
],
"text/plain": [
" RNA_sample_num Media Strain Replicate enrichment_method RIN\n",
"1 26 YPD H99 8 MA 10 \n",
"2 2 YPD H99 2 RZ 10 \n",
"3 9 YPD mar1d 3 RZ 10 \n",
" concentration_fold_difference i7_index i5_index i5_primer i7_primer library#\n",
"1 2.76 ATTACTCG GTCAGTAC i508 i701 8 \n",
"2 1.34 TCCGGAGA AGGCTATA i501 i702 9 \n",
"3 2.23 TCCGGAGA GCCTCTAT i502 i702 10 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df %>% slice(8:10)"
]
},
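{
"cell_type": "markdown",
"metadata": {},
"source": [
"To check that the two forms pick the same rows, we can compare them directly. This sketch uses a small generated data frame (hypothetical, not the metadata) so that it runs on its own. The row names of the two results may differ because `tail()` keeps the original row names, so we compare the column values instead.\n",
"\n",
"```r\n",
"library(dplyr)\n",
"\n",
"# Hypothetical data: 20 rows with a row id and a derived value\n",
"d <- data.frame(id = 1:20, value = (1:20)^2)\n",
"\n",
"a <- d %>% head(10) %>% tail(3)  # first 10 rows, then the last 3 of those\n",
"b <- d %>% slice(8:10)           # rows 8 to 10 by position\n",
"\n",
"# Same rows in the same order\n",
"identical(a$id, b$id)\n",
"identical(a$value, b$value)\n",
"```"
]
},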
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Select columns"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"RNA_sample_num | Media | Strain |\n",
"\n",
"\t2 | YPD | H99 |\n",
"\t9 | YPD | mar1d |\n",
"\t10 | YPD | mar1d |\n",
"\n",
"\n"
],
"text/latex": [
"\\begin{tabular}{r|lll}\n",
" RNA\\_sample\\_num & Media & Strain\\\\\n",
"\\hline\n",
"\t 2 & YPD & H99 \\\\\n",
"\t 9 & YPD & mar1d\\\\\n",
"\t 10 & YPD & mar1d\\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"RNA_sample_num | Media | Strain | \n",
"|---|---|---|\n",
"| 2 | YPD | H99 | \n",
"| 9 | YPD | mar1d | \n",
"| 10 | YPD | mar1d | \n",
"\n",
"\n"
],
"text/plain": [
" RNA_sample_num Media Strain\n",
"1 2 YPD H99 \n",
"2 9 YPD mar1d \n",
"3 10 YPD mar1d "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df %>% select(1:3) %>% head(3)"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"RNA_sample_num | Strain | enrichment_method |\n",
"\n",
"\t2 | H99 | MA |\n",
"\t9 | mar1d | MA |\n",
"\t10 | mar1d | MA |\n",
"\n",
"\n"
],
"text/latex": [
"\\begin{tabular}{r|lll}\n",
" RNA\\_sample\\_num & Strain & enrichment\\_method\\\\\n",
"\\hline\n",
"\t 2 & H99 & MA \\\\\n",
"\t 9 & mar1d & MA \\\\\n",
"\t 10 & mar1d & MA \\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"RNA_sample_num | Strain | enrichment_method | \n",
"|---|---|---|\n",
"| 2 | H99 | MA | \n",
"| 9 | mar1d | MA | \n",
"| 10 | mar1d | MA | \n",
"\n",
"\n"
],
"text/plain": [
" RNA_sample_num Strain enrichment_method\n",
"1 2 H99 MA \n",
"2 9 mar1d MA \n",
"3 10 mar1d MA "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df %>% select(c(1,3,5)) %>% head(3)"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"RNA_sample_num | Media |\n",
"\n",
"\t2 | YPD |\n",
"\t9 | YPD |\n",
"\t10 | YPD |\n",
"\n",
"\n"
],
"text/latex": [
"\\begin{tabular}{r|ll}\n",
" RNA\\_sample\\_num & Media\\\\\n",
"\\hline\n",
"\t 2 & YPD\\\\\n",
"\t 9 & YPD\\\\\n",
"\t 10 & YPD\\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"RNA_sample_num | Media | \n",
"|---|---|---|\n",
"| 2 | YPD | \n",
"| 9 | YPD | \n",
"| 10 | YPD | \n",
"\n",
"\n"
],
"text/plain": [
" RNA_sample_num Media\n",
"1 2 YPD \n",
"2 9 YPD \n",
"3 10 YPD "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df %>% select(c('RNA_sample_num', 'Media')) %>% head(3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Dropping columns"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"i7_primer | library# |\n",
"\n",
"\ti701 | 1 |\n",
"\ti701 | 2 |\n",
"\ti701 | 3 |\n",
"\n",
"\n"
],
"text/latex": [
"\\begin{tabular}{r|ll}\n",
" i7\\_primer & library\\#\\\\\n",
"\\hline\n",
"\t i701 & 1 \\\\\n",
"\t i701 & 2 \\\\\n",
"\t i701 & 3 \\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"i7_primer | library# | \n",
"|---|---|---|\n",
"| i701 | 1 | \n",
"| i701 | 2 | \n",
"| i701 | 3 | \n",
"\n",
"\n"
],
"text/plain": [
" i7_primer library#\n",
"1 i701 1 \n",
"2 i701 2 \n",
"3 i701 3 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df %>% select(-(1:10)) %>% head(3)"
]
},
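{
"cell_type": "markdown",
"metadata": {},
"source": [
"Negative selection also accepts column names and the string helpers introduced below. This can be more robust than positions if the column order ever changes. A minimal sketch on a hypothetical four-column table:\n",
"\n",
"```r\n",
"library(dplyr)\n",
"\n",
"# Hypothetical table; column names chosen only for illustration\n",
"tbl <- data.frame(a = 1, b = 2, c = 3, d_person = 4)\n",
"\n",
"# Drop one column by name and every column whose name ends in 'person'\n",
"kept <- tbl %>% select(-a, -ends_with('person'))\n",
"names(kept)\n",
"```"
]
},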
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Selecting using string operations"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [],
"text/latex": [],
"text/markdown": [],
"text/plain": [
" \n",
"1\n",
"2\n",
"3"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df %>% select(ends_with('person')) %>% head(3)"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"i7_index | i5_index | i5_primer | i7_primer |\n",
"\n",
"\tATTACTCG | AGGCTATA | i501 | i701 |\n",
"\tATTACTCG | GCCTCTAT | i502 | i701 |\n",
"\tATTACTCG | AGGATAGG | i503 | i701 |\n",
"\n",
"\n"
],
"text/latex": [
"\\begin{tabular}{r|llll}\n",
" i7\\_index & i5\\_index & i5\\_primer & i7\\_primer\\\\\n",
"\\hline\n",
"\t ATTACTCG & AGGCTATA & i501 & i701 \\\\\n",
"\t ATTACTCG & GCCTCTAT & i502 & i701 \\\\\n",
"\t ATTACTCG & AGGATAGG & i503 & i701 \\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"i7_index | i5_index | i5_primer | i7_primer | \n",
"|---|---|---|\n",
"| ATTACTCG | AGGCTATA | i501 | i701 | \n",
"| ATTACTCG | GCCTCTAT | i502 | i701 | \n",
"| ATTACTCG | AGGATAGG | i503 | i701 | \n",
"\n",
"\n"
],
"text/plain": [
" i7_index i5_index i5_primer i7_primer\n",
"1 ATTACTCG AGGCTATA i501 i701 \n",
"2 ATTACTCG GCCTCTAT i502 i701 \n",
"3 ATTACTCG AGGATAGG i503 i701 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df %>% select(starts_with('i')) %>% head(3)"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"i5_primer | i7_primer |\n",
"\n",
"\ti501 | i701 |\n",
"\ti502 | i701 |\n",
"\ti503 | i701 |\n",
"\n",
"\n"
],
"text/latex": [
"\\begin{tabular}{r|ll}\n",
" i5\\_primer & i7\\_primer\\\\\n",
"\\hline\n",
"\t i501 & i701\\\\\n",
"\t i502 & i701\\\\\n",
"\t i503 & i701\\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"i5_primer | i7_primer | \n",
"|---|---|---|\n",
"| i501 | i701 | \n",
"| i502 | i701 | \n",
"| i503 | i701 | \n",
"\n",
"\n"
],
"text/plain": [
" i5_primer i7_primer\n",
"1 i501 i701 \n",
"2 i502 i701 \n",
"3 i503 i701 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df %>% select(contains('primer')) %>% head(3)"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"RNA_sample_num | enrichment_method | concentration_fold_difference | i7_index | i5_index | i5_primer | i7_primer |\n",
"\n",
"\t2 | MA | 1.34 | ATTACTCG | AGGCTATA | i501 | i701 |\n",
"\t9 | MA | 2.23 | ATTACTCG | GCCTCTAT | i502 | i701 |\n",
"\t10 | MA | 4.37 | ATTACTCG | AGGATAGG | i503 | i701 |\n",
"\n",
"\n"
],
"text/latex": [
"\\begin{tabular}{r|lllllll}\n",
" RNA\\_sample\\_num & enrichment\\_method & concentration\\_fold\\_difference & i7\\_index & i5\\_index & i5\\_primer & i7\\_primer\\\\\n",
"\\hline\n",
"\t 2 & MA & 1.34 & ATTACTCG & AGGCTATA & i501 & i701 \\\\\n",
"\t 9 & MA & 2.23 & ATTACTCG & GCCTCTAT & i502 & i701 \\\\\n",
"\t 10 & MA & 4.37 & ATTACTCG & AGGATAGG & i503 & i701 \\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"RNA_sample_num | enrichment_method | concentration_fold_difference | i7_index | i5_index | i5_primer | i7_primer | \n",
"|---|---|---|\n",
"| 2 | MA | 1.34 | ATTACTCG | AGGCTATA | i501 | i701 | \n",
"| 9 | MA | 2.23 | ATTACTCG | GCCTCTAT | i502 | i701 | \n",
"| 10 | MA | 4.37 | ATTACTCG | AGGATAGG | i503 | i701 | \n",
"\n",
"\n"
],
"text/plain": [
" RNA_sample_num enrichment_method concentration_fold_difference i7_index\n",
"1 2 MA 1.34 ATTACTCG\n",
"2 9 MA 2.23 ATTACTCG\n",
"3 10 MA 4.37 ATTACTCG\n",
" i5_index i5_primer i7_primer\n",
"1 AGGCTATA i501 i701 \n",
"2 GCCTCTAT i502 i701 \n",
"3 AGGATAGG i503 i701 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df %>% select(matches('.*_.*')) %>% head(3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Renaming columns with select"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"method | fold.change |\n",
"\n",
"\tMA | 1.34 |\n",
"\tMA | 2.23 |\n",
"\tMA | 4.37 |\n",
"\n",
"\n"
],
"text/latex": [
"\\begin{tabular}{r|ll}\n",
" method & fold.change\\\\\n",
"\\hline\n",
"\t MA & 1.34\\\\\n",
"\t MA & 2.23\\\\\n",
"\t MA & 4.37\\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"method | fold.change | \n",
"|---|---|---|\n",
"| MA | 1.34 | \n",
"| MA | 2.23 | \n",
"| MA | 4.37 | \n",
"\n",
"\n"
],
"text/plain": [
" method fold.change\n",
"1 MA 1.34 \n",
"2 MA 2.23 \n",
"3 MA 4.37 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df %>% \n",
"select(c('method' = 'enrichment_method', \n",
" 'fold.change' = 'concentration_fold_difference')) %>% \n",
"head(3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Scoped variants"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"RIN | concentration_fold_difference |\n",
"\n",
"\t10.0 | 1.34 |\n",
"\t10.0 | 2.23 |\n",
"\t 9.9 | 4.37 |\n",
"\n",
"\n"
],
"text/latex": [
"\\begin{tabular}{r|ll}\n",
" RIN & concentration\\_fold\\_difference\\\\\n",
"\\hline\n",
"\t 10.0 & 1.34\\\\\n",
"\t 10.0 & 2.23\\\\\n",
"\t 9.9 & 4.37\\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"RIN | concentration_fold_difference | \n",
"|---|---|---|\n",
"| 10.0 | 1.34 | \n",
"| 10.0 | 2.23 | \n",
"| 9.9 | 4.37 | \n",
"\n",
"\n"
],
"text/plain": [
" RIN concentration_fold_difference\n",
"1 10.0 1.34 \n",
"2 10.0 2.23 \n",
"3 9.9 4.37 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df %>% select_if(is.numeric) %>% head(3)"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"ENRICHMENT_METHOD | CONCENTRATION_FOLD_DIFFERENCE |\n",
"\n",
"\tMA | 1.34 |\n",
"\tMA | 2.23 |\n",
"\tMA | 4.37 |\n",
"\n",
"\n"
],
"text/latex": [
"\\begin{tabular}{r|ll}\n",
" ENRICHMENT\\_METHOD & CONCENTRATION\\_FOLD\\_DIFFERENCE\\\\\n",
"\\hline\n",
"\t MA & 1.34\\\\\n",
"\t MA & 2.23\\\\\n",
"\t MA & 4.37\\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"ENRICHMENT_METHOD | CONCENTRATION_FOLD_DIFFERENCE | \n",
"|---|---|---|\n",
"| MA | 1.34 | \n",
"| MA | 2.23 | \n",
"| MA | 4.37 | \n",
"\n",
"\n"
],
"text/plain": [
" ENRICHMENT_METHOD CONCENTRATION_FOLD_DIFFERENCE\n",
"1 MA 1.34 \n",
"2 MA 2.23 \n",
"3 MA 4.37 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df %>% \n",
"select_at(c('enrichment_method', \n",
" 'concentration_fold_difference'), toupper) %>%\n",
"head(3)"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"rna_sample_num | media | strain | replicate | enrichment_method | rin | concentration_fold_difference | i7_index | i5_index | i5_primer | i7_primer | library# |\n",
"\n",
"\t2 | YPD | H99 | 2 | MA | 10.0 | 1.34 | ATTACTCG | AGGCTATA | i501 | i701 | 1 |\n",
"\t9 | YPD | mar1d | 3 | MA | 10.0 | 2.23 | ATTACTCG | GCCTCTAT | i502 | i701 | 2 |\n",
"\t10 | YPD | mar1d | 4 | MA | 9.9 | 4.37 | ATTACTCG | AGGATAGG | i503 | i701 | 3 |\n",
"\n",
"\n"
],
"text/latex": [
"\\begin{tabular}{r|llllllllllll}\n",
" rna\\_sample\\_num & media & strain & replicate & enrichment\\_method & rin & concentration\\_fold\\_difference & i7\\_index & i5\\_index & i5\\_primer & i7\\_primer & library\\#\\\\\n",
"\\hline\n",
"\t 2 & YPD & H99 & 2 & MA & 10.0 & 1.34 & ATTACTCG & AGGCTATA & i501 & i701 & 1 \\\\\n",
"\t 9 & YPD & mar1d & 3 & MA & 10.0 & 2.23 & ATTACTCG & GCCTCTAT & i502 & i701 & 2 \\\\\n",
"\t 10 & YPD & mar1d & 4 & MA & 9.9 & 4.37 & ATTACTCG & AGGATAGG & i503 & i701 & 3 \\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"rna_sample_num | media | strain | replicate | enrichment_method | rin | concentration_fold_difference | i7_index | i5_index | i5_primer | i7_primer | library# | \n",
"|---|---|---|\n",
"| 2 | YPD | H99 | 2 | MA | 10.0 | 1.34 | ATTACTCG | AGGCTATA | i501 | i701 | 1 | \n",
"| 9 | YPD | mar1d | 3 | MA | 10.0 | 2.23 | ATTACTCG | GCCTCTAT | i502 | i701 | 2 | \n",
"| 10 | YPD | mar1d | 4 | MA | 9.9 | 4.37 | ATTACTCG | AGGATAGG | i503 | i701 | 3 | \n",
"\n",
"\n"
],
"text/plain": [
" rna_sample_num media strain replicate enrichment_method rin \n",
"1 2 YPD H99 2 MA 10.0\n",
"2 9 YPD mar1d 3 MA 10.0\n",
"3 10 YPD mar1d 4 MA 9.9\n",
" concentration_fold_difference i7_index i5_index i5_primer i7_primer library#\n",
"1 1.34 ATTACTCG AGGCTATA i501 i701 1 \n",
"2 2.23 ATTACTCG GCCTCTAT i502 i701 2 \n",
"3 4.37 ATTACTCG AGGATAGG i503 i701 3 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df %>% select_all(tolower) %>% head(3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Filter rows"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\t- 'YPD'\n",
"\t- 'TC'\n",
"\n"
],
"text/latex": [
"\\begin{enumerate*}\n",
"\\item 'YPD'\n",
"\\item 'TC'\n",
"\\end{enumerate*}\n"
],
"text/markdown": [
"1. 'YPD'\n",
"2. 'TC'\n",
"\n",
"\n"
],
"text/plain": [
"[1] \"YPD\" \"TC\" "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"unique(df$Media)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Equality and inequality conditions"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"RNA_sample_num | Media | Strain | Replicate | enrichment_method | RIN | concentration_fold_difference | i7_index | i5_index | i5_primer | i7_primer | library# |\n",
"\n",
"\t14 | TC | H99 | 2 | MA | 10.0 | 1.57 | ATTACTCG | TCAGAGCC | i504 | i701 | 4 |\n",
"\t15 | TC | H99 | 3 | MA | 9.9 | 2.85 | ATTACTCG | CTTCGCCT | i505 | i701 | 5 |\n",
"\t21 | TC | mar1d | 3 | MA | 10.0 | 1.81 | ATTACTCG | TAAGATTA | i506 | i701 | 6 |\n",
"\n",
"\n"
],
"text/latex": [
"\\begin{tabular}{r|llllllllllll}\n",
" RNA\\_sample\\_num & Media & Strain & Replicate & enrichment\\_method & RIN & concentration\\_fold\\_difference & i7\\_index & i5\\_index & i5\\_primer & i7\\_primer & library\\#\\\\\n",
"\\hline\n",
"\t 14 & TC & H99 & 2 & MA & 10.0 & 1.57 & ATTACTCG & TCAGAGCC & i504 & i701 & 4 \\\\\n",
"\t 15 & TC & H99 & 3 & MA & 9.9 & 2.85 & ATTACTCG & CTTCGCCT & i505 & i701 & 5 \\\\\n",
"\t 21 & TC & mar1d & 3 & MA & 10.0 & 1.81 & ATTACTCG & TAAGATTA & i506 & i701 & 6 \\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"RNA_sample_num | Media | Strain | Replicate | enrichment_method | RIN | concentration_fold_difference | i7_index | i5_index | i5_primer | i7_primer | library# | \n",
"|---|---|---|\n",
"| 14 | TC | H99 | 2 | MA | 10.0 | 1.57 | ATTACTCG | TCAGAGCC | i504 | i701 | 4 | \n",
"| 15 | TC | H99 | 3 | MA | 9.9 | 2.85 | ATTACTCG | CTTCGCCT | i505 | i701 | 5 | \n",
"| 21 | TC | mar1d | 3 | MA | 10.0 | 1.81 | ATTACTCG | TAAGATTA | i506 | i701 | 6 | \n",
"\n",
"\n"
],
"text/plain": [
" RNA_sample_num Media Strain Replicate enrichment_method RIN \n",
"1 14 TC H99 2 MA 10.0\n",
"2 15 TC H99 3 MA 9.9\n",
"3 21 TC mar1d 3 MA 10.0\n",
" concentration_fold_difference i7_index i5_index i5_primer i7_primer library#\n",
"1 1.57 ATTACTCG TCAGAGCC i504 i701 4 \n",
"2 2.85 ATTACTCG CTTCGCCT i505 i701 5 \n",
"3 1.81 ATTACTCG TAAGATTA i506 i701 6 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df %>% filter(Media == 'TC') %>% head(3)"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"RNA_sample_num | Media | Strain | Replicate | enrichment_method | RIN | concentration_fold_difference | i7_index | i5_index | i5_primer | i7_primer | library# |\n",
"\n",
"\t10 | YPD | mar1d | 4 | MA | 9.9 | 4.37 | ATTACTCG | AGGATAGG | i503 | i701 | 3 |\n",
"\t10 | YPD | mar1d | 4 | RZ | 9.9 | 4.37 | TCCGGAGA | AGGATAGG | i503 | i702 | 11 |\n",
"\t1 | YPD | H99 | 1 | MA | 10.0 | 3.64 | CGCTCATT | AGGCTATA | i501 | i703 | 18 |\n",
"\n",
"\n"
],
"text/latex": [
"\\begin{tabular}{r|llllllllllll}\n",
" RNA\\_sample\\_num & Media & Strain & Replicate & enrichment\\_method & RIN & concentration\\_fold\\_difference & i7\\_index & i5\\_index & i5\\_primer & i7\\_primer & library\\#\\\\\n",
"\\hline\n",
"\t 10 & YPD & mar1d & 4 & MA & 9.9 & 4.37 & ATTACTCG & AGGATAGG & i503 & i701 & 3 \\\\\n",
"\t 10 & YPD & mar1d & 4 & RZ & 9.9 & 4.37 & TCCGGAGA & AGGATAGG & i503 & i702 & 11 \\\\\n",
"\t 1 & YPD & H99 & 1 & MA & 10.0 & 3.64 & CGCTCATT & AGGCTATA & i501 & i703 & 18 \\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"RNA_sample_num | Media | Strain | Replicate | enrichment_method | RIN | concentration_fold_difference | i7_index | i5_index | i5_primer | i7_primer | library# | \n",
"|---|---|---|---|---|---|---|---|---|---|---|---|\n",
"| 10 | YPD | mar1d | 4 | MA | 9.9 | 4.37 | ATTACTCG | AGGATAGG | i503 | i701 | 3 | \n",
"| 10 | YPD | mar1d | 4 | RZ | 9.9 | 4.37 | TCCGGAGA | AGGATAGG | i503 | i702 | 11 | \n",
"| 1 | YPD | H99 | 1 | MA | 10.0 | 3.64 | CGCTCATT | AGGCTATA | i501 | i703 | 18 | \n",
"\n",
"\n"
],
"text/plain": [
" RNA_sample_num Media Strain Replicate enrichment_method RIN \n",
"1 10 YPD mar1d 4 MA 9.9\n",
"2 10 YPD mar1d 4 RZ 9.9\n",
"3 1 YPD H99 1 MA 10.0\n",
" concentration_fold_difference i7_index i5_index i5_primer i7_primer library#\n",
"1 4.37 ATTACTCG AGGATAGG i503 i701 3 \n",
"2 4.37 TCCGGAGA AGGATAGG i503 i702 11 \n",
"3 3.64 CGCTCATT AGGCTATA i501 i703 18 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df %>% filter(concentration_fold_difference > 3) %>% head(3)"
]
},
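{
"cell_type": "markdown",
"metadata": {},
"source": [
"`filter` keeps a row only when the condition is `TRUE`. Rows where the condition evaluates to `NA`, such as a missing `concentration_fold_difference`, are therefore dropped without any warning. The minimal sketch below uses a small hypothetical data frame (values taken from earlier outputs, with one replaced by `NA` for illustration). It shows this behaviour and one way to keep such rows explicitly with `is.na()`.\n",
"\n",
"```r\n",
"library(dplyr)\n",
"\n",
"# Hypothetical toy data: values from earlier outputs, one replaced by NA\n",
"x <- data.frame(RNA_sample_num = c(2, 9, 10),\n",
"                concentration_fold_difference = c(1.34, NA, 4.37))\n",
"\n",
"# The NA row is dropped because NA > 1 is NA, not TRUE\n",
"kept <- x %>% filter(concentration_fold_difference > 1)\n",
"\n",
"# Keep missing values explicitly by OR-ing with is.na()\n",
"with_na <- x %>%\n",
"  filter(concentration_fold_difference > 1 | is.na(concentration_fold_difference))\n",
"```"
]
},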
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Combining conditions"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"data": {
"text/latex": [
"\\begin{tabular}{r|llllllllllll}\n",
" RNA\\_sample\\_num & Media & Strain & Replicate & enrichment\\_method & RIN & concentration\\_fold\\_difference & i7\\_index & i5\\_index & i5\\_primer & i7\\_primer & library\\#\\\\\n",
"\\hline\n",
"\t 10 & YPD & mar1d & 4 & MA & 9.9 & 4.37 & ATTACTCG & AGGATAGG & i503 & i701 & 3 \\\\\n",
"\t 10 & YPD & mar1d & 4 & RZ & 9.9 & 4.37 & TCCGGAGA & AGGATAGG & i503 & i702 & 11 \\\\\n",
"\t 1 & YPD & H99 & 1 & MA & 10.0 & 3.64 & CGCTCATT & AGGCTATA & i501 & i703 & 18 \\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"RNA_sample_num | Media | Strain | Replicate | enrichment_method | RIN | concentration_fold_difference | i7_index | i5_index | i5_primer | i7_primer | library# | \n",
"|---|---|---|---|---|---|---|---|---|---|---|---|\n",
"| 10 | YPD | mar1d | 4 | MA | 9.9 | 4.37 | ATTACTCG | AGGATAGG | i503 | i701 | 3 | \n",
"| 10 | YPD | mar1d | 4 | RZ | 9.9 | 4.37 | TCCGGAGA | AGGATAGG | i503 | i702 | 11 | \n",
"| 1 | YPD | H99 | 1 | MA | 10.0 | 3.64 | CGCTCATT | AGGCTATA | i501 | i703 | 18 | \n",
"\n",
"\n"
],
"text/plain": [
" RNA_sample_num Media Strain Replicate enrichment_method RIN \n",
"1 10 YPD mar1d 4 MA 9.9\n",
"2 10 YPD mar1d 4 RZ 9.9\n",
"3 1 YPD H99 1 MA 10.0\n",
" concentration_fold_difference i7_index i5_index i5_primer i7_primer library#\n",
"1 4.37 ATTACTCG AGGATAGG i503 i701 3 \n",
"2 4.37 TCCGGAGA AGGATAGG i503 i702 11 \n",
"3 3.64 CGCTCATT AGGCTATA i501 i703 18 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df %>% \n",
"filter(Media != 'TC', \n",
" concentration_fold_difference > 3) %>%\n",
"head(3)"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"data": {
"text/latex": [
"\\begin{tabular}{r|llllllllllll}\n",
" RNA\\_sample\\_num & Media & Strain & Replicate & enrichment\\_method & RIN & concentration\\_fold\\_difference & i7\\_index & i5\\_index & i5\\_primer & i7\\_primer & library\\#\\\\\n",
"\\hline\n",
"\t 10 & YPD & mar1d & 4 & MA & 9.9 & 4.37 & ATTACTCG & AGGATAGG & i503 & i701 & 3 \\\\\n",
"\t 14 & TC & H99 & 2 & MA & 10.0 & 1.57 & ATTACTCG & TCAGAGCC & i504 & i701 & 4 \\\\\n",
"\t 15 & TC & H99 & 3 & MA & 9.9 & 2.85 & ATTACTCG & CTTCGCCT & i505 & i701 & 5 \\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"RNA_sample_num | Media | Strain | Replicate | enrichment_method | RIN | concentration_fold_difference | i7_index | i5_index | i5_primer | i7_primer | library# | \n",
"|---|---|---|---|---|---|---|---|---|---|---|---|\n",
"| 10 | YPD | mar1d | 4 | MA | 9.9 | 4.37 | ATTACTCG | AGGATAGG | i503 | i701 | 3 | \n",
"| 14 | TC | H99 | 2 | MA | 10.0 | 1.57 | ATTACTCG | TCAGAGCC | i504 | i701 | 4 | \n",
"| 15 | TC | H99 | 3 | MA | 9.9 | 2.85 | ATTACTCG | CTTCGCCT | i505 | i701 | 5 | \n",
"\n",
"\n"
],
"text/plain": [
" RNA_sample_num Media Strain Replicate enrichment_method RIN \n",
"1 10 YPD mar1d 4 MA 9.9\n",
"2 14 TC H99 2 MA 10.0\n",
"3 15 TC H99 3 MA 9.9\n",
" concentration_fold_difference i7_index i5_index i5_primer i7_primer library#\n",
"1 4.37 ATTACTCG AGGATAGG i503 i701 3 \n",
"2 1.57 ATTACTCG TCAGAGCC i504 i701 4 \n",
"3 2.85 ATTACTCG CTTCGCCT i505 i701 5 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df %>% \n",
"filter(Media == 'TC' |\n",
" concentration_fold_difference > 3) %>%\n",
"head(3)"
]
},
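{
"cell_type": "markdown",
"metadata": {},
"source": [
"Inside `filter`, conditions separated by commas are combined with AND (`&`), while `|` means OR, as the two cells above show. The sketch below uses a hypothetical six-row subset of the data, with values copied from earlier outputs. It also shows two helpers that keep longer conditions readable: `%in%` matches any of several values, and dplyr's `between()` checks an inclusive range.\n",
"\n",
"```r\n",
"library(dplyr)\n",
"\n",
"# Hypothetical subset: six rows copied from outputs in this notebook\n",
"samples <- data.frame(\n",
"  RNA_sample_num = c(2, 9, 10, 14, 15, 24),\n",
"  Media = c('YPD', 'YPD', 'YPD', 'TC', 'TC', 'TC'),\n",
"  concentration_fold_difference = c(1.34, 2.23, 4.37, 1.57, 2.85, 5.53),\n",
"  stringsAsFactors = FALSE\n",
")\n",
"\n",
"# A comma between conditions is the same as &\n",
"a <- samples %>% filter(Media != 'TC', concentration_fold_difference > 3)\n",
"b <- samples %>% filter(Media != 'TC' & concentration_fold_difference > 3)\n",
"\n",
"# %in% matches any value in a vector; between() is an inclusive range check\n",
"picked <- samples %>% filter(RNA_sample_num %in% c(2, 24))\n",
"mid <- samples %>% filter(between(concentration_fold_difference, 2, 3))\n",
"```"
]
},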
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Filtering on string conditions"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"data": {
"text/latex": [
"\\begin{tabular}{r|llllllllllll}\n",
" RNA\\_sample\\_num & Media & Strain & Replicate & enrichment\\_method & RIN & concentration\\_fold\\_difference & i7\\_index & i5\\_index & i5\\_primer & i7\\_primer & library\\#\\\\\n",
"\\hline\n",
"\t 2 & YPD & H99 & 2 & MA & 10.0 & 1.34 & ATTACTCG & AGGCTATA & i501 & i701 & 1 \\\\\n",
"\t 9 & YPD & mar1d & 3 & MA & 10.0 & 2.23 & ATTACTCG & GCCTCTAT & i502 & i701 & 2 \\\\\n",
"\t 10 & YPD & mar1d & 4 & MA & 9.9 & 4.37 & ATTACTCG & AGGATAGG & i503 & i701 & 3 \\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"RNA_sample_num | Media | Strain | Replicate | enrichment_method | RIN | concentration_fold_difference | i7_index | i5_index | i5_primer | i7_primer | library# | \n",
"|---|---|---|---|---|---|---|---|---|---|---|---|\n",
"| 2 | YPD | H99 | 2 | MA | 10.0 | 1.34 | ATTACTCG | AGGCTATA | i501 | i701 | 1 | \n",
"| 9 | YPD | mar1d | 3 | MA | 10.0 | 2.23 | ATTACTCG | GCCTCTAT | i502 | i701 | 2 | \n",
"| 10 | YPD | mar1d | 4 | MA | 9.9 | 4.37 | ATTACTCG | AGGATAGG | i503 | i701 | 3 | \n",
"\n",
"\n"
],
"text/plain": [
" RNA_sample_num Media Strain Replicate enrichment_method RIN \n",
"1 2 YPD H99 2 MA 10.0\n",
"2 9 YPD mar1d 3 MA 10.0\n",
"3 10 YPD mar1d 4 MA 9.9\n",
" concentration_fold_difference i7_index i5_index i5_primer i7_primer library#\n",
"1 1.34 ATTACTCG AGGCTATA i501 i701 1 \n",
"2 2.23 ATTACTCG GCCTCTAT i502 i701 2 \n",
"3 4.37 ATTACTCG AGGATAGG i503 i701 3 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df %>% filter(str_length(Media) == 3) %>% head(3)"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
{
"data": {
"text/latex": [
"\\begin{tabular}{r|llllllllllll}\n",
" RNA\\_sample\\_num & Media & Strain & Replicate & enrichment\\_method & RIN & concentration\\_fold\\_difference & i7\\_index & i5\\_index & i5\\_primer & i7\\_primer & library\\#\\\\\n",
"\\hline\n",
"\t 2 & YPD & H99 & 2 & MA & 10.0 & 1.34 & ATTACTCG & AGGCTATA & i501 & i701 & 1 \\\\\n",
"\t 9 & YPD & mar1d & 3 & MA & 10.0 & 2.23 & ATTACTCG & GCCTCTAT & i502 & i701 & 2 \\\\\n",
"\t 10 & YPD & mar1d & 4 & MA & 9.9 & 4.37 & ATTACTCG & AGGATAGG & i503 & i701 & 3 \\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"RNA_sample_num | Media | Strain | Replicate | enrichment_method | RIN | concentration_fold_difference | i7_index | i5_index | i5_primer | i7_primer | library# | \n",
"|---|---|---|---|---|---|---|---|---|---|---|---|\n",
"| 2 | YPD | H99 | 2 | MA | 10.0 | 1.34 | ATTACTCG | AGGCTATA | i501 | i701 | 1 | \n",
"| 9 | YPD | mar1d | 3 | MA | 10.0 | 2.23 | ATTACTCG | GCCTCTAT | i502 | i701 | 2 | \n",
"| 10 | YPD | mar1d | 4 | MA | 9.9 | 4.37 | ATTACTCG | AGGATAGG | i503 | i701 | 3 | \n",
"\n",
"\n"
],
"text/plain": [
" RNA_sample_num Media Strain Replicate enrichment_method RIN \n",
"1 2 YPD H99 2 MA 10.0\n",
"2 9 YPD mar1d 3 MA 10.0\n",
"3 10 YPD mar1d 4 MA 9.9\n",
" concentration_fold_difference i7_index i5_index i5_primer i7_primer library#\n",
"1 1.34 ATTACTCG AGGCTATA i501 i701 1 \n",
"2 2.23 ATTACTCG GCCTCTAT i502 i701 2 \n",
"3 4.37 ATTACTCG AGGATAGG i503 i701 3 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df %>% filter(str_detect(i7_index, '^A.+')) %>% head(3)"
]
},
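{
"cell_type": "markdown",
"metadata": {},
"source": [
"`str_detect()` from stringr matches a regular expression. In `'^A.+'`, the `^` anchors the match to the start of the string and `.+` requires at least one further character. The short sketch below uses a hypothetical data frame of i7 index sequences copied from outputs in this notebook. It shows how to negate a match with `!` and how to anchor to the end of the string with `$`.\n",
"\n",
"```r\n",
"library(dplyr)\n",
"library(stringr)\n",
"\n",
"# Hypothetical toy data: i7 index sequences copied from outputs in this notebook\n",
"libs <- data.frame(\n",
"  enrichment_method = c('MA', 'RZ', 'TOT', 'MA', 'RZ'),\n",
"  i7_index = c('ATTACTCG', 'TCCGGAGA', 'CTGAAGCT', 'CGCTCATT', 'GAGATTCC'),\n",
"  stringsAsFactors = FALSE\n",
")\n",
"\n",
"# Indexes starting with A\n",
"starts_a <- libs %>% filter(str_detect(i7_index, '^A'))\n",
"# ! keeps the rows that do NOT match\n",
"not_a <- libs %>% filter(!str_detect(i7_index, '^A'))\n",
"# $ anchors to the end: indexes ending in CC or GA\n",
"ends <- libs %>% filter(str_detect(i7_index, '(CC|GA)$'))\n",
"```"
]
},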
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Arrange in ascending or descending order"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [
{
"data": {
"text/latex": [
"\\begin{tabular}{r|llllllllllll}\n",
" RNA\\_sample\\_num & Media & Strain & Replicate & enrichment\\_method & RIN & concentration\\_fold\\_difference & i7\\_index & i5\\_index & i5\\_primer & i7\\_primer & library\\#\\\\\n",
"\\hline\n",
"\t 2 & YPD & H99 & 2 & MA & 10 & 1.34 & ATTACTCG & AGGCTATA & i501 & i701 & 1 \\\\\n",
"\t 2 & YPD & H99 & 2 & RZ & 10 & 1.34 & TCCGGAGA & AGGCTATA & i501 & i702 & 9 \\\\\n",
"\t 2 & YPD & H99 & 2 & TOT & 10 & 1.34 & CTGAAGCT & AGGCTATA & i501 & i707 & 17 \\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"RNA_sample_num | Media | Strain | Replicate | enrichment_method | RIN | concentration_fold_difference | i7_index | i5_index | i5_primer | i7_primer | library# | \n",
"|---|---|---|---|---|---|---|---|---|---|---|---|\n",
"| 2 | YPD | H99 | 2 | MA | 10 | 1.34 | ATTACTCG | AGGCTATA | i501 | i701 | 1 | \n",
"| 2 | YPD | H99 | 2 | RZ | 10 | 1.34 | TCCGGAGA | AGGCTATA | i501 | i702 | 9 | \n",
"| 2 | YPD | H99 | 2 | TOT | 10 | 1.34 | CTGAAGCT | AGGCTATA | i501 | i707 | 17 | \n",
"\n",
"\n"
],
"text/plain": [
" RNA_sample_num Media Strain Replicate enrichment_method RIN\n",
"1 2 YPD H99 2 MA 10 \n",
"2 2 YPD H99 2 RZ 10 \n",
"3 2 YPD H99 2 TOT 10 \n",
" concentration_fold_difference i7_index i5_index i5_primer i7_primer library#\n",
"1 1.34 ATTACTCG AGGCTATA i501 i701 1 \n",
"2 1.34 TCCGGAGA AGGCTATA i501 i702 9 \n",
"3 1.34 CTGAAGCT AGGCTATA i501 i707 17 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df %>% arrange(concentration_fold_difference) %>% head(3)"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [
{
"data": {
"text/latex": [
"\\begin{tabular}{r|llllllllllll}\n",
" RNA\\_sample\\_num & Media & Strain & Replicate & enrichment\\_method & RIN & concentration\\_fold\\_difference & i7\\_index & i5\\_index & i5\\_primer & i7\\_primer & library\\#\\\\\n",
"\\hline\n",
"\t 24 & TC & mar1d & 6 & MA & 10.0 & 5.53 & CGCTCATT & TAAGATTA & i506 & i703 & 23 \\\\\n",
"\t 24 & TC & mar1d & 6 & RZ & 10.0 & 5.53 & GAGATTCC & TAAGATTA & i506 & i704 & 31 \\\\\n",
"\t 23 & TC & mar1d & 5 & MA & 9.9 & 4.47 & CGCTCATT & CTTCGCCT & i505 & i703 & 22 \\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"RNA_sample_num | Media | Strain | Replicate | enrichment_method | RIN | concentration_fold_difference | i7_index | i5_index | i5_primer | i7_primer | library# | \n",
"|---|---|---|---|---|---|---|---|---|---|---|---|\n",
"| 24 | TC | mar1d | 6 | MA | 10.0 | 5.53 | CGCTCATT | TAAGATTA | i506 | i703 | 23 | \n",
"| 24 | TC | mar1d | 6 | RZ | 10.0 | 5.53 | GAGATTCC | TAAGATTA | i506 | i704 | 31 | \n",
"| 23 | TC | mar1d | 5 | MA | 9.9 | 4.47 | CGCTCATT | CTTCGCCT | i505 | i703 | 22 | \n",
"\n",
"\n"
],
"text/plain": [
" RNA_sample_num Media Strain Replicate enrichment_method RIN \n",
"1 24 TC mar1d 6 MA 10.0\n",
"2 24 TC mar1d 6 RZ 10.0\n",
"3 23 TC mar1d 5 MA 9.9\n",
" concentration_fold_difference i7_index i5_index i5_primer i7_primer library#\n",
"1 5.53 CGCTCATT TAAGATTA i506 i703 23 \n",
"2 5.53 GAGATTCC TAAGATTA i506 i704 31 \n",
"3 4.47 CGCTCATT CTTCGCCT i505 i703 22 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df %>% arrange(desc(concentration_fold_difference)) %>% head(3)"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [
{
"data": {
"text/latex": [
"\\begin{tabular}{r|llllllllllll}\n",
" RNA\\_sample\\_num & Media & Strain & Replicate & enrichment\\_method & RIN & concentration\\_fold\\_difference & i7\\_index & i5\\_index & i5\\_primer & i7\\_primer & library\\#\\\\\n",
"\\hline\n",
"\t 1 & YPD & H99 & 1 & MA & 10 & 3.64 & CGCTCATT & AGGCTATA & i501 & i703 & 18 \\\\\n",
"\t 1 & YPD & H99 & 1 & RZ & 10 & 3.64 & GAGATTCC & AGGCTATA & i501 & i704 & 26 \\\\\n",
"\t 13 & TC & H99 & 1 & MA & 10 & 1.95 & CGCTCATT & TCAGAGCC & i504 & i703 & 21 \\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"RNA_sample_num | Media | Strain | Replicate | enrichment_method | RIN | concentration_fold_difference | i7_index | i5_index | i5_primer | i7_primer | library# | \n",
"|---|---|---|---|---|---|---|---|---|---|---|---|\n",
"| 1 | YPD | H99 | 1 | MA | 10 | 3.64 | CGCTCATT | AGGCTATA | i501 | i703 | 18 | \n",
"| 1 | YPD | H99 | 1 | RZ | 10 | 3.64 | GAGATTCC | AGGCTATA | i501 | i704 | 26 | \n",
"| 13 | TC | H99 | 1 | MA | 10 | 1.95 | CGCTCATT | TCAGAGCC | i504 | i703 | 21 | \n",
"\n",
"\n"
],
"text/plain": [
" RNA_sample_num Media Strain Replicate enrichment_method RIN\n",
"1 1 YPD H99 1 MA 10 \n",
"2 1 YPD H99 1 RZ 10 \n",
"3 13 TC H99 1 MA 10 \n",
" concentration_fold_difference i7_index i5_index i5_primer i7_primer library#\n",
"1 3.64 CGCTCATT AGGCTATA i501 i703 18 \n",
"2 3.64 GAGATTCC AGGCTATA i501 i704 26 \n",
"3 1.95 CGCTCATT TCAGAGCC i504 i703 21 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df %>% \n",
"arrange(Replicate, \n",
" desc(concentration_fold_difference)) %>%\n",
"head(3)"
]
},
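{
"cell_type": "markdown",
"metadata": {},
"source": [
"Two details of `arrange` are easy to miss. First, later columns only break ties in earlier ones, as in the `Replicate` example above. Second, missing values are always placed last, even inside `desc()`. The minimal sketch below adds a hypothetical `NA` to values from earlier outputs.\n",
"\n",
"```r\n",
"library(dplyr)\n",
"\n",
"# Hypothetical toy data: sample 9's value replaced by NA for illustration\n",
"x <- data.frame(RNA_sample_num = c(2, 9, 10, 14),\n",
"                concentration_fold_difference = c(1.34, NA, 4.37, 1.57))\n",
"\n",
"asc <- x %>% arrange(concentration_fold_difference)\n",
"dsc <- x %>% arrange(desc(concentration_fold_difference))\n",
"\n",
"# In both orderings the NA row (sample 9) comes last\n",
"asc$RNA_sample_num\n",
"dsc$RNA_sample_num\n",
"```\n",
"\n",
"To put missing values first, one option is to sort on `is.na()` as well, e.g. `arrange(!is.na(concentration_fold_difference), concentration_fold_difference)`."
]
},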
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Using `top_n`"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [
{
"data": {
"text/latex": [
"\\begin{tabular}{r|llllllllllll}\n",
" RNA\\_sample\\_num & Media & Strain & Replicate & enrichment\\_method & RIN & concentration\\_fold\\_difference & i7\\_index & i5\\_index & i5\\_primer & i7\\_primer & library\\#\\\\\n",
"\\hline\n",
"\t 23 & TC & mar1d & 5 & MA & 9.9 & 4.47 & CGCTCATT & CTTCGCCT & i505 & i703 & 22 \\\\\n",
"\t 24 & TC & mar1d & 6 & MA & 10.0 & 5.53 & CGCTCATT & TAAGATTA & i506 & i703 & 23 \\\\\n",
"\t 23 & TC & mar1d & 5 & RZ & 9.9 & 4.47 & GAGATTCC & CTTCGCCT & i505 & i704 & 30 \\\\\n",
"\t 24 & TC & mar1d & 6 & RZ & 10.0 & 5.53 & GAGATTCC & TAAGATTA & i506 & i704 & 31 \\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"RNA_sample_num | Media | Strain | Replicate | enrichment_method | RIN | concentration_fold_difference | i7_index | i5_index | i5_primer | i7_primer | library# | \n",
"|---|---|---|---|---|---|---|---|---|---|---|---|\n",
"| 23 | TC | mar1d | 5 | MA | 9.9 | 4.47 | CGCTCATT | CTTCGCCT | i505 | i703 | 22 | \n",
"| 24 | TC | mar1d | 6 | MA | 10.0 | 5.53 | CGCTCATT | TAAGATTA | i506 | i703 | 23 | \n",
"| 23 | TC | mar1d | 5 | RZ | 9.9 | 4.47 | GAGATTCC | CTTCGCCT | i505 | i704 | 30 | \n",
"| 24 | TC | mar1d | 6 | RZ | 10.0 | 5.53 | GAGATTCC | TAAGATTA | i506 | i704 | 31 | \n",
"\n",
"\n"
],
"text/plain": [
" RNA_sample_num Media Strain Replicate enrichment_method RIN \n",
"1 23 TC mar1d 5 MA 9.9\n",
"2 24 TC mar1d 6 MA 10.0\n",
"3 23 TC mar1d 5 RZ 9.9\n",
"4 24 TC mar1d 6 RZ 10.0\n",
" concentration_fold_difference i7_index i5_index i5_primer i7_primer library#\n",
"1 4.47 CGCTCATT CTTCGCCT i505 i703 22 \n",
"2 5.53 CGCTCATT TAAGATTA i506 i703 23 \n",
"3 4.47 GAGATTCC CTTCGCCT i505 i704 30 \n",
"4 5.53 GAGATTCC TAAGATTA i506 i704 31 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df %>% top_n(3, concentration_fold_difference)"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [
{
"data": {
"text/latex": [
"\\begin{tabular}{r|llllllllllll}\n",
" RNA\\_sample\\_num & Media & Strain & Replicate & enrichment\\_method & RIN & concentration\\_fold\\_difference & i7\\_index & i5\\_index & i5\\_primer & i7\\_primer & library\\#\\\\\n",
"\\hline\n",
"\t 2 & YPD & H99 & 2 & MA & 10 & 1.34 & ATTACTCG & AGGCTATA & i501 & i701 & 1 \\\\\n",
"\t 2 & YPD & H99 & 2 & RZ & 10 & 1.34 & TCCGGAGA & AGGCTATA & i501 & i702 & 9 \\\\\n",
"\t 2 & YPD & H99 & 2 & TOT & 10 & 1.34 & CTGAAGCT & AGGCTATA & i501 & i707 & 17 \\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"RNA_sample_num | Media | Strain | Replicate | enrichment_method | RIN | concentration_fold_difference | i7_index | i5_index | i5_primer | i7_primer | library# | \n",
"|---|---|---|---|---|---|---|---|---|---|---|---|\n",
"| 2 | YPD | H99 | 2 | MA | 10 | 1.34 | ATTACTCG | AGGCTATA | i501 | i701 | 1 | \n",
"| 2 | YPD | H99 | 2 | RZ | 10 | 1.34 | TCCGGAGA | AGGCTATA | i501 | i702 | 9 | \n",
"| 2 | YPD | H99 | 2 | TOT | 10 | 1.34 | CTGAAGCT | AGGCTATA | i501 | i707 | 17 | \n",
"\n",
"\n"
],
"text/plain": [
" RNA_sample_num Media Strain Replicate enrichment_method RIN\n",
"1 2 YPD H99 2 MA 10 \n",
"2 2 YPD H99 2 RZ 10 \n",
"3 2 YPD H99 2 TOT 10 \n",
" concentration_fold_difference i7_index i5_index i5_primer i7_primer library#\n",
"1 1.34 ATTACTCG AGGCTATA i501 i701 1 \n",
"2 1.34 TCCGGAGA AGGCTATA i501 i702 9 \n",
"3 1.34 CTGAAGCT AGGCTATA i501 i707 17 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df %>% top_n(3, desc(concentration_fold_difference))"
]
},
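{
"cell_type": "markdown",
"metadata": {},
"source": [
"`top_n(3, ...)` above returned four rows rather than three. This is because `top_n` keeps ties: two libraries share 5.53 and two share 4.47. It also returns rows in their original order rather than sorting them. The sketch below uses those rows plus library 3, with a hypothetical `library_num` column standing in for `library#`. It shows that sorting has to be done separately.\n",
"\n",
"```r\n",
"library(dplyr)\n",
"\n",
"# Rows from the top_n output above plus library 3 (4.37);\n",
"# library_num is a hypothetical stand-in for the library# column\n",
"libs <- data.frame(library_num = c(3, 22, 23, 30, 31),\n",
"                   concentration_fold_difference = c(4.37, 4.47, 5.53, 4.47, 5.53))\n",
"\n",
"# Ties at the cut-off are all kept, and the original row order is preserved\n",
"top3 <- libs %>% top_n(3, concentration_fold_difference)\n",
"\n",
"# Sort explicitly when order matters\n",
"top3_sorted <- top3 %>% arrange(desc(concentration_fold_difference))\n",
"```\n",
"\n",
"In dplyr 1.0.0 and later (this notebook ran dplyr 0.7.5), `top_n` is superseded by `slice_max()` and `slice_min()`, which also keep ties by default (`with_ties = TRUE`)."
]
},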
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Mutate values"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [
{
"data": {
"text/latex": [
"\\begin{tabular}{r|lll}\n",
" RNA\\_sample\\_num & concentration\\_fold\\_difference & concentration\\_difference\\\\\n",
"\\hline\n",
"\t 2 & 1.34 & 2.531513\\\\\n",
"\t 9 & 2.23 & 4.691340\\\\\n",
"\t 10 & 4.37 & 20.677645\\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"RNA_sample_num | concentration_fold_difference | concentration_difference | \n",
"|---|---|---|\n",
"| 2 | 1.34 | 2.531513 | \n",
"| 9 | 2.23 | 4.691340 | \n",
"| 10 | 4.37 | 20.677645 | \n",
"\n",
"\n"
],
"text/plain": [
" RNA_sample_num concentration_fold_difference concentration_difference\n",
"1 2 1.34 2.531513 \n",
"2 9 2.23 4.691340 \n",
"3 10 4.37 20.677645 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df %>% \n",
"select(RNA_sample_num, concentration_fold_difference) %>%\n",
"mutate(concentration_difference=2^concentration_fold_difference) %>%\n",
"head(3)"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {},
"outputs": [
{
"data": {
"text/latex": [
"\\begin{tabular}{r|l}\n",
" concentration\\_difference\\\\\n",
"\\hline\n",
"\t 2.531513\\\\\n",
"\t 4.691340\\\\\n",
"\t 20.677645\\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"concentration_difference | \n",
"|---|\n",
"| 2.531513 | \n",
"| 4.691340 | \n",
"| 20.677645 | \n",
"\n",
"\n"
],
"text/plain": [
" concentration_difference\n",
"1 2.531513 \n",
"2 4.691340 \n",
"3 20.677645 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df %>%\n",
"transmute(concentration_difference=2^concentration_fold_difference) %>% \n",
"head(3)"
]
},
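{
"cell_type": "markdown",
"metadata": {},
"source": [
"Within a single `mutate` call, later expressions can use a column created earlier in the same call. `transmute` can also carry existing columns through if you name them. The sketch below uses three values copied from the output above; `above_10` is a hypothetical column name.\n",
"\n",
"```r\n",
"library(dplyr)\n",
"\n",
"conc <- data.frame(RNA_sample_num = c(2, 9, 10),\n",
"                   concentration_fold_difference = c(1.34, 2.23, 4.37))\n",
"\n",
"# concentration_difference is created first, then reused for above_10\n",
"out <- conc %>%\n",
"  mutate(concentration_difference = 2^concentration_fold_difference,\n",
"         above_10 = concentration_difference > 10)\n",
"\n",
"# transmute keeps only the listed columns: here one existing and one new\n",
"slim <- conc %>%\n",
"  transmute(RNA_sample_num,\n",
"            concentration_difference = 2^concentration_fold_difference)\n",
"```"
]
},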
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5. Summarize"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [
{
"data": {
"text/latex": [
"\\begin{tabular}{r|lll}\n",
" avg\\_fold\\_change & min\\_fold\\_change & max\\_fold\\_change\\\\\n",
"\\hline\n",
"\t 2.891961 & 1.34 & 5.53 \\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"avg_fold_change | min_fold_change | max_fold_change | \n",
"|---|---|---|\n",
"| 2.891961 | 1.34 | 5.53 | \n",
"\n",
"\n"
],
"text/plain": [
" avg_fold_change min_fold_change max_fold_change\n",
"1 2.891961 1.34 5.53 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df %>% summarize(avg_fold_change=mean(concentration_fold_difference),\n",
" min_fold_change=min(concentration_fold_difference),\n",
" max_fold_change=max(concentration_fold_difference))"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {},
"outputs": [
{
"data": {
"text/latex": [
"\\begin{tabular}{r|ll}\n",
" RIN & concentration\\_fold\\_difference\\\\\n",
"\\hline\n",
"\t 9.776471 & 2.891961\\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"RIN | concentration_fold_difference | \n",
"|---|---|\n",
"| 9.776471 | 2.891961 | \n",
"\n",
"\n"
],
"text/plain": [
" RIN concentration_fold_difference\n",
"1 9.776471 2.891961 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df %>% summarize_if(is.numeric, mean)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 6. Group by\n",
"\n",
"`summarize` is most useful when combined with `group_by`: the data are split into groups, and each summary is computed once per group."
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [
{
"data": {
"text/latex": [
"\\begin{tabular}{r|llll}\n",
" Media & Strain & enrichment\\_method & mean\\_fold\\_diff\\\\\n",
"\\hline\n",
"\t TC & H99 & MA & 2.576667\\\\\n",
"\t TC & H99 & RZ & 2.576667\\\\\n",
"\t TC & mar1d & MA & 3.463333\\\\\n",
"\t TC & mar1d & RZ & 3.463333\\\\\n",
"\t YPD & H99 & MA & 2.556667\\\\\n",
"\t YPD & H99 & RZ & 2.556667\\\\\n",
"\t YPD & H99 & TOT & 1.790000\\\\\n",
"\t YPD & mar1d & MA & 3.246667\\\\\n",
"\t YPD & mar1d & RZ & 3.246667\\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"Media | Strain | enrichment_method | mean_fold_diff | \n",
"|---|---|---|---|\n",
"| TC | H99 | MA | 2.576667 | \n",
"| TC | H99 | RZ | 2.576667 | \n",
"| TC | mar1d | MA | 3.463333 | \n",
"| TC | mar1d | RZ | 3.463333 | \n",
"| YPD | H99 | MA | 2.556667 | \n",
"| YPD | H99 | RZ | 2.556667 | \n",
"| YPD | H99 | TOT | 1.790000 | \n",
"| YPD | mar1d | MA | 3.246667 | \n",
"| YPD | mar1d | RZ | 3.246667 | \n",
"\n",
"\n"
],
"text/plain": [
" Media Strain enrichment_method mean_fold_diff\n",
"1 TC H99 MA 2.576667 \n",
"2 TC H99 RZ 2.576667 \n",
"3 TC mar1d MA 3.463333 \n",
"4 TC mar1d RZ 3.463333 \n",
"5 YPD H99 MA 2.556667 \n",
"6 YPD H99 RZ 2.556667 \n",
"7 YPD H99 TOT 1.790000 \n",
"8 YPD mar1d MA 3.246667 \n",
"9 YPD mar1d RZ 3.246667 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df %>% \n",
"group_by(Media, Strain, enrichment_method) %>%\n",
"summarize(mean_fold_diff=mean(concentration_fold_difference))"
]
},
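{
"cell_type": "markdown",
"metadata": {},
"source": [
"`group_by` also changes how `mutate` works: summary functions such as `mean()` are computed within each group, while every row is kept. The sketch below uses a hypothetical six-row subset, with values copied from outputs in this notebook. It calls `ungroup()` afterwards so that later steps are not grouped by accident. It also uses `count()`, a shortcut for `group_by()` followed by `summarize(n = n())`.\n",
"\n",
"```r\n",
"library(dplyr)\n",
"\n",
"# Hypothetical subset: six rows copied from outputs in this notebook\n",
"samples <- data.frame(\n",
"  Media = c('YPD', 'YPD', 'YPD', 'TC', 'TC', 'TC'),\n",
"  Strain = c('H99', 'mar1d', 'mar1d', 'H99', 'H99', 'mar1d'),\n",
"  concentration_fold_difference = c(1.34, 2.23, 4.37, 1.57, 2.85, 5.53),\n",
"  stringsAsFactors = FALSE\n",
")\n",
"\n",
"# Grouped mutate: each value minus the mean of its own Media group; all rows kept\n",
"centered <- samples %>%\n",
"  group_by(Media) %>%\n",
"  mutate(diff_from_media_mean = concentration_fold_difference -\n",
"           mean(concentration_fold_difference)) %>%\n",
"  ungroup()\n",
"\n",
"# count() tallies rows for each combination of the listed columns\n",
"per_group <- samples %>% count(Media, Strain)\n",
"```"
]
},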
{
"cell_type": "code",
"execution_count": 45,
"metadata": {},
"outputs": [
{
"data": {
"text/latex": [
"\\begin{tabular}{r|lllll}\n",
" Media & Strain & enrichment\\_method & RIN & concentration\\_fold\\_difference\\\\\n",
"\\hline\n",
"\t TC & H99 & MA & 9.850000 & 2.576667 \\\\\n",
"\t TC & H99 & RZ & 9.850000 & 2.576667 \\\\\n",
"\t TC & mar1d & MA & 9.333333 & 3.463333 \\\\\n",
"\t TC & mar1d & RZ & 9.333333 & 3.463333 \\\\\n",
"\t YPD & H99 & MA & 10.000000 & 2.556667 \\\\\n",
"\t YPD & H99 & RZ & 10.000000 & 2.556667 \\\\\n",
"\t YPD & H99 & TOT & 10.000000 & 1.790000 \\\\\n",
"\t YPD & mar1d & MA & 9.866667 & 3.246667 \\\\\n",
"\t YPD & mar1d & RZ & 9.866667 & 3.246667 \\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"Media | Strain | enrichment_method | RIN | concentration_fold_difference | \n",
"|---|---|---|---|---|\n",
"| TC | H99 | MA | 9.850000 | 2.576667 | \n",
"| TC | H99 | RZ | 9.850000 | 2.576667 | \n",
"| TC | mar1d | MA | 9.333333 | 3.463333 | \n",
"| TC | mar1d | RZ | 9.333333 | 3.463333 | \n",
"| YPD | H99 | MA | 10.000000 | 2.556667 | \n",
"| YPD | H99 | RZ | 10.000000 | 2.556667 | \n",
"| YPD | H99 | TOT | 10.000000 | 1.790000 | \n",
"| YPD | mar1d | MA | 9.866667 | 3.246667 | \n",
"| YPD | mar1d | RZ | 9.866667 | 3.246667 | \n",
"\n",
"\n"
],
"text/plain": [
" Media Strain enrichment_method RIN concentration_fold_difference\n",
"1 TC H99 MA 9.850000 2.576667 \n",
"2 TC H99 RZ 9.850000 2.576667 \n",
"3 TC mar1d MA 9.333333 3.463333 \n",
"4 TC mar1d RZ 9.333333 3.463333 \n",
"5 YPD H99 MA 10.000000 2.556667 \n",
"6 YPD H99 RZ 10.000000 2.556667 \n",
"7 YPD H99 TOT 10.000000 1.790000 \n",
"8 YPD mar1d MA 9.866667 3.246667 \n",
"9 YPD mar1d RZ 9.866667 3.246667 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df %>% \n",
"group_by(Media, Strain, enrichment_method) %>%\n",
"summarize_if(is.numeric, mean) "
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {},
"outputs": [
{
"data": {
"text/latex": [
"\\begin{tabular}{r|llllll}\n",
" Media & Strain & RIN\\_mean & concentration\\_fold\\_difference\\_mean & RIN\\_sd & concentration\\_fold\\_difference\\_sd\\\\\n",
"\\hline\n",
"\t TC & H99 & 9.850000 & 2.576667 & 0.2611165 & 0.7157873\\\\\n",
"\t TC & mar1d & 9.333333 & 3.463333 & 1.4643232 & 1.3630803\\\\\n",
"\t YPD & H99 & 10.000000 & 2.403333 & 0.0000000 & 0.8593824\\\\\n",
"\t YPD & mar1d & 9.866667 & 3.246667 & 0.1669694 & 0.7410230\\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"Media | Strain | RIN_mean | concentration_fold_difference_mean | RIN_sd | concentration_fold_difference_sd | \n",
"|---|---|---|---|---|---|\n",
"| TC | H99 | 9.850000 | 2.576667 | 0.2611165 | 0.7157873 | \n",
"| TC | mar1d | 9.333333 | 3.463333 | 1.4643232 | 1.3630803 | \n",
"| YPD | H99 | 10.000000 | 2.403333 | 0.0000000 | 0.8593824 | \n",
"| YPD | mar1d | 9.866667 | 3.246667 | 0.1669694 | 0.7410230 | \n",
"\n",
"\n"
],
"text/plain": [
" Media Strain RIN_mean concentration_fold_difference_mean RIN_sd \n",
"1 TC H99 9.850000 2.576667 0.2611165\n",
"2 TC mar1d 9.333333 3.463333 1.4643232\n",
"3 YPD H99 10.000000 2.403333 0.0000000\n",
"4 YPD mar1d 9.866667 3.246667 0.1669694\n",
" concentration_fold_difference_sd\n",
"1 0.7157873 \n",
"2 1.3630803 \n",
"3 0.8593824 \n",
"4 0.7410230 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df %>% \n",
"group_by(Media, Strain) %>%\n",
"summarize_if(is.numeric, funs(mean, sd))"
]
},
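{
"cell_type": "markdown",
"metadata": {},
"source": [
"`funs()` works in the dplyr 0.7.5 used here but was deprecated in dplyr 0.8.0. Assuming dplyr 1.0.0 or later, a roughly equivalent call uses `across()` with a named list of functions. The default column names follow the same column_function pattern (`RIN_mean`, `RIN_sd`, ...). However, the output is ordered column by column rather than function by function. The sketch below uses a hypothetical four-row subset, with values copied from earlier outputs.\n",
"\n",
"```r\n",
"library(dplyr)\n",
"\n",
"# Hypothetical subset: RIN and fold-difference values copied from earlier outputs\n",
"samples <- data.frame(\n",
"  Media = c('TC', 'TC', 'YPD', 'YPD'),\n",
"  RIN = c(10.0, 9.9, 10.0, 10.0),\n",
"  concentration_fold_difference = c(1.57, 2.85, 1.34, 2.23),\n",
"  stringsAsFactors = FALSE\n",
")\n",
"\n",
"# Mean and standard deviation of every numeric column, per Media (dplyr >= 1.0.0)\n",
"res <- samples %>%\n",
"  group_by(Media) %>%\n",
"  summarize(across(where(is.numeric), list(mean = mean, sd = sd)))\n",
"```"
]
},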
{
"cell_type": "markdown",
"metadata": {},
"source": [
"What experimental conditions produced the greatest mean fold change?"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {},
"outputs": [
{
"data": {
"text/latex": [
"\\begin{tabular}{r|llll}\n",
" Media & Strain & enrichment\\_method & mean\\_fold\\_diff\\\\\n",
"\\hline\n",
"\t TC & mar1d & MA & 3.463333\\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"Media | Strain | enrichment_method | mean_fold_diff | \n",
"|---|---|---|---|\n",
"| TC | mar1d | MA | 3.463333 | \n",
"\n",
"\n"
],
"text/plain": [
" Media Strain enrichment_method mean_fold_diff\n",
"1 TC mar1d MA 3.463333 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df %>% \n",
"group_by(Media, Strain, enrichment_method) %>%\n",
"summarize(mean_fold_diff=mean(concentration_fold_difference)) %>%\n",
"arrange(desc(mean_fold_diff)) %>%\n",
"head(1)"
]
}
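,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `head(1)` answer hides a tie. The grouped summary in the group-by section shows TC / mar1d / RZ with the same mean (3.463333, identical to six decimals) as TC / mar1d / MA. Filtering for the maximum, instead of taking the first row, returns every tied condition. Here the result of `summarize` is still grouped by `Media` and `Strain`, so call `ungroup()` first; otherwise `max()` would be computed within each remaining group. The sketch below uses a hypothetical data frame holding some of those summary values.\n",
"\n",
"```r\n",
"library(dplyr)\n",
"\n",
"# Hypothetical copy of part of the grouped summary shown earlier\n",
"means <- data.frame(\n",
"  Media = c('TC', 'TC', 'TC', 'TC', 'YPD'),\n",
"  Strain = c('H99', 'H99', 'mar1d', 'mar1d', 'H99'),\n",
"  enrichment_method = c('MA', 'RZ', 'MA', 'RZ', 'TOT'),\n",
"  mean_fold_diff = c(2.576667, 2.576667, 3.463333, 3.463333, 1.79),\n",
"  stringsAsFactors = FALSE\n",
")\n",
"\n",
"# Keep every row that attains the overall maximum, ties included\n",
"best <- means %>%\n",
"  ungroup() %>%\n",
"  filter(mean_fold_diff == max(mean_fold_diff))\n",
"```"
]
}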
],
"metadata": {
"kernelspec": {
"display_name": "R",
"language": "R",
"name": "r"
},
"language_info": {
"codemirror_mode": "r",
"file_extension": ".r",
"mimetype": "text/x-r-source",
"name": "R",
"pygments_lexer": "r",
"version": "3.4.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}