{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# About the notebook" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**This notebook sets up the pilot count analysis: we create the output folders and check the files we need.**" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "set -u" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# About the purpose of the analysis" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**This series of tutorials covers the following steps:**\n", "1. create the count matrix from the STAR output\n", "1. use the DESeq package to normalize the count matrix\n", "1. perform pathway analysis" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Create directories\n", "\n", "Below is an illustration of our folder structure.\n", "```\n", "scratch\n", "└── bioinf_intro\n", "└── analysis_output\n", " ├── out -> the folder to store all our output data files\n", " └── img -> the folder to store all our images \n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Set the path of each folder." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "CURDIR=\"$HOME/work/scratch/analysis_output\"\n", "OUTDIR=\"${CURDIR}/out\"\n", "IMGDIR=\"${CURDIR}/img\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create the directories." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "mkdir -p \"$OUTDIR\"\n", "mkdir -p \"$IMGDIR\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Check that the directories were created correctly." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[0m\u001b[01;34mimg\u001b[0m \u001b[01;34mout\u001b[0m\n" ] } ], "source": [ "ls \"$CURDIR\"" ] }, { "cell_type": "markdown", 
"metadata": {}, "source": [ "# Check all the files we need" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The files we need throughout this series of tutorial is count matrix and metadata, which you have been using during Cliburn's lecture" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "DATDIR=\"/data/hts2018_pilot/star_counts\"\n", "METADTFILE=\"$HOME/work/HTS2018-notebooks/josh/info/2018_pilot_metadata_anon.tsv\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The metadata of the samples in pilot data." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Label\tRNA_sample_num\tMedia\tStrain\tReplicate\texperiment_person\tlibprep_person\tenrichment_method\tRIN\tconcentration_fold_difference\ti7 index\ti5 index\ti5 primer\ti7 primer\tlibrary#\n", "2_MA_C\t2\tYPD\tH99\t2\texpA\tprepA\tMA\t10\t1.34\tATTACTCG\tAGGCTATA\ti501\ti701\t1\n", "9_MA_C\t9\tYPD\tmar1d\t3\texpA\tprepA\tMA\t10\t2.23\tATTACTCG\tGCCTCTAT\ti502\ti701\t2\n", "10_MA_C\t10\tYPD\tmar1d\t4\texpA\tprepA\tMA\t9.9\t4.37\tATTACTCG\tAGGATAGG\ti503\ti701\t3\n", "14_MA_C\t14\tTC\tH99\t2\texpA\tprepA\tMA\t10\t1.57\tATTACTCG\tTCAGAGCC\ti504\ti701\t4\n", "15_MA_C\t15\tTC\tH99\t3\texpA\tprepA\tMA\t9.9\t2.85\tATTACTCG\tCTTCGCCT\ti505\ti701\t5\n", "21_MA_C\t21\tTC\tmar1d\t3\texpA\tprepA\tMA\t10\t1.81\tATTACTCG\tTAAGATTA\ti506\ti701\t6\n", "22_MA_C\t22\tTC\tmar1d\t4\texpA\tprepA\tMA\t9.9\t2.01\tATTACTCG\tACGTCCTG\ti507\ti701\t7\n", "26_MA_C\t26\tYPD\tH99\t8\texpB\tprepA\tMA\t10\t2.76\tATTACTCG\tGTCAGTAC\ti508\ti701\t8\n", "2_RZ_C\t2\tYPD\tH99\t2\texpA\tprepA\tRZ\t10\t1.34\tTCCGGAGA\tAGGCTATA\ti501\ti702\t9\n" ] } ], "source": [ "head $METADTFILE" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The count results output from STAR alignment, which was explained in Josh's lecture." 
] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "204\n" ] } ], "source": [ "# there should be 204 count files in this directory (51 libraries x 4 lanes)\n", "ls \"$DATDIR\" | wc -l" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "10_MA_C_S3_L001_ReadsPerGene.out.tab 26_RZ_C_S16_L003_ReadsPerGene.out.tab\n", "10_MA_C_S3_L002_ReadsPerGene.out.tab 26_RZ_C_S16_L004_ReadsPerGene.out.tab\n", "10_MA_C_S3_L003_ReadsPerGene.out.tab 27_MA_P_S38_L001_ReadsPerGene.out.tab\n", "10_MA_C_S3_L004_ReadsPerGene.out.tab 27_MA_P_S38_L002_ReadsPerGene.out.tab\n", "10_RZ_C_S11_L001_ReadsPerGene.out.tab 27_MA_P_S38_L003_ReadsPerGene.out.tab\n", "10_RZ_C_S11_L002_ReadsPerGene.out.tab 27_MA_P_S38_L004_ReadsPerGene.out.tab\n", "10_RZ_C_S11_L003_ReadsPerGene.out.tab 27_RZ_P_S46_L001_ReadsPerGene.out.tab\n", "10_RZ_C_S11_L004_ReadsPerGene.out.tab 27_RZ_P_S46_L002_ReadsPerGene.out.tab\n", "11_MA_J_S20_L001_ReadsPerGene.out.tab 27_RZ_P_S46_L003_ReadsPerGene.out.tab\n", "11_MA_J_S20_L002_ReadsPerGene.out.tab 27_RZ_P_S46_L004_ReadsPerGene.out.tab\n", "11_MA_J_S20_L003_ReadsPerGene.out.tab 2_MA_C_S1_L001_ReadsPerGene.out.tab\n", "11_MA_J_S20_L004_ReadsPerGene.out.tab 2_MA_C_S1_L002_ReadsPerGene.out.tab\n", "11_RZ_J_S28_L001_ReadsPerGene.out.tab 2_MA_C_S1_L003_ReadsPerGene.out.tab\n", "11_RZ_J_S28_L002_ReadsPerGene.out.tab 2_MA_C_S1_L004_ReadsPerGene.out.tab\n", "11_RZ_J_S28_L003_ReadsPerGene.out.tab 2_RZ_C_S9_L001_ReadsPerGene.out.tab\n", "11_RZ_J_S28_L004_ReadsPerGene.out.tab 2_RZ_C_S9_L002_ReadsPerGene.out.tab\n", "12_MA_P_S36_L001_ReadsPerGene.out.tab 2_RZ_C_S9_L003_ReadsPerGene.out.tab\n", "12_MA_P_S36_L002_ReadsPerGene.out.tab 2_RZ_C_S9_L004_ReadsPerGene.out.tab\n", "12_MA_P_S36_L003_ReadsPerGene.out.tab 2_TOT_C_S17_L001_ReadsPerGene.out.tab\n", "12_MA_P_S36_L004_ReadsPerGene.out.tab 2_TOT_C_S17_L002_ReadsPerGene.out.tab\n", 
"12_RZ_P_S44_L001_ReadsPerGene.out.tab 2_TOT_C_S17_L003_ReadsPerGene.out.tab\n", "12_RZ_P_S44_L002_ReadsPerGene.out.tab 2_TOT_C_S17_L004_ReadsPerGene.out.tab\n", "12_RZ_P_S44_L003_ReadsPerGene.out.tab 35_MA_P_S39_L001_ReadsPerGene.out.tab\n", "12_RZ_P_S44_L004_ReadsPerGene.out.tab 35_MA_P_S39_L002_ReadsPerGene.out.tab\n", "13_MA_J_S21_L001_ReadsPerGene.out.tab 35_MA_P_S39_L003_ReadsPerGene.out.tab\n", "13_MA_J_S21_L002_ReadsPerGene.out.tab 35_MA_P_S39_L004_ReadsPerGene.out.tab\n", "13_MA_J_S21_L003_ReadsPerGene.out.tab 35_RZ_P_S47_L001_ReadsPerGene.out.tab\n", "13_MA_J_S21_L004_ReadsPerGene.out.tab 35_RZ_P_S47_L002_ReadsPerGene.out.tab\n", "13_RZ_J_S29_L001_ReadsPerGene.out.tab 35_RZ_P_S47_L003_ReadsPerGene.out.tab\n", "13_RZ_J_S29_L002_ReadsPerGene.out.tab 35_RZ_P_S47_L004_ReadsPerGene.out.tab\n", "13_RZ_J_S29_L003_ReadsPerGene.out.tab 36_MA_J_S24_L001_ReadsPerGene.out.tab\n", "13_RZ_J_S29_L004_ReadsPerGene.out.tab 36_MA_J_S24_L002_ReadsPerGene.out.tab\n", "14_MA_C_S4_L001_ReadsPerGene.out.tab 36_MA_J_S24_L003_ReadsPerGene.out.tab\n", "14_MA_C_S4_L002_ReadsPerGene.out.tab 36_MA_J_S24_L004_ReadsPerGene.out.tab\n", "14_MA_C_S4_L003_ReadsPerGene.out.tab 36_RZ_J_S32_L001_ReadsPerGene.out.tab\n", "14_MA_C_S4_L004_ReadsPerGene.out.tab 36_RZ_J_S32_L002_ReadsPerGene.out.tab\n", "14_RZ_C_S12_L001_ReadsPerGene.out.tab 36_RZ_J_S32_L003_ReadsPerGene.out.tab\n", "14_RZ_C_S12_L002_ReadsPerGene.out.tab 36_RZ_J_S32_L004_ReadsPerGene.out.tab\n", "14_RZ_C_S12_L003_ReadsPerGene.out.tab 38_MA_P_S40_L001_ReadsPerGene.out.tab\n", "14_RZ_C_S12_L004_ReadsPerGene.out.tab 38_MA_P_S40_L002_ReadsPerGene.out.tab\n", "15_MA_C_S5_L001_ReadsPerGene.out.tab 38_MA_P_S40_L003_ReadsPerGene.out.tab\n", "15_MA_C_S5_L002_ReadsPerGene.out.tab 38_MA_P_S40_L004_ReadsPerGene.out.tab\n", "15_MA_C_S5_L003_ReadsPerGene.out.tab 38_RZ_P_S48_L001_ReadsPerGene.out.tab\n", "15_MA_C_S5_L004_ReadsPerGene.out.tab 38_RZ_P_S48_L002_ReadsPerGene.out.tab\n", "15_RZ_C_S13_L001_ReadsPerGene.out.tab 
38_RZ_P_S48_L003_ReadsPerGene.out.tab\n", "15_RZ_C_S13_L002_ReadsPerGene.out.tab 38_RZ_P_S48_L004_ReadsPerGene.out.tab\n", "15_RZ_C_S13_L003_ReadsPerGene.out.tab 3_MA_J_S19_L001_ReadsPerGene.out.tab\n", "15_RZ_C_S13_L004_ReadsPerGene.out.tab 3_MA_J_S19_L002_ReadsPerGene.out.tab\n", "16_MA_P_S37_L001_ReadsPerGene.out.tab 3_MA_J_S19_L003_ReadsPerGene.out.tab\n", "16_MA_P_S37_L002_ReadsPerGene.out.tab 3_MA_J_S19_L004_ReadsPerGene.out.tab\n", "16_MA_P_S37_L003_ReadsPerGene.out.tab 3_RZ_J_S27_L001_ReadsPerGene.out.tab\n", "16_MA_P_S37_L004_ReadsPerGene.out.tab 3_RZ_J_S27_L002_ReadsPerGene.out.tab\n", "16_RZ_P_S45_L001_ReadsPerGene.out.tab 3_RZ_J_S27_L003_ReadsPerGene.out.tab\n", "16_RZ_P_S45_L002_ReadsPerGene.out.tab 3_RZ_J_S27_L004_ReadsPerGene.out.tab\n", "16_RZ_P_S45_L003_ReadsPerGene.out.tab 3_TOT_J_S34_L001_ReadsPerGene.out.tab\n", "16_RZ_P_S45_L004_ReadsPerGene.out.tab 3_TOT_J_S34_L002_ReadsPerGene.out.tab\n", "1_MA_J_S18_L001_ReadsPerGene.out.tab 3_TOT_J_S34_L003_ReadsPerGene.out.tab\n", "1_MA_J_S18_L002_ReadsPerGene.out.tab 3_TOT_J_S34_L004_ReadsPerGene.out.tab\n", "1_MA_J_S18_L003_ReadsPerGene.out.tab 40_MA_J_S25_L001_ReadsPerGene.out.tab\n", "1_MA_J_S18_L004_ReadsPerGene.out.tab 40_MA_J_S25_L002_ReadsPerGene.out.tab\n", "1_RZ_J_S26_L001_ReadsPerGene.out.tab 40_MA_J_S25_L003_ReadsPerGene.out.tab\n", "1_RZ_J_S26_L002_ReadsPerGene.out.tab 40_MA_J_S25_L004_ReadsPerGene.out.tab\n", "1_RZ_J_S26_L003_ReadsPerGene.out.tab 40_RZ_J_S33_L001_ReadsPerGene.out.tab\n", "1_RZ_J_S26_L004_ReadsPerGene.out.tab 40_RZ_J_S33_L002_ReadsPerGene.out.tab\n", "21_MA_C_S6_L001_ReadsPerGene.out.tab 40_RZ_J_S33_L003_ReadsPerGene.out.tab\n", "21_MA_C_S6_L002_ReadsPerGene.out.tab 40_RZ_J_S33_L004_ReadsPerGene.out.tab\n", "21_MA_C_S6_L003_ReadsPerGene.out.tab 45_MA_P_S41_L001_ReadsPerGene.out.tab\n", "21_MA_C_S6_L004_ReadsPerGene.out.tab 45_MA_P_S41_L002_ReadsPerGene.out.tab\n", "21_RZ_C_S14_L001_ReadsPerGene.out.tab 45_MA_P_S41_L003_ReadsPerGene.out.tab\n", 
"21_RZ_C_S14_L002_ReadsPerGene.out.tab 45_MA_P_S41_L004_ReadsPerGene.out.tab\n", "21_RZ_C_S14_L003_ReadsPerGene.out.tab 45_RZ_P_S49_L001_ReadsPerGene.out.tab\n", "21_RZ_C_S14_L004_ReadsPerGene.out.tab 45_RZ_P_S49_L002_ReadsPerGene.out.tab\n", "22_MA_C_S7_L001_ReadsPerGene.out.tab 45_RZ_P_S49_L003_ReadsPerGene.out.tab\n", "22_MA_C_S7_L002_ReadsPerGene.out.tab 45_RZ_P_S49_L004_ReadsPerGene.out.tab\n", "22_MA_C_S7_L003_ReadsPerGene.out.tab 47_MA_P_S42_L001_ReadsPerGene.out.tab\n", "22_MA_C_S7_L004_ReadsPerGene.out.tab 47_MA_P_S42_L002_ReadsPerGene.out.tab\n", "22_RZ_C_S15_L001_ReadsPerGene.out.tab 47_MA_P_S42_L003_ReadsPerGene.out.tab\n", "22_RZ_C_S15_L002_ReadsPerGene.out.tab 47_MA_P_S42_L004_ReadsPerGene.out.tab\n", "22_RZ_C_S15_L003_ReadsPerGene.out.tab 47_RZ_P_S50_L001_ReadsPerGene.out.tab\n", "22_RZ_C_S15_L004_ReadsPerGene.out.tab 47_RZ_P_S50_L002_ReadsPerGene.out.tab\n", "23_MA_J_S22_L001_ReadsPerGene.out.tab 47_RZ_P_S50_L003_ReadsPerGene.out.tab\n", "23_MA_J_S22_L002_ReadsPerGene.out.tab 47_RZ_P_S50_L004_ReadsPerGene.out.tab\n", "23_MA_J_S22_L003_ReadsPerGene.out.tab 4_MA_P_S35_L001_ReadsPerGene.out.tab\n", "23_MA_J_S22_L004_ReadsPerGene.out.tab 4_MA_P_S35_L002_ReadsPerGene.out.tab\n", "23_RZ_J_S30_L001_ReadsPerGene.out.tab 4_MA_P_S35_L003_ReadsPerGene.out.tab\n", "23_RZ_J_S30_L002_ReadsPerGene.out.tab 4_MA_P_S35_L004_ReadsPerGene.out.tab\n", "23_RZ_J_S30_L003_ReadsPerGene.out.tab 4_RZ_P_S43_L001_ReadsPerGene.out.tab\n", "23_RZ_J_S30_L004_ReadsPerGene.out.tab 4_RZ_P_S43_L002_ReadsPerGene.out.tab\n", "24_MA_J_S23_L001_ReadsPerGene.out.tab 4_RZ_P_S43_L003_ReadsPerGene.out.tab\n", "24_MA_J_S23_L002_ReadsPerGene.out.tab 4_RZ_P_S43_L004_ReadsPerGene.out.tab\n", "24_MA_J_S23_L003_ReadsPerGene.out.tab 4_TOT_P_S51_L001_ReadsPerGene.out.tab\n", "24_MA_J_S23_L004_ReadsPerGene.out.tab 4_TOT_P_S51_L002_ReadsPerGene.out.tab\n", "24_RZ_J_S31_L001_ReadsPerGene.out.tab 4_TOT_P_S51_L003_ReadsPerGene.out.tab\n", "24_RZ_J_S31_L002_ReadsPerGene.out.tab 
4_TOT_P_S51_L004_ReadsPerGene.out.tab\n", "24_RZ_J_S31_L003_ReadsPerGene.out.tab 9_MA_C_S2_L001_ReadsPerGene.out.tab\n", "24_RZ_J_S31_L004_ReadsPerGene.out.tab 9_MA_C_S2_L002_ReadsPerGene.out.tab\n", "26_MA_C_S8_L001_ReadsPerGene.out.tab 9_MA_C_S2_L003_ReadsPerGene.out.tab\n", "26_MA_C_S8_L002_ReadsPerGene.out.tab 9_MA_C_S2_L004_ReadsPerGene.out.tab\n", "26_MA_C_S8_L003_ReadsPerGene.out.tab 9_RZ_C_S10_L001_ReadsPerGene.out.tab\n", "26_MA_C_S8_L004_ReadsPerGene.out.tab 9_RZ_C_S10_L002_ReadsPerGene.out.tab\n", "26_RZ_C_S16_L001_ReadsPerGene.out.tab 9_RZ_C_S10_L003_ReadsPerGene.out.tab\n", "26_RZ_C_S16_L002_ReadsPerGene.out.tab 9_RZ_C_S10_L004_ReadsPerGene.out.tab\n" ] } ], "source": [ "ls $DATDIR" ] } ], "metadata": { "kernelspec": { "display_name": "Bash", "language": "bash", "name": "bash" }, "language_info": { "codemirror_mode": "shell", "file_extension": ".sh", "mimetype": "text/x-sh", "name": "bash" } }, "nbformat": 4, "nbformat_minor": 2 }