{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Making Generic Commands\n", "Let's let put the Bash shell to work for us by being smart with our variables\n", "\n", "## Shell Variables\n", "Assign the variables in this notebook." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "source bioinf_intro_config.sh\n", "mkdir -p $TRIMMED $STAR_OUT" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using a *FASTQ* variable\n", "Most of the commands in our pipeline are complicated! To apply our pipeline to different FASTQs, we need to change things in multiple places. For example, just to run trimming with fastq-mcf, we need to change things in two places between each run: the input FASTQ and the output FASTQ. If we were doing this with paired-end reads, we would have to change four things. Doing this by hand is not only tedious, but error prone. Doing almost the same thing repeatedly is something that people are bad at, but computers are very good at! So let's get the computer to do the hard work. Because the Bash shell is almost magical (it is a full fledged programming language), we can do this easily." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "FASTQ=\"A\"\n", "echo \"RUNNING FASTQ: ~~~~${FASTQ}~~~~\"\n", "echo \"TRIMING: ${FASTQ}\"\n", "echo \"MAPPING: ${FASTQ}\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `for` loop in Bash is conceptually the same as in any other programming language, although the syntax may be different. The `do` and `done` are essential - `do` needs to be before the \"loop body\" (what is going to be repeated) and `done` needs to be after it.\n", "\n", "So let's try something almost useful:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "FASTQ=\"27_MA_P_S38_L002_R1\"\n", "echo \"RUNNING FASTQ: ~~~~${FASTQ}~~~~\"\n", "echo \"TRIMING: ${FASTQ}\"\n", "echo \"MAPPING: ${FASTQ}\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Now for the real thing . . .\n", "### Let's run fastq-mcf:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "FASTQ=\"27_MA_P_S38_L002_R1\"\n", "echo \"TRIMING: $FASTQ\"\n", "fastq-mcf $MYINFO/neb_e7600_adapters.fasta \\\n", " $RAW_FASTQS/${FASTQ}_001.fastq.gz \\\n", " -q 20 -x 0.5 \\\n", " -o $TRIMMED/${FASTQ}_001.trim.fastq.gz" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Now let's do the same thing for STAR" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "FASTQ=\"27_MA_P_S38_L002_R1\"\n", "echo \"MAPPING: $FASTQ\"\n", "STAR \\\n", " --runMode alignReads \\\n", " --twopassMode None \\\n", " --genomeDir $GENOME_DIR \\\n", " --readFilesIn $TRIMMED/${FASTQ}_001.trim.fastq.gz \\\n", " --readFilesCommand gunzip -c \\\n", " --outFileNamePrefix ${STAR_OUT}/${FASTQ}_ \\\n", " --quantMode GeneCounts \\\n", " --outSAMtype None" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### And let's check the result" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ls ${STAR_OUT}" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "head ${STAR_OUT}/${FASTQ}_ReadsPerGene.out.tab" ] } ], "metadata": { "kernelspec": { "display_name": "Bash", "language": "bash", "name": "bash" }, "language_info": { "codemirror_mode": "shell", "file_extension": ".sh", "mimetype": "text/x-sh", "name": "bash" } }, "nbformat": 4, "nbformat_minor": 1 }