{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Bash Functions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load Variables and Make Directories" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "source bioinf_intro_config.sh\n", "mkdir -p \"$STAR_OUT\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Trim and Map Reads" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "TrimAndMap() {\n", " local FASTQ=\"$1\"\n", " local FASTQ_BASE=\"$(basename \"${FASTQ}\" '_001.fastq.gz')\"\n", " echo \"$FASTQ\"\n", " echo \"$FASTQ_BASE\"\n", "\n", " # make a named pipe (FIFO) for the trimmed fastq\n", " local CUR_PIPE=\"$(mktemp --dry-run)_${FASTQ_BASE}_pipe.fq\"\n", " mkfifo \"$CUR_PIPE\"\n", "\n", " # Run fastq-mcf in the background, writing trimmed reads into the pipe\n", " fastq-mcf \\\n", " \"$ADAPTERS\" \\\n", " \"$FASTQ\" \\\n", " -o \"$CUR_PIPE\" \\\n", " -q 20 -x 0.5 &\n", " \n", " # Run STAR, reading trimmed reads from the pipe\n", " STAR \\\n", " --runMode alignReads \\\n", " --twopassMode None \\\n", " --genomeDir \"$GENOME_DIR\" \\\n", " --outSAMtype None \\\n", " --quantMode GeneCounts \\\n", " --outFileNamePrefix \"${STAR_OUT}/${FASTQ_BASE}_\" \\\n", " --alignIntronMax 5000 \\\n", " --outSJfilterIntronMaxVsReadN 500 1000 2000 \\\n", " --readFilesIn \"$CUR_PIPE\" \n", " \n", " # Destroy pipe\n", " rm -f \"$CUR_PIPE\"\n", "}\n", "\n", "export -f TrimAndMap" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Call the function" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for FASTQ in \"$RAW_FASTQS\"/35_MA_P_S39_L00[1-2]_R1_001.fastq.gz\n", " do\n", " TrimAndMap \"$FASTQ\"\n", " done" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ls -ltr \"$STAR_OUT\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Notes\n", "1. We are using a named pipe here, but that is not at all necessary for bash functions; it's just a bell-and-whistle\n", " 1. 
`fastq-mcf` is outputting the trimmed fastq *into* the named pipe and `STAR` is reading the trimmed fastq *from* the named pipe\n", " 2. Because we are channeling the output of `fastq-mcf` to `STAR` through a named pipe, we are not telling `fastq-mcf` to gzip its output. In this case gzipping doesn't buy us anything: it doesn't save disk space, because data passed through the named pipe never touches the disk, and for the same reason it doesn't save us any time writing to or reading from the disk. If we gzipped, we would incur the computational cost of compressing and decompressing without any benefit\n", "\n", "2. `$1` refers to the first argument passed to the function\n", "3. We *could* call the `TrimAndMap` function directly; we don't have to call it within a loop." ] } ], "metadata": { "kernelspec": { "display_name": "Bash", "language": "bash", "name": "bash" }, "language_info": { "codemirror_mode": "shell", "file_extension": ".sh", "mimetype": "text/x-sh", "name": "bash" } }, "nbformat": 4, "nbformat_minor": 1 }