Bash Functions

Load Variables and Make Directories

In [1]:
source bioinf_intro_config.sh
mkdir -p $STAR_OUT

Trim and Map Reads

In [2]:
TrimAndMap() {
    FASTQ=$1
    FASTQ_BASE="$(basename ${FASTQ} '_001.fastq.gz')"
    echo $FASTQ
    echo $FASTQ_BASE

    # make a pipe for trimmed fastq
    CUR_PIPE=`mktemp --dry-run`_${FASTQ_BASE}_pipe.fq
    mkfifo $CUR_PIPE

    # Run fastq-mcf
    fastq-mcf \
        $ADAPTERS \
        $FASTQ \
        -o $CUR_PIPE \
        -q 20 -x 0.5 &

    # Run STAR
    STAR \
    --runMode alignReads \
    --twopassMode None \
    --genomeDir $GENOME_DIR \
    --outSAMtype None \
    --quantMode GeneCounts \
    --outFileNamePrefix ${STAR_OUT}/${FASTQ_BASE}_ \
    --alignIntronMax 5000 \
    --outSJfilterIntronMaxVsReadN 500 1000 2000 \
    --readFilesIn $CUR_PIPE

    # Destroy pipe
    rm -f $CUR_PIPE
}

export -f TrimAndMap

Call the function

In [3]:
for FASTQ in $RAW_FASTQS/35_MA_P_S39_L00[1-2]_R1_001.fastq.gz
    do
        TrimAndMap $FASTQ
    done
/data/hts2018_pilot/Granek_4837_180427A5/35_MA_P_S39_L00[1-2]_R1_001.fastq.gz
35_MA_P_S39_L00[1-2]_R1
mktemp: illegal option -- -
usage: mktemp [-d] [-q] [-t prefix] [-u] template ...
       mktemp [-d] [-q] [-u] -t prefix
[1] 2417
Command Line: /Users/cliburn/work/scratch/bioinf_intro/myinfo/neb_e7600_adapters.fasta /data/hts2018_pilot/Granek_4837_180427A5/35_MA_P_S39_L00[1-2]_R1_001.fastq.gz -o _35_MA_P_S39_L00[1-2]_R1_pipe.fq -q 20 -x 0.5
Scale used: 2.2
STAR: Bad Option: --runMode.
Usage:  STAR cmd [options] [-find] file1 ... filen [find expression]

Use     STAR -help
and     STAR -xhelp
to get a list of valid cmds and options.

Use     STAR H=help
to get a list of valid archive header formats.

Use     STAR diffopts=help
to get a list of valid diff options.

In [4]:
ls -ltr $STAR_OUT

Notes

  1. We are using a named pipe here, but that is not at all necessary for bash functions, its just a bell-and-whistle
    1. fastq-mcf is outputing the trimmed fastq into the named pipe and STAR is reading the trimmed fastq from the named pipe
    2. Because we are channeling the output of fastq-mcf to STAR through a named pipe, we are not telling fastq-mcf to gzip its output. In this case gzipping doesn’t buy us anything: it doesn’t save disk space, because data passed through the named pipe never touches the disk, and for this same reason, it doesn’t save us time in writing to or reading from the disk. If we gzipped, we would incur the computational cost of compressing and decompressing without any benefit
  2. $1 refers to the first argument passed to the function
  3. We could call the TrimAndMap function directly, we don’t have to call it within a loop.