Working with Loops

Shell Variables

Load the shared variables into this notebook and create the output directories.

In [1]:
source bioinf_intro_config.sh
mkdir -p "$TRIMMED" "$STAR_OUT"

A Brief Journey into for Loops

for loops take our use of the $FASTQ variable to the next level! A for loop is analogous to how you would teach a child to set the table: “FOR each place at the table, put a plate …”. At the shell you would phrase it something like this (the loop body here is pseudocode, not real commands):

for PERSON in Alice Bob Carol Dave Eve
do
put plate at PERSON's place
put napkin at PERSON's place
put fork at PERSON's place
put spoon at PERSON's place
put knife at PERSON's place
done

Here is a real example:

In [2]:
for FASTQ in A B C D E F
    do
       echo "______${FASTQ}________"
    done
______A________
______B________
______C________
______D________
______E________
______F________

The for loop in Bash is conceptually the same as in other programming languages, although the syntax may differ. The do and done keywords are required: do comes before the “loop body” (the commands that are going to be repeated) and done comes after it.
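
As a small sketch of where do and done go, the loop below is written twice: once across several lines, and once on a single line where semicolons take the place of the line breaks. The names SAMPLE and COUNT and the values A B C are placeholders chosen for illustration; they are not part of the pipeline.

```shell
# Multi-line form: do opens the loop body, done closes it.
COUNT=0
for SAMPLE in A B C
do
    echo "sample ${SAMPLE}"
    COUNT=$((COUNT + 1))   # count how many times the body runs
done
echo "loop body ran ${COUNT} times"

# One-line form: a semicolon replaces each line break.
for SAMPLE in A B C; do echo "sample ${SAMPLE}"; done
```

After a loop finishes, the loop variable keeps the last value it was given, so here SAMPLE is still set once the loop is done.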

So let’s try something almost useful:

In [3]:
for FASTQ in 27_MA_P_S38_L002_R1
    do
        echo "RUNNING FASTQ: ${FASTQ}"
    done
RUNNING FASTQ: 27_MA_P_S38_L002_R1
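
Typing each sample name by hand gets tedious once there are many files. One possible approach, sketched here under assumptions, is to let the shell expand a filename pattern and strip the shared suffix with basename. The directory DEMO_DIR and the sampleA/sampleB filenames are invented for this demonstration (empty placeholder files in a temporary directory), and NAMES is only there to collect the results.

```shell
# Hypothetical demo directory with empty placeholder files (not real data).
DEMO_DIR=$(mktemp -d)
touch "${DEMO_DIR}/sampleA_R1_001.fastq.gz" "${DEMO_DIR}/sampleB_R1_001.fastq.gz"

NAMES=""
for PATHNAME in "${DEMO_DIR}"/*_001.fastq.gz
do
    # basename drops the directory and, given a second argument, that suffix.
    FASTQ=$(basename "${PATHNAME}" _001.fastq.gz)
    echo "RUNNING FASTQ: ${FASTQ}"
    NAMES="${NAMES}${FASTQ} "
done

# Clean up the demo directory.
rm -r "${DEMO_DIR}"
```

The pattern is expanded before the loop starts, so the loop body runs once per matching file.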

Now for the real thing …

Let’s run the pipeline in a loop:

Notice that the for statement now assigns the FASTQ variable for us, so every command in the loop body can refer to it as $FASTQ or ${FASTQ}.

In [4]:
for FASTQ in 27_MA_P_S38_L002_R1
    do
        echo "---------------- TRIMMING: $FASTQ ----------------"
        fastq-mcf \
            $MYINFO/neb_e7600_adapters.fasta \
            $RAW_FASTQS/${FASTQ}_001.fastq.gz \
            -q 20 -x 0.5 \
            -o $TRIMMED/${FASTQ}_001.trim.fastq.gz

        echo "---------------- MAPPING: $FASTQ ----------------"
        STAR \
            --runMode alignReads \
            --twopassMode None \
            --genomeDir $GENOME_DIR \
            --readFilesIn $TRIMMED/${FASTQ}_001.trim.fastq.gz \
            --readFilesCommand gunzip -c \
            --outFileNamePrefix ${STAR_OUT}/${FASTQ}_ \
            --quantMode GeneCounts \
            --outSAMtype None
    done
---------------- TRIMMING: 27_MA_P_S38_L002_R1 ----------------
Command Line: /Users/cliburn/work/scratch/bioinf_intro/myinfo/neb_e7600_adapters.fasta /data/hts2018_pilot/Granek_4837_180427A5/27_MA_P_S38_L002_R1_001.fastq.gz -q 20 -x 0.5 -o /Users/cliburn/work/scratch/bioinf_intro/trimmed_fastqs/27_MA_P_S38_L002_R1_001.trim.fastq.gz
Scale used: 2.2
gunzip: can't stat: /data/hts2018_pilot/Granek_4837_180427A5/27_MA_P_S38_L002_R1_001.fastq.gz (/data/hts2018_pilot/Granek_4837_180427A5/27_MA_P_S38_L002_R1_001.fastq.gz.gz): No such file or directory
Phred: 64
No records in file /data/hts2018_pilot/Granek_4837_180427A5/27_MA_P_S38_L002_R1_001.fastq.gz
---------------- MAPPING: 27_MA_P_S38_L002_R1 ----------------
STAR: Bad Option: --runMode.
Usage:  STAR cmd [options] [-find] file1 ... filen [find expression]

Use     STAR -help
and     STAR -xhelp
to get a list of valid cmds and options.

Use     STAR H=help
to get a list of valid archive header formats.

Use     STAR diffopts=help
to get a list of valid diff options.
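
A loop runs every step in its body even when an input file is missing or a tool is unavailable, as the messages above illustrate. A minimal sketch of a guard follows; check_inputs is a hypothetical helper written for this example, not part of the pipeline, and the demonstration uses sh and a temporary file so that it can run anywhere.

```shell
# check_inputs TOOL FILE
# Hypothetical helper: succeeds only if TOOL is found on PATH and FILE exists.
check_inputs() {
    local tool=$1
    local infile=$2
    if ! command -v "${tool}" > /dev/null 2>&1; then
        echo "SKIP: ${tool} not found on PATH" >&2
        return 1
    fi
    if [ ! -f "${infile}" ]; then
        echo "SKIP: missing input ${infile}" >&2
        return 1
    fi
    return 0
}

# Demonstration: one file that exists and one that does not.
DEMO_FILE=$(mktemp)
for CANDIDATE in "${DEMO_FILE}" "${DEMO_FILE}.missing"
do
    if check_inputs sh "${CANDIDATE}"; then
        echo "OK: ${CANDIDATE}"
    fi
done
rm "${DEMO_FILE}"
```

Inside the pipeline loop, a first line such as check_inputs fastq-mcf $RAW_FASTQS/${FASTQ}_001.fastq.gz || continue would skip a sample whose input is missing. Note that command -v only confirms that some program with that name is on the PATH; it cannot tell whether it is the program you intended.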

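And let’s check the result. In the run captured above, the trimming step could not find the input FASTQ file and the STAR command printed a usage error instead of aligning reads, so the output directory is empty and the counts file does not exist: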

In [5]:
ls ${STAR_OUT}
In [6]:
head ${STAR_OUT}/27_MA_P_S38_L002_R1_ReadsPerGene.out.tab
head: /Users/cliburn/work/scratch/bioinf_intro/star_out/27_MA_P_S38_L002_R1_ReadsPerGene.out.tab: No such file or directory