Working with Loops
Shell Variables
Assign the shell variables used in this notebook by sourcing the configuration file, then create the output directories.
In [1]:
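# Load the path variables this notebook uses (e.g. $TRIMMED and $STAR_OUT)
source bioinf_intro_config.sh
# Create the output directories; -p means no error if they already exist
mkdir -p "$TRIMMED" "$STAR_OUT"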
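A missing or empty variable is easy to overlook: if `$TRIMMED` were empty, a path such as `$TRIMMED/${FASTQ}_001.trim.fastq.gz` would quietly point at the filesystem root. Below is a minimal sketch of an early check. `require_vars` is a hypothetical helper, not part of the notebook's config file. It uses bash indirect expansion (`${!name}`) to look up each variable by name.

```shell
# Hypothetical helper: fail early if any named variable is unset or empty.
require_vars() {
  local name
  for name in "$@"; do
    # ${!name:-} expands the variable whose name is stored in $name (empty if unset)
    if [[ -z "${!name:-}" ]]; then
      echo "ERROR: variable $name is not set (check bioinf_intro_config.sh)" >&2
      return 1
    fi
  done
}

# Usage (variable names taken from the cells in this notebook):
# require_vars RAW_FASTQS TRIMMED STAR_OUT GENOME_DIR MYINFO
```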
A Brief Journey into for Loops
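for loops take our use of the $FASTQ variable to the next level: a loop assigns each item of a list to the variable in turn and runs the same commands for every item.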
It is analogous to how you would teach a child to set the table: “FOR each place at the table, put a plate …”. In the shell you phrase it like this:
for PERSON in Alice Bob Carol Dave Eve
do
  echo "put plate at ${PERSON}'s place"
  echo "put napkin at ${PERSON}'s place"
  echo "put fork at ${PERSON}'s place"
  echo "put spoon at ${PERSON}'s place"
  echo "put knife at ${PERSON}'s place"
done
Here is a real example:
In [2]:
for FASTQ in A B C D E F
do
  echo "______${FASTQ}________"
done
______A________
______B________
______C________
______D________
______E________
______F________
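The list does not have to be typed out item by item. Here is a short sketch of two standard bash ways to produce the same six items: brace expansion (`{A..F}`) and an array. `SAMPLES` is a name chosen here for illustration.

```shell
# Brace expansion: {A..F} expands to A B C D E F before the loop starts
for FASTQ in {A..F}
do
  echo "______${FASTQ}________"
done

# A bash array; "${SAMPLES[@]}" expands to one word per element
SAMPLES=(A B C D E F)
for FASTQ in "${SAMPLES[@]}"
do
  echo "______${FASTQ}________"
done
```

Both loops print the same six lines as the cell above. The array form is handy when the same list is reused in several loops.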
The for loop in Bash works the same way conceptually as in most other programming languages, although the syntax differs. The do and done keywords are required: do must come before the “loop body” (the commands to be repeated), and done must come after it.
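The newlines around do and done can also be written as semicolons, which gives the common one-line form. The body can hold several commands, and all of them run once per item. `COUNT` is an illustrative name.

```shell
# One-line form: a ";" (or a newline) separates the list from "do"
# and the last body command from "done"
for FASTQ in A B C; do echo "RUNNING FASTQ: ${FASTQ}"; done

# A multi-command body: both commands run once for each item
COUNT=0
for FASTQ in A B C
do
  COUNT=$((COUNT + 1))
  echo "item ${COUNT}: ${FASTQ}"
done
echo "looped ${COUNT} times"   # prints: looped 3 times
```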
So let’s try something almost useful:
In [3]:
for FASTQ in 27_MA_P_S38_L002_R1
do
  echo "RUNNING FASTQ: ${FASTQ}"
done
RUNNING FASTQ: 27_MA_P_S38_L002_R1
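This loop has a single item. One possible way to get the full list of samples is to derive the names from the raw files themselves. The sketch below assumes every raw file follows the `<SAMPLE>_001.fastq.gz` pattern used in this notebook. `list_samples` is a hypothetical helper.

```shell
# Hypothetical helper: print the sample prefix of every *_001.fastq.gz file in a directory.
list_samples() {
  local dir=$1 file
  for file in "$dir"/*_001.fastq.gz; do
    # If nothing matches, bash leaves the pattern unexpanded; skip it
    [[ -e "$file" ]] || continue
    # Strip the directory part and the _001.fastq.gz suffix
    basename "$file" _001.fastq.gz
  done
}

# Usage (relies on word splitting, so sample names must not contain spaces):
# for FASTQ in $(list_samples "$RAW_FASTQS"); do echo "RUNNING FASTQ: ${FASTQ}"; done
```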
Now for the real thing …
Let’s run the pipeline in a loop:
Notice that we are now assigning to the $FASTQ variable in the for statement.
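The pipeline cell writes `${FASTQ}_001` rather than `$FASTQ_001`, and the braces matter. Underscores and digits are legal in variable names, so without braces bash would look up a variable called `FASTQ_001`. Here is a quick demonstration, in which `FASTQ_001` is deliberately unset.

```shell
FASTQ=27_MA_P_S38_L002_R1
unset FASTQ_001   # make sure the unintended name really is unset for this demo

# Braces mark exactly where the variable name ends
echo "${FASTQ}_001.fastq.gz"   # prints 27_MA_P_S38_L002_R1_001.fastq.gz

# Without braces bash expands $FASTQ_001 (unset, so empty); the "." ends the name
echo "$FASTQ_001.fastq.gz"     # prints .fastq.gz
```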
In [4]:
for FASTQ in 27_MA_P_S38_L002_R1
do
  echo "---------------- TRIMMING: $FASTQ ----------------"
  # Remove adapter sequences (listed in the adapter FASTA) and trim low-quality bases
  fastq-mcf \
    "$MYINFO/neb_e7600_adapters.fasta" \
    "$RAW_FASTQS/${FASTQ}_001.fastq.gz" \
    -q 20 -x 0.5 \
    -o "$TRIMMED/${FASTQ}_001.trim.fastq.gz"

  echo "---------------- MAPPING: $FASTQ ----------------"
  # Align the trimmed reads to the genome index and count reads per gene;
  # --outSAMtype None skips writing the alignments themselves
  STAR \
    --runMode alignReads \
    --twopassMode None \
    --genomeDir "$GENOME_DIR" \
    --readFilesIn "$TRIMMED/${FASTQ}_001.trim.fastq.gz" \
    --readFilesCommand gunzip -c \
    --outFileNamePrefix "${STAR_OUT}/${FASTQ}_" \
    --quantMode GeneCounts \
    --outSAMtype None
done
---------------- TRIMMING: 27_MA_P_S38_L002_R1 ----------------
Command Line: /Users/cliburn/work/scratch/bioinf_intro/myinfo/neb_e7600_adapters.fasta /data/hts2018_pilot/Granek_4837_180427A5/27_MA_P_S38_L002_R1_001.fastq.gz -q 20 -x 0.5 -o /Users/cliburn/work/scratch/bioinf_intro/trimmed_fastqs/27_MA_P_S38_L002_R1_001.trim.fastq.gz
Scale used: 2.2
gunzip: can't stat: /data/hts2018_pilot/Granek_4837_180427A5/27_MA_P_S38_L002_R1_001.fastq.gz (/data/hts2018_pilot/Granek_4837_180427A5/27_MA_P_S38_L002_R1_001.fastq.gz.gz): No such file or directory
Phred: 64
No records in file /data/hts2018_pilot/Granek_4837_180427A5/27_MA_P_S38_L002_R1_001.fastq.gz
---------------- MAPPING: 27_MA_P_S38_L002_R1 ----------------
STAR: Bad Option: --runMode.
Usage: STAR cmd [options] [-find] file1 ... filen [find expression]
Use STAR -help
and STAR -xhelp
to get a list of valid cmds and options.
Use STAR H=help
to get a list of valid archive header formats.
Use STAR diffopts=help
to get a list of valid diff options.
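Both steps failed here. fastq-mcf could not read the raw file (`gunzip: can't stat: ... No such file or directory`). The STAR message does not look like the aligner's usage text: it mentions "archive header formats", which suggests that some other program named STAR or star was found first on the PATH. A small pre-flight check could catch both problems before the loop runs. The sketch below is written under those assumptions. `missing_inputs` is a hypothetical helper, and `type -a` is the standard bash builtin that lists every match on the PATH.

```shell
# Hypothetical helper: print each sample whose raw file <dir>/<SAMPLE>_001.fastq.gz
# does not exist (naming pattern taken from the pipeline cell above)
missing_inputs() {
  local dir=$1 sample
  shift
  for sample in "$@"; do
    [[ -f "$dir/${sample}_001.fastq.gz" ]] || echo "$sample"
  done
}

# Usage in this notebook (not run here):
# type -a STAR                                      # every STAR executable bash can find, in PATH order
# missing_inputs "$RAW_FASTQS" 27_MA_P_S38_L002_R1  # prints the sample name if its file is missing
```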
And let’s check the result
In [5]:
ls "$STAR_OUT"
In [6]:
head "$STAR_OUT/27_MA_P_S38_L002_R1_ReadsPerGene.out.tab"
head: /Users/cliburn/work/scratch/bioinf_intro/star_out/27_MA_P_S38_L002_R1_ReadsPerGene.out.tab: No such file or directory
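The head fails because STAR never ran, so no ReadsPerGene.out.tab was written, and the `ls` in the previous cell printed nothing. According to the STAR manual, when the file does exist with `--quantMode GeneCounts`, its layout is as follows. It starts with four summary rows (N_unmapped, N_multimapping, N_noFeature, N_ambiguous). After those comes one row per gene: the gene ID, then counts for unstranded data, for reads on the first strand, and for reads on the second strand. Below is a minimal sketch of how one might total the unstranded column. `total_gene_counts` is a hypothetical helper, and the correct column depends on your library's strandedness.

```shell
# Hypothetical helper: sum column 2 (unstranded counts) over gene rows of a
# STAR ReadsPerGene.out.tab file, skipping the four summary rows at the top.
total_gene_counts() {
  local tab=$1
  if [[ ! -s "$tab" ]]; then
    echo "missing or empty: $tab" >&2
    return 1
  fi
  # Rows 1-4 are N_unmapped, N_multimapping, N_noFeature, N_ambiguous
  awk 'NR > 4 { total += $2 } END { print total + 0 }' "$tab"
}

# Usage:
# total_gene_counts "$STAR_OUT/27_MA_P_S38_L002_R1_ReadsPerGene.out.tab"
```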