Making a Pipeline
Combining the trimming and mapping cells into a single cell makes it easier to run the whole pipeline on a sample in one step.
Shell Variables
Load the shell variables used in this notebook from bioinf_intro_config.sh, then create the output directories.
In [1]:
source bioinf_intro_config.sh
mkdir -p $TRIMMED $STAR_OUT
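Every later cell depends on these variables, so it can help to confirm they are set before running anything else. The following is a minimal sketch; the helper name `check_vars` is hypothetical, and the variable names in the example call are the ones used in the cells of this notebook.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report any variable in the argument list that is
# unset or empty, and return nonzero if at least one is missing.
check_vars() {
  local name missing=0
  for name in "$@"; do
    # ${!name:-} is bash indirect expansion: the value of the variable
    # whose name is stored in $name (empty if it is unset)
    if [[ -z "${!name:-}" ]]; then
      echo "ERROR: $name is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example call (assumes bioinf_intro_config.sh has already been sourced):
# check_vars MYINFO RAW_FASTQS TRIMMED STAR_OUT GENOME_DIR
```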
Combining the Steps
In [2]:
FASTQ="27_MA_P_S38_L002_R1"
echo "---------------- TRIMMING: $FASTQ ----------------"
# Trim adapter sequences (from the NEB E7600 adapter FASTA) and low-quality bases (-q 20)
fastq-mcf "$MYINFO/neb_e7600_adapters.fasta" \
  "$RAW_FASTQS/${FASTQ}_001.fastq.gz" \
  -q 20 -x 0.5 \
  -o "$TRIMMED/${FASTQ}_001.trim.fastq.gz"
echo "---------------- MAPPING: $FASTQ ----------------"
# Map the trimmed reads in a single pass, count reads per gene
# (written to ReadsPerGene.out.tab) and skip SAM/BAM output
STAR \
  --runMode alignReads \
  --twopassMode None \
  --genomeDir "$GENOME_DIR" \
  --readFilesIn "$TRIMMED/${FASTQ}_001.trim.fastq.gz" \
  --readFilesCommand gunzip -c \
  --outFileNamePrefix "${STAR_OUT}/${FASTQ}_" \
  --quantMode GeneCounts \
  --outSAMtype None
---------------- TRIMMING: 27_MA_P_S38_L002_R1 ----------------
Command Line: /Users/cliburn/work/scratch/bioinf_intro/myinfo/neb_e7600_adapters.fasta /data/hts2018_pilot/Granek_4837_180427A5/27_MA_P_S38_L002_R1_001.fastq.gz -q 20 -x 0.5 -o /Users/cliburn/work/scratch/bioinf_intro/trimmed_fastqs/27_MA_P_S38_L002_R1_001.trim.fastq.gz
Scale used: 2.2
gunzip: can't stat: /data/hts2018_pilot/Granek_4837_180427A5/27_MA_P_S38_L002_R1_001.fastq.gz (/data/hts2018_pilot/Granek_4837_180427A5/27_MA_P_S38_L002_R1_001.fastq.gz.gz): No such file or directory
Phred: 64
No records in file /data/hts2018_pilot/Granek_4837_180427A5/27_MA_P_S38_L002_R1_001.fastq.gz
---------------- MAPPING: 27_MA_P_S38_L002_R1 ----------------
STAR: Bad Option: --runMode.
Usage: STAR cmd [options] [-find] file1 ... filen [find expression]
Use STAR -help
and STAR -xhelp
to get a list of valid cmds and options.
Use STAR H=help
to get a list of valid archive header formats.
Use STAR diffopts=help
to get a list of valid diff options.
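The combined cell can also be wrapped in a function so the same steps run for any sample. The sketch below assumes the variables from bioinf_intro_config.sh are set and that the `fastq-mcf` and STAR aligner executables are on the PATH; the function name `run_sample` is hypothetical. It also checks that the raw FASTQ file exists before trimming, which would catch a missing input like the one gunzip reports in the output above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the trimming + mapping cell.
# Assumes MYINFO, RAW_FASTQS, TRIMMED, STAR_OUT and GENOME_DIR are set.
run_sample() {
  local fastq="$1"
  local raw="$RAW_FASTQS/${fastq}_001.fastq.gz"
  local trimmed="$TRIMMED/${fastq}_001.trim.fastq.gz"

  # Stop early if the input file is missing instead of passing a
  # nonexistent path on to fastq-mcf
  if [[ ! -f "$raw" ]]; then
    echo "ERROR: input not found: $raw" >&2
    return 1
  fi

  echo "---------------- TRIMMING: $fastq ----------------"
  # '|| return 1' only stops here if fastq-mcf exits with a nonzero status
  fastq-mcf "$MYINFO/neb_e7600_adapters.fasta" "$raw" \
    -q 20 -x 0.5 \
    -o "$trimmed" || return 1

  echo "---------------- MAPPING: $fastq ----------------"
  STAR \
    --runMode alignReads \
    --twopassMode None \
    --genomeDir "$GENOME_DIR" \
    --readFilesIn "$trimmed" \
    --readFilesCommand gunzip -c \
    --outFileNamePrefix "${STAR_OUT}/${fastq}_" \
    --quantMode GeneCounts \
    --outSAMtype None
}

# Example: run every file in $RAW_FASTQS, assuming all of them follow the
# <sample>_001.fastq.gz naming pattern used above
# for f in "$RAW_FASTQS"/*_001.fastq.gz; do
#   run_sample "$(basename "$f" _001.fastq.gz)"
# done
```

Calling `run_sample 27_MA_P_S38_L002_R1` runs the same two commands as the cell above for that one sample.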
Checking the Result
In [3]:
ls ${STAR_OUT}
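With `--outSAMtype None`, the count table and STAR's log files are the main outputs in this directory. One quick way to see how well a sample mapped is to pull the uniquely mapped percentage out of `Log.final.out`. A minimal sketch, assuming the `${STAR_OUT}/${FASTQ}_` prefix used above; `mapping_rate` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the "Uniquely mapped reads %" line from a
# STAR Log.final.out file, or report an error if the file is missing.
mapping_rate() {
  local log="$1"
  if [[ ! -f "$log" ]]; then
    echo "ERROR: no STAR log at $log" >&2
    return 1
  fi
  grep 'Uniquely mapped reads %' "$log"
}

# Example (assumes the mapping cell above completed successfully):
# mapping_rate "${STAR_OUT}/${FASTQ}_Log.final.out"
```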
In [4]:
head ${STAR_OUT}/${FASTQ}_ReadsPerGene.out.tab
head: /Users/cliburn/work/scratch/bioinf_intro/star_out/27_MA_P_S38_L002_R1_ReadsPerGene.out.tab: No such file or directory
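The `head` call shows the raw table. For a per-column overview instead, the sketch below uses awk and assumes STAR's layout for `ReadsPerGene.out.tab`: four summary rows (N_unmapped, N_multimapping, N_noFeature, N_ambiguous) followed by one row per gene with columns gene ID, unstranded count, first-read-strand count and second-read-strand count. The function name `summarize_counts` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print STAR's four summary rows, then the total number
# of reads assigned to genes in each of the three count columns.
summarize_counts() {
  awk -F '\t' '
    NR <= 4 { print; next }                 # N_unmapped ... N_ambiguous rows
    { u += $2; s1 += $3; s2 += $4 }         # sum the per-gene counts
    END { printf "assigned_to_genes\t%d\t%d\t%d\n", u, s1, s2 }
  ' "$1"
}

# Example (assumes the mapping cell above completed successfully):
# summarize_counts "${STAR_OUT}/${FASTQ}_ReadsPerGene.out.tab"
```

If one strand-specific total is much larger than the other, that column is likely the one that matches the library's strandedness.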