Bash Functions¶
Load Variables and Make Directories¶
In [1]:
source bioinf_intro_config.sh
mkdir -p $STAR_OUT
Trim and Map Reads¶
In [2]:
TrimAndMap() {
FASTQ=$1
FASTQ_BASE="$(basename ${FASTQ} '_001.fastq.gz')"
echo $FASTQ
echo $FASTQ_BASE
# make a pipe for trimmed fastq
CUR_PIPE=`mktemp --dry-run`_${FASTQ_BASE}_pipe.fq
mkfifo $CUR_PIPE
# Run fastq-mcf
fastq-mcf \
$ADAPTERS \
$FASTQ \
-o $CUR_PIPE \
-q 20 -x 0.5 &
# Run STAR
STAR \
--runMode alignReads \
--twopassMode None \
--genomeDir $GENOME_DIR \
--outSAMtype None \
--quantMode GeneCounts \
--outFileNamePrefix ${STAR_OUT}/${FASTQ_BASE}_ \
--alignIntronMax 5000 \
--outSJfilterIntronMaxVsReadN 500 1000 2000 \
--readFilesIn $CUR_PIPE
# Destroy pipe
rm -f $CUR_PIPE
}
export -f TrimAndMap
Call the function¶
In [3]:
for FASTQ in $RAW_FASTQS/35_MA_P_S39_L00[1-2]_R1_001.fastq.gz
do
TrimAndMap $FASTQ
done
/data/hts2018_pilot/Granek_4837_180427A5/35_MA_P_S39_L00[1-2]_R1_001.fastq.gz
35_MA_P_S39_L00[1-2]_R1
mktemp: illegal option -- -
usage: mktemp [-d] [-q] [-t prefix] [-u] template ...
mktemp [-d] [-q] [-u] -t prefix
[1] 2417
Command Line: /Users/cliburn/work/scratch/bioinf_intro/myinfo/neb_e7600_adapters.fasta /data/hts2018_pilot/Granek_4837_180427A5/35_MA_P_S39_L00[1-2]_R1_001.fastq.gz -o _35_MA_P_S39_L00[1-2]_R1_pipe.fq -q 20 -x 0.5
Scale used: 2.2
STAR: Bad Option: --runMode.
Usage: STAR cmd [options] [-find] file1 ... filen [find expression]
Use STAR -help
and STAR -xhelp
to get a list of valid cmds and options.
Use STAR H=help
to get a list of valid archive header formats.
Use STAR diffopts=help
to get a list of valid diff options.
In [4]:
ls -ltr $STAR_OUT
Notes¶
- We are using a named pipe here, but that is not at all necessary for
bash functions, its just a bell-and-whistle
fastq-mcf
is outputing the trimmed fastq into the named pipe and STAR is reading the trimmed fastq from the named pipe- Because we are channeling the output of
fastq-mcf
toSTAR
through a named pipe, we are not tellingfastq-mcf
to gzip its output. In this case gzipping doesn’t buy us anything: it doesn’t save disk space, because data passed through the named pipe never touches the disk, and for this same reason, it doesn’t save us time in writing to or reading from the disk. If we gzipped, we would incur the computational cost of compressing and decompressing without any benefit
$1
refers to the first argument passed to the function- We could call the
TrimAndMap
function directly, we don’t have to call it within a loop.