In [1]:
# time this notebook
SECONDS=0
Working with Multiple Samples¶
Let’s kick it up another notch - we have six samples, let’s run our analysis on all of them!
Shell Variables¶
Assign the variables in this notebook.
In [2]:
CUROUT=$HOME/work/scratch/2015_output
DEMUX=$CUROUT/demux_fastqs
CURDATA=/data/HTS_2015_data/runs_combined
INFO=$HOME/work/myinfo
TRIMMED=$CUROUT/trimmed_fastqs
GENOME_DIR=$CUROUT/genome
TH_DIR=$CUROUT/th_dir
COUNT_DIR=$CUROUT/counts
QC=$CUROUT/qc_output
ACCESSION="GCA_000010245.1_ASM1024v1"
PREFIX=${ACCESSION}_genomic
GFF=${PREFIX}.gff
FA=${PREFIX}.fa
Making New Directories¶
No new directories are necessary, we are using directories that were already created for the single-end pipeline.
In [3]:
mkdir -p $QC $TRIMMED $GENOME_DIR $TH_DIR $COUNT_DIR
Quality Control¶
First let’s run fastqc on everything. This is very easy, we can just
give it all the FASTQ files on the command line, and it runs on all of
them. We can use the wildcard *
to do this simply.
In [4]:
fastqc --threads 4 --quiet --outdir $QC $DEMUX/*.fq.gz
Skipping '/Users/cliburn/work/scratch/2015_output/demux_fastqs/*.fq.gz' which didn't exist, or couldn't be read
Looping over all the samples¶
Now we can put all of the previous commands into one big loop. This is probably a good time for copying and pasting. But we will make a few small changes.
- We will add all the sample names to the list of samples that the loop
will iterate over
- 7A_pilot
- 7B_pilot
- 7C_pilot
- 8A_pilot
- 8B_pilot
- 8C_pilot
- We need to remember to use our full adapter file
In [5]:
for SAMPLE in 7A_pilot 7B_pilot 7C_pilot 8A_pilot 8B_pilot 8C_pilot
do
echo $SAMPLE
fastq-mcf $INFO/neb_adapters.fasta \
$DEMUX/r1.${SAMPLE}.fq.gz \
$DEMUX/r2.${SAMPLE}.fq.gz \
-q 20 -x 0.5 \
-o $TRIMMED/r1.${SAMPLE}.trim.fq.gz \
-o $TRIMMED/r2.${SAMPLE}.trim.fq.gz
mkdir -p $TH_DIR/${SAMPLE}
tophat2 -G $GENOME_DIR/$GFF \
--library-type fr-firststrand \
--output-dir $TH_DIR/${SAMPLE} \
--max-intron-length 5 \
--min-intron-length 4 \
--transcriptome-max-hits 1 \
--max-multihits 1 \
--no-coverage-search \
--no-novel-juncs \
--no-sort-bam \
$GENOME_DIR/$PREFIX \
$TRIMMED/r1.${SAMPLE}.trim.fq.gz \
$TRIMMED/r2.${SAMPLE}.trim.fq.gz
# ln $TH_DIR/${SAMPLE}/accepted_hits.bam $TH_DIR/${SAMPLE}.bam
# samtools index $TH_DIR/${SAMPLE}.bam
samtools sort -n $TH_DIR/${SAMPLE}/accepted_hits.bam \
-o $TH_DIR/${SAMPLE}/accepted_hits.name.bam
htseq-count --quiet --order=name --format=bam --stranded=reverse --type=gene \
--idattr=Name $TH_DIR/${SAMPLE}/accepted_hits.name.bam \
$GENOME_DIR/$GFF > $COUNT_DIR/${SAMPLE}.csv
done
7A_pilot
Error opening adapter file '/Users/cliburn/work/myinfo/neb_adapters.fasta': No such file or directory
readlink: illegal option -- f
usage: readlink [-n] [file ...]
[2017-08-08 09:09:41] Beginning TopHat run (v2.1.1)
-----------------------------------------------
[2017-08-08 09:09:41] Checking for Bowtie
Bowtie version: 2.3.2.0
Error: cannot find transcript file /Users/cliburn/work/scratch/2015_output/genome/GCA_000010245.1_ASM1024v1_genomic.gff
[E::hts_open_format] fail to open file '/Users/cliburn/work/scratch/2015_output/th_dir/7A_pilot/accepted_hits.bam'
samtools sort: can't open "/Users/cliburn/work/scratch/2015_output/th_dir/7A_pilot/accepted_hits.bam": No such file or directory
[Errno 2] No such file or directory: '/Users/cliburn/work/scratch/2015_output/th_dir/7A_pilot/accepted_hits.name.bam'
[Exception type: FileNotFoundError, raised in count.py:42]
7B_pilot
Error opening adapter file '/Users/cliburn/work/myinfo/neb_adapters.fasta': No such file or directory
readlink: illegal option -- f
usage: readlink [-n] [file ...]
[2017-08-08 09:09:41] Beginning TopHat run (v2.1.1)
-----------------------------------------------
[2017-08-08 09:09:41] Checking for Bowtie
Bowtie version: 2.3.2.0
Error: cannot find transcript file /Users/cliburn/work/scratch/2015_output/genome/GCA_000010245.1_ASM1024v1_genomic.gff
[E::hts_open_format] fail to open file '/Users/cliburn/work/scratch/2015_output/th_dir/7B_pilot/accepted_hits.bam'
samtools sort: can't open "/Users/cliburn/work/scratch/2015_output/th_dir/7B_pilot/accepted_hits.bam": No such file or directory
[Errno 2] No such file or directory: '/Users/cliburn/work/scratch/2015_output/th_dir/7B_pilot/accepted_hits.name.bam'
[Exception type: FileNotFoundError, raised in count.py:42]
7C_pilot
Error opening adapter file '/Users/cliburn/work/myinfo/neb_adapters.fasta': No such file or directory
readlink: illegal option -- f
usage: readlink [-n] [file ...]
[2017-08-08 09:09:42] Beginning TopHat run (v2.1.1)
-----------------------------------------------
[2017-08-08 09:09:42] Checking for Bowtie
Bowtie version: 2.3.2.0
Error: cannot find transcript file /Users/cliburn/work/scratch/2015_output/genome/GCA_000010245.1_ASM1024v1_genomic.gff
[E::hts_open_format] fail to open file '/Users/cliburn/work/scratch/2015_output/th_dir/7C_pilot/accepted_hits.bam'
samtools sort: can't open "/Users/cliburn/work/scratch/2015_output/th_dir/7C_pilot/accepted_hits.bam": No such file or directory
[Errno 2] No such file or directory: '/Users/cliburn/work/scratch/2015_output/th_dir/7C_pilot/accepted_hits.name.bam'
[Exception type: FileNotFoundError, raised in count.py:42]
8A_pilot
Error opening adapter file '/Users/cliburn/work/myinfo/neb_adapters.fasta': No such file or directory
readlink: illegal option -- f
usage: readlink [-n] [file ...]
[2017-08-08 09:09:42] Beginning TopHat run (v2.1.1)
-----------------------------------------------
[2017-08-08 09:09:42] Checking for Bowtie
Bowtie version: 2.3.2.0
Error: cannot find transcript file /Users/cliburn/work/scratch/2015_output/genome/GCA_000010245.1_ASM1024v1_genomic.gff
[E::hts_open_format] fail to open file '/Users/cliburn/work/scratch/2015_output/th_dir/8A_pilot/accepted_hits.bam'
samtools sort: can't open "/Users/cliburn/work/scratch/2015_output/th_dir/8A_pilot/accepted_hits.bam": No such file or directory
[Errno 2] No such file or directory: '/Users/cliburn/work/scratch/2015_output/th_dir/8A_pilot/accepted_hits.name.bam'
[Exception type: FileNotFoundError, raised in count.py:42]
8B_pilot
Error opening adapter file '/Users/cliburn/work/myinfo/neb_adapters.fasta': No such file or directory
readlink: illegal option -- f
usage: readlink [-n] [file ...]
[2017-08-08 09:09:42] Beginning TopHat run (v2.1.1)
-----------------------------------------------
[2017-08-08 09:09:42] Checking for Bowtie
Bowtie version: 2.3.2.0
Error: cannot find transcript file /Users/cliburn/work/scratch/2015_output/genome/GCA_000010245.1_ASM1024v1_genomic.gff
[E::hts_open_format] fail to open file '/Users/cliburn/work/scratch/2015_output/th_dir/8B_pilot/accepted_hits.bam'
samtools sort: can't open "/Users/cliburn/work/scratch/2015_output/th_dir/8B_pilot/accepted_hits.bam": No such file or directory
[Errno 2] No such file or directory: '/Users/cliburn/work/scratch/2015_output/th_dir/8B_pilot/accepted_hits.name.bam'
[Exception type: FileNotFoundError, raised in count.py:42]
8C_pilot
Error opening adapter file '/Users/cliburn/work/myinfo/neb_adapters.fasta': No such file or directory
readlink: illegal option -- f
usage: readlink [-n] [file ...]
[2017-08-08 09:09:43] Beginning TopHat run (v2.1.1)
-----------------------------------------------
[2017-08-08 09:09:43] Checking for Bowtie
Bowtie version: 2.3.2.0
Error: cannot find transcript file /Users/cliburn/work/scratch/2015_output/genome/GCA_000010245.1_ASM1024v1_genomic.gff
[E::hts_open_format] fail to open file '/Users/cliburn/work/scratch/2015_output/th_dir/8C_pilot/accepted_hits.bam'
samtools sort: can't open "/Users/cliburn/work/scratch/2015_output/th_dir/8C_pilot/accepted_hits.bam": No such file or directory
[Errno 2] No such file or directory: '/Users/cliburn/work/scratch/2015_output/th_dir/8C_pilot/accepted_hits.name.bam'
[Exception type: FileNotFoundError, raised in count.py:42]
In [6]:
ls $TH_DIR
7A_pilot 7C_pilot 8B_pilot
7B_pilot 8A_pilot 8C_pilot
In [7]:
ls -ltr $COUNT_DIR
total 0
-rw-r--r-- 1 cliburn staff 0 Aug 8 09:08 r1_8A_pilot.tsv
-rw-r--r-- 1 cliburn staff 0 Aug 8 09:09 7B_pilot.csv
-rw-r--r-- 1 cliburn staff 0 Aug 8 09:09 7A_pilot.csv
-rw-r--r-- 1 cliburn staff 0 Aug 8 09:09 8B_pilot.csv
-rw-r--r-- 1 cliburn staff 0 Aug 8 09:09 8A_pilot.csv
-rw-r--r-- 1 cliburn staff 0 Aug 8 09:09 7C_pilot.csv
-rw-r--r-- 1 cliburn staff 0 Aug 8 09:09 8C_pilot.csv
In [8]:
echo "The time it took to run this notebooks was:"
echo "$(($SECONDS/3600)) Hours, $(($SECONDS%3600/60)) Minutes, $(($SECONDS%60)) Seconds"
The time it took to run this notebooks was:
0 Hours, 0 Minutes, 6 Seconds