In [1]:
# time this notebook
SECONDS=0

Working with Multiple Samples

Let’s kick it up another notch - we have six samples, let’s run our analysis on all of them!

Shell Variables

Assign the variables in this notebook.

In [2]:
CUROUT=$HOME/work/scratch/2015_output
DEMUX=$CUROUT/demux_fastqs
CURDATA=/data/HTS_2015_data/runs_combined
INFO=$HOME/work/myinfo
TRIMMED=$CUROUT/trimmed_fastqs
GENOME_DIR=$CUROUT/genome
TH_DIR=$CUROUT/th_dir
COUNT_DIR=$CUROUT/counts
QC=$CUROUT/qc_output

ACCESSION="GCA_000010245.1_ASM1024v1"
PREFIX=${ACCESSION}_genomic
GFF=${PREFIX}.gff
FA=${PREFIX}.fa

Making New Directories

No new directories are necessary, we are using directories that were already created for the single-end pipeline.

In [3]:
mkdir -p $QC $TRIMMED $GENOME_DIR $TH_DIR $COUNT_DIR

Quality Control

First let’s run fastqc on everything. This is very easy, we can just give it all the FASTQ files on the command line, and it runs on all of them. We can use the wildcard * to do this simply.

In [4]:
fastqc --threads 4 --quiet --outdir $QC $DEMUX/*.fq.gz
Skipping '/Users/cliburn/work/scratch/2015_output/demux_fastqs/*.fq.gz' which didn't exist, or couldn't be read

Looping over all the samples

Now we can put all of the previous commands into one big loop. This is probably a good time for copying and pasting. But we will make a few small changes.

  1. We will add all the sample names to the list of samples that the loop will iterate over
    • 7A_pilot
    • 7B_pilot
    • 7C_pilot
    • 8A_pilot
    • 8B_pilot
    • 8C_pilot
  2. We need to remember to use our full adapter file
In [5]:
for SAMPLE in 7A_pilot 7B_pilot 7C_pilot 8A_pilot 8B_pilot 8C_pilot
    do
        echo $SAMPLE
        fastq-mcf $INFO/neb_adapters.fasta \
            $DEMUX/r1.${SAMPLE}.fq.gz \
            $DEMUX/r2.${SAMPLE}.fq.gz \
            -q 20 -x 0.5 \
            -o $TRIMMED/r1.${SAMPLE}.trim.fq.gz \
            -o $TRIMMED/r2.${SAMPLE}.trim.fq.gz

        mkdir -p $TH_DIR/${SAMPLE}
        tophat2 -G $GENOME_DIR/$GFF \
            --library-type fr-firststrand \
            --output-dir $TH_DIR/${SAMPLE} \
            --max-intron-length 5 \
            --min-intron-length 4 \
            --transcriptome-max-hits 1 \
            --max-multihits 1 \
            --no-coverage-search \
            --no-novel-juncs \
            --no-sort-bam \
            $GENOME_DIR/$PREFIX \
            $TRIMMED/r1.${SAMPLE}.trim.fq.gz \
            $TRIMMED/r2.${SAMPLE}.trim.fq.gz

        # ln $TH_DIR/${SAMPLE}/accepted_hits.bam $TH_DIR/${SAMPLE}.bam
        # samtools index $TH_DIR/${SAMPLE}.bam

        samtools sort -n $TH_DIR/${SAMPLE}/accepted_hits.bam \
            -o $TH_DIR/${SAMPLE}/accepted_hits.name.bam

        htseq-count --quiet --order=name --format=bam --stranded=reverse --type=gene \
            --idattr=Name  $TH_DIR/${SAMPLE}/accepted_hits.name.bam \
            $GENOME_DIR/$GFF > $COUNT_DIR/${SAMPLE}.csv
    done
7A_pilot
Error opening adapter file '/Users/cliburn/work/myinfo/neb_adapters.fasta': No such file or directory
readlink: illegal option -- f
usage: readlink [-n] [file ...]

[2017-08-08 09:09:41] Beginning TopHat run (v2.1.1)
-----------------------------------------------
[2017-08-08 09:09:41] Checking for Bowtie
                  Bowtie version:        2.3.2.0
Error: cannot find transcript file /Users/cliburn/work/scratch/2015_output/genome/GCA_000010245.1_ASM1024v1_genomic.gff
[E::hts_open_format] fail to open file '/Users/cliburn/work/scratch/2015_output/th_dir/7A_pilot/accepted_hits.bam'
samtools sort: can't open "/Users/cliburn/work/scratch/2015_output/th_dir/7A_pilot/accepted_hits.bam": No such file or directory
  [Errno 2] No such file or directory: '/Users/cliburn/work/scratch/2015_output/th_dir/7A_pilot/accepted_hits.name.bam'
  [Exception type: FileNotFoundError, raised in count.py:42]
7B_pilot
Error opening adapter file '/Users/cliburn/work/myinfo/neb_adapters.fasta': No such file or directory
readlink: illegal option -- f
usage: readlink [-n] [file ...]

[2017-08-08 09:09:41] Beginning TopHat run (v2.1.1)
-----------------------------------------------
[2017-08-08 09:09:41] Checking for Bowtie
                  Bowtie version:        2.3.2.0
Error: cannot find transcript file /Users/cliburn/work/scratch/2015_output/genome/GCA_000010245.1_ASM1024v1_genomic.gff
[E::hts_open_format] fail to open file '/Users/cliburn/work/scratch/2015_output/th_dir/7B_pilot/accepted_hits.bam'
samtools sort: can't open "/Users/cliburn/work/scratch/2015_output/th_dir/7B_pilot/accepted_hits.bam": No such file or directory
  [Errno 2] No such file or directory: '/Users/cliburn/work/scratch/2015_output/th_dir/7B_pilot/accepted_hits.name.bam'
  [Exception type: FileNotFoundError, raised in count.py:42]
7C_pilot
Error opening adapter file '/Users/cliburn/work/myinfo/neb_adapters.fasta': No such file or directory
readlink: illegal option -- f
usage: readlink [-n] [file ...]

[2017-08-08 09:09:42] Beginning TopHat run (v2.1.1)
-----------------------------------------------
[2017-08-08 09:09:42] Checking for Bowtie
                  Bowtie version:        2.3.2.0
Error: cannot find transcript file /Users/cliburn/work/scratch/2015_output/genome/GCA_000010245.1_ASM1024v1_genomic.gff
[E::hts_open_format] fail to open file '/Users/cliburn/work/scratch/2015_output/th_dir/7C_pilot/accepted_hits.bam'
samtools sort: can't open "/Users/cliburn/work/scratch/2015_output/th_dir/7C_pilot/accepted_hits.bam": No such file or directory
  [Errno 2] No such file or directory: '/Users/cliburn/work/scratch/2015_output/th_dir/7C_pilot/accepted_hits.name.bam'
  [Exception type: FileNotFoundError, raised in count.py:42]
8A_pilot
Error opening adapter file '/Users/cliburn/work/myinfo/neb_adapters.fasta': No such file or directory
readlink: illegal option -- f
usage: readlink [-n] [file ...]

[2017-08-08 09:09:42] Beginning TopHat run (v2.1.1)
-----------------------------------------------
[2017-08-08 09:09:42] Checking for Bowtie
                  Bowtie version:        2.3.2.0
Error: cannot find transcript file /Users/cliburn/work/scratch/2015_output/genome/GCA_000010245.1_ASM1024v1_genomic.gff
[E::hts_open_format] fail to open file '/Users/cliburn/work/scratch/2015_output/th_dir/8A_pilot/accepted_hits.bam'
samtools sort: can't open "/Users/cliburn/work/scratch/2015_output/th_dir/8A_pilot/accepted_hits.bam": No such file or directory
  [Errno 2] No such file or directory: '/Users/cliburn/work/scratch/2015_output/th_dir/8A_pilot/accepted_hits.name.bam'
  [Exception type: FileNotFoundError, raised in count.py:42]
8B_pilot
Error opening adapter file '/Users/cliburn/work/myinfo/neb_adapters.fasta': No such file or directory
readlink: illegal option -- f
usage: readlink [-n] [file ...]

[2017-08-08 09:09:42] Beginning TopHat run (v2.1.1)
-----------------------------------------------
[2017-08-08 09:09:42] Checking for Bowtie
                  Bowtie version:        2.3.2.0
Error: cannot find transcript file /Users/cliburn/work/scratch/2015_output/genome/GCA_000010245.1_ASM1024v1_genomic.gff
[E::hts_open_format] fail to open file '/Users/cliburn/work/scratch/2015_output/th_dir/8B_pilot/accepted_hits.bam'
samtools sort: can't open "/Users/cliburn/work/scratch/2015_output/th_dir/8B_pilot/accepted_hits.bam": No such file or directory
  [Errno 2] No such file or directory: '/Users/cliburn/work/scratch/2015_output/th_dir/8B_pilot/accepted_hits.name.bam'
  [Exception type: FileNotFoundError, raised in count.py:42]
8C_pilot
Error opening adapter file '/Users/cliburn/work/myinfo/neb_adapters.fasta': No such file or directory
readlink: illegal option -- f
usage: readlink [-n] [file ...]

[2017-08-08 09:09:43] Beginning TopHat run (v2.1.1)
-----------------------------------------------
[2017-08-08 09:09:43] Checking for Bowtie
                  Bowtie version:        2.3.2.0
Error: cannot find transcript file /Users/cliburn/work/scratch/2015_output/genome/GCA_000010245.1_ASM1024v1_genomic.gff
[E::hts_open_format] fail to open file '/Users/cliburn/work/scratch/2015_output/th_dir/8C_pilot/accepted_hits.bam'
samtools sort: can't open "/Users/cliburn/work/scratch/2015_output/th_dir/8C_pilot/accepted_hits.bam": No such file or directory
  [Errno 2] No such file or directory: '/Users/cliburn/work/scratch/2015_output/th_dir/8C_pilot/accepted_hits.name.bam'
  [Exception type: FileNotFoundError, raised in count.py:42]

In [6]:
ls $TH_DIR
7A_pilot        7C_pilot        8B_pilot
7B_pilot        8A_pilot        8C_pilot
In [7]:
ls -ltr $COUNT_DIR
total 0
-rw-r--r--  1 cliburn  staff  0 Aug  8 09:08 r1_8A_pilot.tsv
-rw-r--r--  1 cliburn  staff  0 Aug  8 09:09 7B_pilot.csv
-rw-r--r--  1 cliburn  staff  0 Aug  8 09:09 7A_pilot.csv
-rw-r--r--  1 cliburn  staff  0 Aug  8 09:09 8B_pilot.csv
-rw-r--r--  1 cliburn  staff  0 Aug  8 09:09 8A_pilot.csv
-rw-r--r--  1 cliburn  staff  0 Aug  8 09:09 7C_pilot.csv
-rw-r--r--  1 cliburn  staff  0 Aug  8 09:09 8C_pilot.csv
In [8]:
echo "The time it took to run this notebooks was:"
echo "$(($SECONDS/3600)) Hours, $(($SECONDS%3600/60)) Minutes, $(($SECONDS%60)) Seconds"
The time it took to run this notebooks was:
0 Hours, 0 Minutes, 6 Seconds