Prepare Data

In [1]:
CUROUT=$HOME/work/scratch/2015_output
TH_DIR=$CUROUT/th_dir
SAMPLE="8A_pilot"
GENOME_DIR=$CUROUT/genome

ACCESSION="GCA_000010245.1_ASM1024v1"
PREFIX=${GENOME_DIR}/${ACCESSION}_genomic
GFF=${PREFIX}.gff
FA=${PREFIX}.fa
In [2]:
# sort by coordinate
samtools sort $TH_DIR/${SAMPLE}/accepted_hits.bam \
    -o $TH_DIR/${SAMPLE}/accepted_hits.coord.bam

# index sorted BAM
samtools index $TH_DIR/${SAMPLE}/accepted_hits.coord.bam

# index genome sequence
samtools faidx $FA
[E::hts_open_format] fail to open file '/Users/cliburn/work/scratch/2015_output/th_dir/8A_pilot/accepted_hits.bam'
samtools sort: can't open "/Users/cliburn/work/scratch/2015_output/th_dir/8A_pilot/accepted_hits.bam": No such file or directory
[E::hts_open_format] fail to open file '/Users/cliburn/work/scratch/2015_output/th_dir/8A_pilot/accepted_hits.coord.bam'
samtools index: failed to open "/Users/cliburn/work/scratch/2015_output/th_dir/8A_pilot/accepted_hits.coord.bam": No such file or directory
[fai_build] fail to open the FASTA file /Users/cliburn/work/scratch/2015_output/genome/GCA_000010245.1_ASM1024v1_genomic.fa
Could not build fai index /Users/cliburn/work/scratch/2015_output/genome/GCA_000010245.1_ASM1024v1_genomic.fa.fai
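
The errors above are what samtools reports when the input BAM and FASTA files are not where the variables point, and each command fails separately. A minimal sketch of a pre-flight check follows; `require_files` is a hypothetical helper written for this illustration and is not part of samtools.

```shell
# Hypothetical helper (not a samtools command): print every missing
# input to stderr and return non-zero if at least one is absent.
require_files() {
    missing=0
    for f in "$@"; do
        if [ ! -e "$f" ]; then
            echo "missing: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Possible usage with the variables defined in the first cell:
# require_files "$TH_DIR/${SAMPLE}/accepted_hits.bam" "$FA" && echo "inputs OK"
```

Running a check like this before the sort, index, and faidx steps gives a single list of missing paths rather than one error per command.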

Downloading Everything

Download IGV

It is often helpful to use visualization software to interact with an assembly. We will use the Integrative Genomics Viewer (IGV) to visualize reads on our laptops because it is capable, reasonably user friendly, and cross-platform. See the instructions below for the type of computer you are using.

OS X (Macs)

Downloading files through Jupyter

Packaging up files

We can download files through Jupyter, but we have to be careful. For many file types, Jupyter will try to open them when you click on the file in Jupyter’s file browser. To outwit Jupyter, the safest thing to do is to package the file(s) you want to download into a format that Jupyter knows it should let you download (instead of trying to open). We can do this with tar. Using the following command will create a package file (commonly called a tarball) containing all of the files we need.

  • --dereference if there is a soft-link, package up the file that is linked to
  • --create we are creating a tarball, not unpackaging it
  • --gzip tells tar to also gzip (compress) the file
  • --verbose tell us what is happening while running
  • --file TARBALL_NAME tells tar what to name the tarball it is creating
  • FILE[S]_TO_PACKAGE the file(s) to put into the tarball
In [3]:
tar --dereference --create --gzip --verbose \
    --file $CUROUT/stuff_for_igv.tgz \
    $TH_DIR/${SAMPLE}/accepted_hits.coord.bam* $GFF $FA*
tar: /Users/cliburn/work/scratch/2015_output/th_dir/8A_pilot/accepted_hits.coord.bam*: Cannot stat: No such file or directory
tar: /Users/cliburn/work/scratch/2015_output/genome/GCA_000010245.1_ASM1024v1_genomic.gff: Cannot stat: No such file or directory
tar: /Users/cliburn/work/scratch/2015_output/genome/GCA_000010245.1_ASM1024v1_genomic.fa*: Cannot stat: No such file or directory
tar: Error exit delayed from previous errors.

Let’s check that it worked . . .

In [4]:
ls $CUROUT
counts                  qc_output               trimmed_fastqs
demux_fastqs            stuff_for_igv.tgz
genome                  th_dir
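
Note that `ls` only confirms that the tarball file exists, not that it holds the files we wanted. As a rough sketch, tar's `--list` option prints the names stored in a tarball without unpacking it. The temporary directory and the file names `genome.fa` and `demo.tgz` below are made up for this demonstration.

```shell
# Self-contained demo: build a small tarball in a temporary directory.
demo_dir=$(mktemp -d)
echo ">chr1" > "$demo_dir/genome.fa"
tar --create --gzip --file "$demo_dir/demo.tgz" -C "$demo_dir" genome.fa

# --list prints the names of the files stored in the tarball
# without extracting them.
listing=$(tar --list --gzip --file "$demo_dir/demo.tgz")
echo "$listing"
```

For the tarball in this notebook, the equivalent command would be tar --list --gzip --file $CUROUT/stuff_for_igv.tgz, which should show the BAM file, its index, the GFF, and the FASTA files if packaging succeeded.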

Download the tarball

Now you can do one of the following to download the tarball to your laptop:

  1. Ambitious
    1. Click on the “Jupyter” logo above to open the Jupyter file browser
    2. Navigate your way to the directory where we saved the tarball: scratch/2015_output
    3. Click on stuff_for_igv.tgz to download it
  2. Lazy
    1. Click here to get to the directory where we saved the tarball
    2. Click on stuff_for_igv.tgz to download it
  3. Very Lazy
    1. Just click here to download the tarball

Unpacking our tarball

On a Mac you can “untar” by double-clicking on the file in Finder, or at the terminal with the command tar -zxf stuff_for_igv.tgz.

On Windows, you can download software that will do it, such as 7-Zip.
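
For the terminal route, here is a small sketch of unpacking into a directory of your choosing with `-C`. The directory and file names are invented for the demonstration, which creates its own tarball so that it can be run anywhere.

```shell
# Self-contained demo: make a tarball, then unpack it somewhere else.
work=$(mktemp -d)
mkdir "$work/src" "$work/unpacked"
echo "example" > "$work/src/file.txt"
tar -czf "$work/demo.tgz" -C "$work/src" file.txt

# -x extract, -z gunzip, -f tarball name, -C directory to unpack into
tar -xzf "$work/demo.tgz" -C "$work/unpacked"
ls "$work/unpacked"
```

Unpacking into a dedicated directory keeps the extracted files together, which makes them easier to find from IGV's file dialogs.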

Using IGV

Run IGV

If you downloaded the Mac-specific version, just double-click it. If you have the cross-platform version, unzip the binary distribution archive in a folder of your choosing. IGV is launched from a command prompt; follow the instructions in the “readme” file. To launch IGV on Mac or Linux, use the shell script “igv.sh”. On Windows, use “igv.bat”.

Load Files

Once IGV is running, do the following within IGV:

  1. Genome Sequence: Genomes->Load Genome From File: genome/GCA_000010245.1_ASM1024v1_genomic.fa
  2. Annotation: File->Load From File: genome/GCA_000010245.1_ASM1024v1_genomic.gff
  3. Bamfile: File->Load From File: 8A_pilot/accepted_hits.coord.bam

Configurations to explore

  1. Zoom in until reads are visible
  2. Right click -> View as pairs
  3. Right click -> Color alignments by -> first-of-pair strand
  4. Right click -> Collapsed

Look around!

A few things to look for:

  • read strand relative to annotated gene strand
  • SNPs
  • areas with no reads
  • coverage depth plot
  • antisense reads
  • non-protein-coding RNAs