In [1]:

source bioinf_intro_config.sh
mkdir -p $TRIMMED $STAR_OUT $IGV_DIR

Prepare Data¶

This notebook depends on the BAM files generated in the “A globy pipeline” section of the Looping with Globs notebook. If you have not already generated the BAM files, please run that notebook now.

IGV needs indices for the BAM files. The index allows it to quickly load reads from different parts of the genome.

In [2]:

for BAM in "${STAR_OUT}"/*_Aligned.sortedByCoord.out.bam
    do
        echo "$BAM"
        samtools index "$BAM"
    done

/Users/cliburn/work/scratch/bioinf_intro/star_out/*_Aligned.sortedByCoord.out.bam
[E::hts_open_format] fail to open file '/Users/cliburn/work/scratch/bioinf_intro/star_out/*_Aligned.sortedByCoord.out.bam'
samtools index: failed to open "/Users/cliburn/work/scratch/bioinf_intro/star_out/*_Aligned.sortedByCoord.out.bam": No such file or directory
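
The output above is what happens when the glob matches no files: the shell leaves the pattern as literal text, so samtools is asked to open a file whose name contains a `*`. Below is a minimal sketch of one way to guard against this. `list_bams` is a hypothetical helper introduced here for illustration; it is not part of the original pipeline.

```shell
# Sketch: print the BAM files in a directory, coping with an unmatched glob.
# list_bams is a hypothetical helper, not part of the original notebook.
list_bams() {
    _found=1
    for _bam in "$1"/*_Aligned.sortedByCoord.out.bam; do
        # An unmatched glob stays as literal text, so check the file exists
        [ -e "$_bam" ] || continue
        printf '%s\n' "$_bam"
        _found=0
    done
    # Non-zero status signals that no BAM files were found
    return "$_found"
}

# Usage (assumes STAR_OUT is set by bioinf_intro_config.sh):
# list_bams "$STAR_OUT" | while read -r BAM; do samtools index "$BAM"; done
```

A non-zero status from `list_bams` is a cue to run the Looping with Globs notebook first.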

In [3]:

ls -ltr ${STAR_OUT}
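
By default samtools index writes the index next to the BAM file, with `.bai` appended to the file name, so the listing should show one `.bam.bai` file per BAM file. As a rough sketch, the check can also be scripted. `missing_indices` is a hypothetical helper name used only for illustration.

```shell
# Sketch: report BAM files that have no .bai index next to them.
# missing_indices is a hypothetical helper, not part of the original notebook.
missing_indices() {
    for _bam in "$1"/*.bam; do
        # Skip the literal pattern if the glob matched nothing
        [ -e "$_bam" ] || continue
        # Print the BAM path only when its index is absent
        [ -e "$_bam.bai" ] || printf '%s\n' "$_bam"
    done
}

# Usage (assumes STAR_OUT is set by bioinf_intro_config.sh):
# missing_indices "$STAR_OUT"
```

If this prints nothing, every BAM file in the directory has an index.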

Downloading Everything¶

Downloading files through Jupyter¶

We need to download the following files:

1. BAM file(s)
2. Index for each BAM file
3. Genome sequence (FASTA)
4. Genome annotation (GTF)

Jupyter File Browser¶

For each file:

1. Select the checkbox next to the filename
2. Click download

Packaging up files¶

Because we want to download several files, it might be easier to package them up using a program called tar, then download the resulting package file (commonly called a tarball) containing all of the files we need.

Link Directory¶

First we will do a little hack: we will create a directory of links to all the files we need to download. Since the original files are in two different directories, we are essentially making a single “virtual directory” containing all the files.

In [4]:

ln -s ${STAR_OUT}/*.bam* $GTF $FASTA $IGV_DIR

ln: /Users/cliburn/work/scratch/bioinf_intro/igv/*.bam*: File exists
ln: /Users/cliburn/work/scratch/bioinf_intro/igv/Cryptococcus_neoformans_var_grubii_h99.CNA3.39.gtf: File exists
ln: /Users/cliburn/work/scratch/bioinf_intro/igv/Cryptococcus_neoformans_var_grubii_h99.CNA3.dna.toplevel.fa: File exists
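
The “File exists” messages above mean the link names are already present in the target directory, most likely from an earlier run of this cell; plain ln -s will not overwrite them. The first message also shows a link named after the unmatched pattern `*.bam*`, which points at nothing useful. Below is a sketch of a re-runnable alternative, assuming it is acceptable to replace existing links. `link_into` is a hypothetical helper, not part of the original notebook.

```shell
# Sketch: link files into a directory in a way that is safe to re-run.
# link_into is a hypothetical helper, not part of the original notebook.
# Sources should be absolute paths, otherwise the links will not resolve.
link_into() {
    _dest="$1"
    shift
    mkdir -p "$_dest"
    for _src in "$@"; do
        # Skip anything that does not exist, e.g. an unmatched glob pattern
        [ -e "$_src" ] || { echo "skipping missing file: $_src" >&2; continue; }
        # -f replaces an existing link instead of failing with "File exists"
        ln -sf "$_src" "$_dest"/
    done
}

# Usage (assumes the variables from bioinf_intro_config.sh):
# link_into "$IGV_DIR" "${STAR_OUT}"/*.bam* "$GTF" "$FASTA"
```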

Tarring¶

Here are the command line options we will use with tar:

--dereference treat the soft links in our virtual directory as if they were the files they link to
--create we are creating a tarball, not unpacking it
--gzip tells tar to also gzip (compress) the file
--verbose tell us what is happening while running
--file TARBALL_NAME tells tar what to name the tarball it is creating
--directory PATH the base directory for the files to be tarred; tar changes to it first, so paths in the tarball are relative to it
FILE[S]_TO_PACKAGE the file(s) or directory to put in the tarball

In [5]:

tar --dereference \
    --create \
    --gzip \
    --verbose \
    --file $CUROUT/stuff_for_igv.tgz \
    --directory $CUROUT \
    $(basename $IGV_DIR)

a igv
a igv/Cryptococcus_neoformans_var_grubii_h99.CNA3.dna.toplevel.fa
a igv/Cryptococcus_neoformans_var_grubii_h99.CNA3.39.gtf
a igv/*.bam*

Let’s check that it worked …

In [6]:

echo $CUROUT
ls $CUROUT

/Users/cliburn/work/scratch/bioinf_intro
count_out               igv                     star_out
demo_chmod              myinfo                  stuff_for_igv.tgz
genome                  qc_output               trimmed_fastqs
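
The ls output only confirms that the tarball exists. To see what is inside it without unpacking, tar can list the contents. The sketch below builds a small throwaway tarball in a temporary directory, using the short forms of the options described above, and then lists it. All file names in it are made up for the demonstration.

```shell
# Sketch: build a tiny tarball from a directory of links, then list its contents.
# All paths here are throwaway demo names, not the real pipeline files.
work=$(mktemp -d)
mkdir "$work/src" "$work/igv_demo"
echo ">chr1" > "$work/src/genome.fa"
ln -s "$work/src/genome.fa" "$work/igv_demo/genome.fa"

# -c create, -z gzip, -h dereference links, -f tarball name, -C base directory
tar -c -z -h -f "$work/demo.tgz" -C "$work" igv_demo

# -t lists the contents instead of extracting them
contents=$(tar -t -z -f "$work/demo.tgz")
echo "$contents"
```

For the real tarball, the equivalent listing command would be `tar -t -z -f $CUROUT/stuff_for_igv.tgz`.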

Download the tarball¶

Now do the following to download the tarball to your laptop:

Click on the “Jupyter” logo above to open the Jupyter file browser
Navigate to the directory where we saved the tarball (see echo $CUROUT above)
Click on stuff_for_igv.tgz to download it

Unpacking our tarball¶

On a Mac you can “untar” by double clicking on the file in Finder, or at the terminal with the command tar -zxf stuff_for_igv.tgz.

On Windows, you can download software that will do it, such as 7-Zip.

Download IGV¶

It is often helpful to use visualization software to interact with an assembly. We will be using Integrative Genomics Viewer (IGV) because it is pretty good, somewhat user friendly, and cross-platform. We need to download IGV to our laptops to visualize the reads. See the instructions below for the type of computer you are using.

OS X (Macs)¶

Using IGV¶

Run IGV¶

If you downloaded the Mac-specific version, just double click it. If you have the cross-platform version, unzip the binary distribution archive in a folder of your choosing, then launch IGV from a command prompt by following the instructions in the “readme” file. On Mac or Linux use the shell script “igv.sh”; on Windows use “igv.bat”.

Load Files¶

Once IGV is running, do the following within IGV:

1. Genome sequence: Genomes->Load Genome From File: (select the FASTA file we just downloaded from Jupyter)
2. Annotation: File->Load From File: (select the GTF file we just downloaded from Jupyter)
3. BAM files: File->Load From File: (select the BAM files we just downloaded from Jupyter)

Configurations to explore¶

Zoom in until reads are visible
Right click -> Color alignments by -> first-of-pair strand
Right click -> Collapsed

Look around¶

A few things to look for:

- read strand relative to annotated gene strand
- intron-spanning reads
- SNPs
- areas with no reads
- coverage depth plot
- antisense reads
- non-protein-coding RNAs