In [1]:
source bioinf_intro_config.sh
mkdir -p $TRIMMED $STAR_OUT $IGV_DIR
Prepare Data¶
This notebook depends on the BAM files generated in the “A globy pipeline” section of the “Looping with Globs” notebook. If you have not already generated the BAM files, please run that notebook now.
IGV needs indices for the BAM files. The index allows it to quickly load reads from different parts of the genome.
In [2]:
for BAM in "${STAR_OUT}"/*_Aligned.sortedByCoord.out.bam
do
    echo "$BAM"
    samtools index "$BAM"
done
/Users/cliburn/work/scratch/bioinf_intro/star_out/*_Aligned.sortedByCoord.out.bam
[E::hts_open_format] fail to open file '/Users/cliburn/work/scratch/bioinf_intro/star_out/*_Aligned.sortedByCoord.out.bam'
samtools index: failed to open "/Users/cliburn/work/scratch/bioinf_intro/star_out/*_Aligned.sortedByCoord.out.bam": No such file or directory
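The output above is what the loop prints when the glob matches no files: the shell leaves the pattern unexpanded and hands it to samtools as if it were a file name. A minimal sketch of a guarded version follows, assuming the same file-name suffix as the loop above. The function name `index_bams` is hypothetical and not part of the course material.

```shell
# Hypothetical helper (not from the original notebook): index every STAR BAM
# in a directory, and report clearly when the glob matches nothing.
index_bams() {
    bam_dir="$1"
    found=0
    for BAM in "$bam_dir"/*_Aligned.sortedByCoord.out.bam
    do
        # When nothing matches, the shell leaves the pattern unexpanded,
        # so skip anything that is not an existing file.
        [ -e "$BAM" ] || continue
        echo "$BAM"
        samtools index "$BAM"
        found=$((found + 1))
    done
    if [ "$found" -eq 0 ]; then
        echo "No BAM files found in $bam_dir" >&2
        return 1
    fi
}
```

In this notebook it would be called as `index_bams "$STAR_OUT"`.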
In [3]:
ls -ltr ${STAR_OUT}
Downloading Everything¶
Downloading files through Jupyter¶
We need to download the following files:
1. BAM file(s)
2. Index for each BAM file
3. Genome sequence (FASTA)
4. Genome annotation (GTF)
Jupyter File Browser¶
For each file:
1. Select the checkbox next to the filename
2. Click download
Packaging up files¶
Because we want to download several files, it might be easier to package them up using a program called tar, then download the resulting package file (commonly called a tarball) containing all of the files we need.
Link Directory¶
First we will do a little hack: we will create a directory of links to all the files we need to download. Since the original files are in two different directories, we are essentially making a single “virtual directory” containing all the files.
In [4]:
ln -s ${STAR_OUT}/*.bam* $GTF $FASTA $IGV_DIR
ln: /Users/cliburn/work/scratch/bioinf_intro/igv/*.bam*: File exists
ln: /Users/cliburn/work/scratch/bioinf_intro/igv/Cryptococcus_neoformans_var_grubii_h99.CNA3.39.gtf: File exists
ln: /Users/cliburn/work/scratch/bioinf_intro/igv/Cryptococcus_neoformans_var_grubii_h99.CNA3.dna.toplevel.fa: File exists
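The “File exists” messages above appear because the links were already created by an earlier run of the cell. One way to make the step safe to repeat is the `-f` option of ln, which replaces an existing link of the same name. The sketch below shows this in throwaway directories. Every path and file name in it is a placeholder invented for the demo, not one of the course files.

```shell
# Self-contained demo in a temporary directory; all names below are made up.
demo=$(mktemp -d)
mkdir -p "$demo/star_out" "$demo/genome" "$demo/igv"
touch "$demo/star_out/sample_Aligned.sortedByCoord.out.bam" \
      "$demo/star_out/sample_Aligned.sortedByCoord.out.bam.bai" \
      "$demo/genome/genome.fa" "$demo/genome/genes.gtf"

link_into_igv() {
    # -s makes symbolic links; -f replaces links that already exist,
    # so running this twice does not produce "File exists" errors.
    ln -s -f "$demo"/star_out/*.bam* \
        "$demo/genome/genes.gtf" "$demo/genome/genome.fa" \
        "$demo/igv"
}

link_into_igv
link_into_igv    # second run succeeds quietly

ls -l "$demo/igv"
```

Note that the first error message above refers to a link literally named `*.bam*`. A link with that name is what `ln` creates when the glob matches no BAM files, so it is worth deleting it from the IGV directory and re-linking once the BAM files exist.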
Tarring¶
Here are the command line options we will use with tar:
- --dereference: treat the soft links in our virtual directory as if they were the files they link to
- --create: we are creating a tarball, not unpacking one
- --gzip: tells tar to also gzip (compress) the file
- --verbose: tell us what is happening while running
- --file TARBALL_NAME: tells tar what to name the tarball it is creating
- --directory PATH: the base directory for the files to be tarred
- FILE[S]_TO_PACKAGE: the files or directories to put in the tarball
In [5]:
tar --dereference \
--create \
--gzip \
--verbose \
--file $CUROUT/stuff_for_igv.tgz \
--directory $CUROUT \
$(basename $IGV_DIR)
a igv
a igv/Cryptococcus_neoformans_var_grubii_h99.CNA3.dna.toplevel.fa
a igv/Cryptococcus_neoformans_var_grubii_h99.CNA3.39.gtf
a igv/*.bam*
Let’s check that it worked …
In [6]:
echo $CUROUT
ls $CUROUT
/Users/cliburn/work/scratch/bioinf_intro
count_out igv star_out
demo_chmod myinfo stuff_for_igv.tgz
genome qc_output trimmed_fastqs
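Seeing the tarball in the directory listing does not show what is inside it. A further check, sketched below, is to list the archive members with tar's `-t` option and confirm that the linked files were stored as regular files rather than as links. The demo builds its own small tarball in a temporary directory. The file names are placeholders, and the short options `-c`, `-z`, `-h`, `-f` and `-C` stand in for `--create`, `--gzip`, `--dereference`, `--file` and `--directory`.

```shell
# Self-contained demo; the directory and file names are placeholders.
demo=$(mktemp -d)
mkdir -p "$demo/data" "$demo/igv"
echo ">chr1" > "$demo/data/genome.fa"
ln -s "$demo/data/genome.fa" "$demo/igv/genome.fa"

# -h (dereference): archive the file the link points to, not the link itself
tar -c -z -h -f "$demo/stuff_for_igv.tgz" -C "$demo" igv

# -t lists the archive members without extracting anything;
# with -v, a stored symbolic link would be shown with "->" after its name
tar -t -z -v -f "$demo/stuff_for_igv.tgz"
```

For the tarball made in this notebook, the equivalent check would be `tar -t -z -v -f $CUROUT/stuff_for_igv.tgz`.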
Download the tarball¶
Now you can do the following to download the tarball to your laptop:
- Click on the “Jupyter” logo above to open the Jupyter file browser
- Navigate to the directory where we saved the tarball (see echo $CUROUT above)
- Click on stuff_for_igv.tgz to download it
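Once the tarball is on your laptop it has to be unpacked before IGV can read the files. On Mac or Linux this can be done from a terminal with tar's `-x` (extract) option, as in the sketch below. The demo creates its own tiny tarball so that it runs anywhere. The `server` and `laptop` directories are stand-ins invented for the demo.

```shell
# Self-contained demo; "server" and "laptop" are placeholder directories.
demo=$(mktemp -d)
mkdir -p "$demo/server/igv" "$demo/laptop"
echo ">chr1" > "$demo/server/igv/genome.fa"
tar -c -z -f "$demo/stuff_for_igv.tgz" -C "$demo/server" igv

# On the laptop: -x extracts, -z gunzips, and -C chooses the directory
# in which the igv folder is unpacked
tar -x -z -f "$demo/stuff_for_igv.tgz" -C "$demo/laptop"

ls "$demo/laptop/igv"
```

For the real download, the command would be `tar -x -z -f stuff_for_igv.tgz`, run in the folder where the browser saved the file.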
Download IGV¶
It is often helpful to use visualization software to interact with an assembly. We will be using Integrative Genomics Viewer (IGV) because it is pretty good, somewhat user friendly, and cross-platform. We need to download IGV to our laptops so that we can visualize reads there. See the instructions below for the type of computer you are using.
OS X (Macs)¶
Using IGV¶
Run IGV¶
If you downloaded the Mac-specific version, just double-click it. If you have the cross-platform version, unzip the binary distribution archive in a folder of your choosing. IGV is then launched from a command prompt; follow the instructions in the “readme” file. To launch IGV on Mac or Linux platforms use the shell script “igv.sh”. On Windows use “igv.bat”.
Load Files¶
Once IGV is running, do the following within IGV:
1. Genome Sequence: Genomes->Load Genome From File: (select the FASTA file we just downloaded from Jupyter)
2. Annotation: File->Load From File: (select the GTF file we just downloaded from Jupyter)
3. BAM files: File->Load From File: (select the BAM files we just downloaded from Jupyter)
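The same three loading steps can also be scripted, which helps if you reload the files often. The sketch below writes an IGV batch file containing one `genome` line and one `load` line per track. It assumes that your IGV version supports batch scripts with the `genome` and `load` commands and that paths contain no spaces, so check the IGV documentation for your version. The function name `write_igv_batch` and the file name `batch.txt` are hypothetical.

```shell
# Hypothetical helper (not part of the course material): write an IGV batch
# script for a directory holding *.fa, *.gtf and *.bam files.
write_igv_batch() {
    igv_dir="$1"
    batch_file="$2"
    {
        # Genome sequence first, then the annotation and alignment tracks
        for fasta in "$igv_dir"/*.fa; do
            if [ -e "$fasta" ]; then echo "genome $fasta"; fi
        done
        for gtf in "$igv_dir"/*.gtf; do
            if [ -e "$gtf" ]; then echo "load $gtf"; fi
        done
        # The *.bam pattern skips the *.bam.bai index files, which IGV
        # is expected to find next to each BAM on its own
        for bam in "$igv_dir"/*.bam; do
            if [ -e "$bam" ]; then echo "load $bam"; fi
        done
    } > "$batch_file"
}
```

After `write_igv_batch /path/to/igv batch.txt`, the batch file could be passed to IGV at launch (for example `igv.sh -b batch.txt`, if your version accepts the `-b` option) or run from within IGV if it offers a batch-script menu entry.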
Configurations to explore¶
- Zoom in until reads are visible
- Right click -> Color alignments by -> first-of-pair strand
- Right click -> Collapsed
Look around¶
A few things to look for:
- Read strand relative to annotated gene strand
- Intron-spanning reads
- SNPs
- Areas with no reads
- Coverage depth plot
- Antisense reads
- Non-protein-coding RNAs