Prepare Data¶
In [1]:
CUROUT=$HOME/work/scratch/2015_output
TH_DIR=$CUROUT/th_dir
SAMPLE="8A_pilot"
GENOME_DIR=$CUROUT/genome
ACCESSION="GCA_000010245.1_ASM1024v1"
PREFIX=${GENOME_DIR}/${ACCESSION}_genomic
GFF=${PREFIX}.gff
FA=${PREFIX}.fa
In [2]:
# sort by coordinate
samtools sort $TH_DIR/${SAMPLE}/accepted_hits.bam \
-o $TH_DIR/${SAMPLE}/accepted_hits.coord.bam
# index sorted BAM
samtools index $TH_DIR/${SAMPLE}/accepted_hits.coord.bam
# index genome sequence
samtools faidx $FA
[E::hts_open_format] fail to open file '/Users/cliburn/work/scratch/2015_output/th_dir/8A_pilot/accepted_hits.bam'
samtools sort: can't open "/Users/cliburn/work/scratch/2015_output/th_dir/8A_pilot/accepted_hits.bam": No such file or directory
[E::hts_open_format] fail to open file '/Users/cliburn/work/scratch/2015_output/th_dir/8A_pilot/accepted_hits.coord.bam'
samtools index: failed to open "/Users/cliburn/work/scratch/2015_output/th_dir/8A_pilot/accepted_hits.coord.bam": No such file or directory
[fai_build] fail to open the FASTA file /Users/cliburn/work/scratch/2015_output/genome/GCA_000010245.1_ASM1024v1_genomic.fa
Could not build fai index /Users/cliburn/work/scratch/2015_output/genome/GCA_000010245.1_ASM1024v1_genomic.fa.fai
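The errors above mean that the input files were not found at the expected paths. As a minimal sketch, a check like the following could be run before the samtools commands. The name check_inputs is a hypothetical helper, not part of samtools or of this notebook.

```shell
# Hypothetical helper: report every argument that is missing or empty.
# Returns 0 if all files are present and non-empty, 1 otherwise.
check_inputs() {
    local missing=0
    local f
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            missing=1
        fi
    done
    return $missing
}
```

With the variables defined above, it might be called as check_inputs $TH_DIR/${SAMPLE}/accepted_hits.bam $FA $GFF before sorting and indexing.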
Downloading Everything¶
Download IGV¶
It is often helpful to use visualization software to interact with an assembly. We will use the Integrative Genomics Viewer (IGV) to visualize reads because it is capable, reasonably user-friendly, and cross-platform. Download IGV to your laptop by following the instructions below for the type of computer you are using.
OS X (Macs)¶
- IGV: You can download [a Mac-only version](http://data.broadinstitute.org/igv/projects/downloads/IGV_2.3.97.app.zip) or a cross-platform version.
Windows¶
- IGV: Download the cross-platform version.
Linux¶
- IGV: Download the cross-platform version.
Downloading files through Jupyter¶
Packaging up files¶
We can download files through Jupyter, but we have to be careful. For
many file types, Jupyter will try to open them when you click on the
file in Jupyter’s file browser. To outwit Jupyter, the safest thing to
do is to package the file(s) you want to download into a format that
Jupyter knows it should let you download (instead of trying to open). We
can do this with tar. The following command will create a
package file (commonly called a tarball) containing all of the files we
need.
- --dereference: if there is a soft link, package up the file that it links to
- --create: we are creating a tarball, not unpacking one
- --gzip: tells tar to also gzip (compress) the file
- --verbose: tell us what is happening while running
- --file TARBALL_NAME: tells tar what to name the tarball it is creating
- FILE[S]_TO_PACKAGE: the file(s) to put in the tarball
In [3]:
tar --dereference --create --gzip --verbose \
--file $CUROUT/stuff_for_igv.tgz \
$TH_DIR/${SAMPLE}/accepted_hits.coord.bam* $GFF $FA*
tar: /Users/cliburn/work/scratch/2015_output/th_dir/8A_pilot/accepted_hits.coord.bam*: Cannot stat: No such file or directory
tar: /Users/cliburn/work/scratch/2015_output/genome/GCA_000010245.1_ASM1024v1_genomic.gff: Cannot stat: No such file or directory
tar: /Users/cliburn/work/scratch/2015_output/genome/GCA_000010245.1_ASM1024v1_genomic.fa*: Cannot stat: No such file or directory
tar: Error exit delayed from previous errors.
Let’s check that it worked . . .
In [4]:
ls $CUROUT
counts qc_output trimmed_fastqs
demux_fastqs stuff_for_igv.tgz
genome th_dir
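Note that ls only shows that the tarball exists, not what is inside it. A minimal sketch for listing the contents without unpacking anything follows. The name list_tarball is a hypothetical helper, and --list is the long form of tar's -t option.

```shell
# Hypothetical helper: print the names of the files stored in a gzipped tarball.
# $1 = path to the tarball
list_tarball() {
    tar --list --gzip --file "$1"
}
```

For the tarball created above, this would be called as list_tarball $CUROUT/stuff_for_igv.tgz. If packaging succeeded, the sorted BAM file, its index, the GFF file, and the FASTA files should all be listed.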
Download the tarball¶
Now you can do one of the following to download the tarball to your laptop:
- Ambitious
  - Click on the “Jupyter” logo above to open the Jupyter file browser
  - Navigate to the directory where we saved the tarball: scratch/2015_output
  - Click on stuff_for_igv.tgz to download it
- Lazy
  - Click here to get to the directory where we saved the tarball
  - Click on stuff_for_igv.tgz to download it
- Very Lazy
  - Just click here to download the tarball
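Once the tarball is on your laptop, it has to be unpacked before IGV can read the files. A minimal sketch, assuming tar is available on the laptop (Mac or Linux), is shown here. The name unpack_tarball is a hypothetical helper, and the destination directory is whatever you choose.

```shell
# Hypothetical helper: unpack a gzipped tarball into a chosen directory.
# $1 = path to the tarball, $2 = destination directory (created if needed)
unpack_tarball() {
    mkdir -p "$2" && tar --extract --gzip --verbose --file "$1" --directory "$2"
}
```

It might be called as unpack_tarball ~/Downloads/stuff_for_igv.tgz ~/igv_files, where both paths are assumptions about your laptop. Because the tarball was built from absolute paths, tar will typically drop the leading / and recreate the rest of the directory structure under the destination, so the genome and 8A_pilot directories may end up several levels down.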
Using IGV¶
Run IGV¶
If you downloaded the Mac-specific version, just double-click it. If you have the cross-platform version, unzip the binary distribution archive in a folder of your choosing. IGV is launched from a command prompt; follow the instructions in the “readme” file. To launch IGV on Mac or Linux, use the shell script “igv.sh”. On Windows, use “igv.bat”.
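For the cross-platform version on Mac or Linux, launching from a terminal could look like the sketch below. The name launch_igv is a hypothetical helper, and the folder name depends on the IGV version you unzipped, so the “readme” file remains the authoritative source.

```shell
# Hypothetical helper: start the cross-platform IGV from its unzipped folder.
# $1 = the folder that contains igv.sh
launch_igv() {
    ( cd "$1" && bash ./igv.sh )
}
```

The subshell (the parentheses) keeps your terminal in its original directory after IGV exits.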
Load Files¶
Once IGV is running, do the following within IGV:
1. Genome Sequence: Genomes->Load Genome From File: genome/GCA_000010245.1_ASM1024v1_genomic.fa
2. Annotation: File->Load From File: genome/GCA_000010245.1_ASM1024v1_genomic.gff
3. BAM file: File->Load From File: 8A_pilot/accepted_hits.coord.bam
Configurations to explore¶
- Zoom in until reads are visible
- Right click -> View as pairs
- Right click -> Color alignments by -> first-of-pair strand
- Right click -> Collapsed
Look around!¶
A few things to look for:
- Read strand relative to annotated gene strand
- SNPs
- Areas with no reads
- Coverage depth plot
- Antisense reads
- Non-protein-coding RNAs