{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Prepare Data" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# output locations and sample name\n", "CUROUT=$HOME/work/scratch/2015_output\n", "TH_DIR=$CUROUT/th_dir\n", "SAMPLE=\"8A_pilot\"\n", "GENOME_DIR=$CUROUT/genome\n", "\n", "# genome files: annotation (GFF) and sequence (FASTA)\n", "ACCESSION=\"GCA_000010245.1_ASM1024v1\"\n", "PREFIX=${GENOME_DIR}/${ACCESSION}_genomic\n", "GFF=${PREFIX}.gff\n", "FA=${PREFIX}.fa" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# sort by coordinate\n", "samtools sort $TH_DIR/${SAMPLE}/accepted_hits.bam \\\n", " -o $TH_DIR/${SAMPLE}/accepted_hits.coord.bam\n", "\n", "# index sorted BAM\n", "samtools index $TH_DIR/${SAMPLE}/accepted_hits.coord.bam\n", "\n", "# index genome sequence\n", "samtools faidx $FA" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Downloading Everything\n", "## Download IGV\n", "It is often helpful to use visualization software to interact with an assembly. We will be using the Integrative Genomics Viewer (IGV) because it is pretty good, somewhat user-friendly, and cross-platform.\n", "We need to download IGV to our laptops to visualize reads. See the instructions below for the type of computer you are using.\n", "\n", "### OS X (Macs)\n", "* IGV: You can download [a Mac-only version](http://data.broadinstitute.org/igv/projects/downloads/IGV_2.3.97.app.zip) or [a cross-platform version](http://data.broadinstitute.org/igv/projects/downloads/IGV_2.3.97.zip).\n", "### Windows\n", "* IGV: Download the [cross-platform version](http://data.broadinstitute.org/igv/projects/downloads/IGV_2.3.97.zip).\n", "### Linux\n", "* IGV: Download the [cross-platform version](http://data.broadinstitute.org/igv/projects/downloads/IGV_2.3.97.zip). 
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Downloading files through Jupyter\n", "### Packaging up files\n", "We can download files through Jupyter, but we have to be careful. For many file types, Jupyter will try to open the file when you click on it in Jupyter's file browser. To outwit Jupyter, the safest thing to do is to package the file(s) you want to download into a format that Jupyter knows it should let you download (instead of trying to open). We can do this with `tar`. The following command creates a package file (commonly called a tarball) containing all of the files we need.\n", "\n", "* `--dereference`: if a file is a soft link, package up the file that it links to\n", "* `--create`: we are creating a tarball, not unpacking one\n", "* `--gzip`: tells tar to also gzip (compress) the tarball\n", "* `--verbose`: tells us what is happening while tar runs\n", "* `--file TARBALL_NAME`: tells tar what to name the tarball it is creating\n", "* `FILE[S]_TO_PACKAGE`: the file(s) to put in the tarball" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "tar --dereference --create --gzip --verbose \\\n", " --file $CUROUT/stuff_for_igv.tgz \\\n", " $TH_DIR/${SAMPLE}/accepted_hits.coord.bam* $GFF $FA*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's check that it worked . . ." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "ls $CUROUT" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Download the tarball\n", "Now you can do one of the following to download the tarball to your laptop:\n", "\n", "1. Ambitious\n", " 1. Click on the \"Jupyter\" logo above to open the Jupyter file browser\n", " 2. Navigate your way to the directory where we saved the tarball: `scratch/2015_output`\n", " 3. Click on `stuff_for_igv.tgz` to download it\n", "2. Lazy\n", " 1. 
Click [here to get to the directory where we saved the tarball](/tree/scratch/2015_output)\n", " 2. Click on `stuff_for_igv.tgz` to download it\n", "3. Very Lazy\n", " 1. Just click [here to download the tarball](/tree/scratch/2015_output/stuff_for_igv.tgz)\n", " \n", "### Unpacking our tarball\n", "On a Mac you can \"untar\" the file by double-clicking on it in Finder, or at the terminal with the command `tar -zxf stuff_for_igv.tgz`.\n", "\n", "On Windows, you can download software that will do it, such as [7-Zip](http://www.7-zip.org/)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Using IGV\n", "## Run IGV\n", "If you downloaded the Mac-specific version, just double-click it. If you have the cross-platform version, unzip the binary distribution archive in a folder of your choosing. IGV is launched from a command prompt: follow the instructions in the \"readme\" file. To launch IGV on Mac or Linux, use the shell script \"igv.sh\". On Windows, use \"igv.bat\".\n", "\n", "## Load Files\n", "Once IGV is running, do the following within IGV:\n", "1. **Genome Sequence:** Genomes->Load Genome From File: genome/GCA_000010245.1_ASM1024v1_genomic.fa\n", "2. **Annotation:** File->Load From File: genome/GCA_000010245.1_ASM1024v1_genomic.gff\n", "3. **BAM file:** File->Load From File: 8A_pilot/accepted_hits.coord.bam \n", "\n", "## Configurations to explore\n", "1. Zoom in until reads are visible\n", "2. Right click -> View as pairs\n", "3. Right click -> Color alignments by -> first-of-pair strand\n", "4. 
Right click -> Collapsed\n", "\n", "## Look around!\n", "A few things to look for:\n", "- Read strand relative to annotated gene strand\n", "- SNPs\n", "- Areas with no reads\n", "- Coverage depth plot\n", "- Antisense reads\n", "- Non-protein-coding RNAs" ] } ], "metadata": { "kernelspec": { "display_name": "Bash", "language": "bash", "name": "bash" }, "language_info": { "codemirror_mode": "shell", "file_extension": ".sh", "mimetype": "text/x-sh", "name": "bash" } }, "nbformat": 4, "nbformat_minor": 0 }