{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "source bioinf_intro_config.sh\n", "mkdir -p $TRIMMED $STAR_OUT $IGV_DIR" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Prepare Data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> This notebook depends on BAM files generated in the [A globy pipeline](glob_loop.ipynb#A-globy-pipeline) section of the *Looping with Globs* notebook. If you have not already generated the BAM files, please run that notebook now\n", "\n", "\n", "IGV needs indices for the BAM files. The index allows it to quickly load reads from different parts of the genome." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for BAM in ${STAR_OUT}/*_Aligned.sortedByCoord.out.bam\n", " do\n", " echo $BAM\n", " samtools index $BAM\n", " done" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ls -ltr ${STAR_OUT}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Downloading Everything\n", "## Downloading files through Jupyter\n", "We need to download the following files:\n", "1. BAM file(s)\n", "2. Index for each BAM file\n", "3. Genome sequence (FASTA)\n", "4. Genome annotation (GTF)\n", "\n", "### Jupyter File Browser\n", "For each file:\n", "1. Select checkbox next to filename\n", "2. Click download\n", "\n", "### Packaging up files\n", "Because we want to download several files, it might be easier to package them up using a program called `tar`, then download the resulting package file (commonly called a tarball) containing all of the files we need. \n", "\n", "#### Link Directory\n", "First we will do a little hack - we will create a directory of links to all the files we need to download. Since the original files are in two different directories, we are essentially makig a single \"virtual directory\" containing all the files." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ln -s ${STAR_OUT}/*.bam* $GTF $FASTA $IGV_DIR" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Taring\n", "Here are the command line options we will use with `tar`\n", "\n", "* `--dereference` treat the soft-links in our virtual directory as if they were the files that are linked to\n", "* `--create` we are creating a tarball, not unpackaging it\n", "* `--gzip` tells tar to also gzip (compress) the file\n", "* `--verbose` tell us what is happening while running\n", "* `--file TARBALL_NAME` tells tar what to name the tarball it is creating\n", "* `--directory PATH` the base directory for the files to be tarred \n", "* FILE[S]_TO_PACKAGE" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tar --dereference \\\n", " --create \\\n", " --gzip \\\n", " --verbose \\\n", " --file $CUROUT/stuff_for_igv.tgz \\\n", " --directory $CUROUT \\\n", " $(basename $IGV_DIR)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's check that it worked . . ." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "echo $CUROUT\n", "ls $CUROUT" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Download the tarball\n", "Now you can do one of the following to download the tarball to your laptop:\n", "\n", "1. Click on the \"Jupyter\" logo above to open the Jupyter file browser\n", "2. Naviagte your way to the directory where we saved the tarball (see `echo $CUROUT` above)\n", "3. Click on `stuff_for_igv.tgz` to download it\n", "\n", "#### Unpacking our tarball\n", "On a Mac you can \"untar\" by double clicking on the file in finder, or at the terminal with the command `tar -zxf my_notebooks.tgz`.\n", "\n", "On Windows, you can download software that will do it, such as [7-Zip](http://www.7-zip.org/)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Download IGV\n", "It is often helpful to use visualization software to interact with an assembly. We will be using Integrative Genomics Viewer (IGV) because it is pretty good, somewhat user friendly, and cross-platform.\n", "We need to download Integrative Genomics Viewer (IGV) for visualizing reads on our laptops. See instructions below for the type of computer you are using.\n", "\n", "### OS X (Macs)\n", "1. [a Mac only version](http://data.broadinstitute.org/igv/projects/downloads/2.4/IGV_2.4.13.app.zip)\n", "2. [the cross-platform version](http://data.broadinstitute.org/igv/projects/downloads/2.4/IGV_2.4.13.zip)\n", "### Windows\n", "1. [a Windows only version](http://data.broadinstitute.org/igv/projects/downloads/2.4/IGV_Win_2.4.13.zip)\n", "2. [the cross-platform version](http://data.broadinstitute.org/igv/projects/downloads/2.4/IGV_2.4.13.zip)\n", "### Linux\n", "1. [the cross-platform version](http://data.broadinstitute.org/igv/projects/downloads/2.4/IGV_2.4.13.zip)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Using IGV\n", "## Run IGV\n", "If you downloaded the Mac specific version, just double click. If you have the cross-platform version: unzip the binary distribution archive in a folder of your choosing. IGV is launched from a command prompt: follow instructions in the \"readme\" file. To launch igv on Mac or Linux platforms use the shell script \"igv.sh\". On Windows use \"igv.bat\".\n", "\n", "## Load Files\n", "Once IGV is running do the following within IGV:\n", "1. **Genome Sequence:** Genomes->Load Genome From File: (select the FASTA file we just downloaded from Jupyter)\n", "2. **Annotation:** File->Load From File: (select the GTF file we just downloaded from Jupyter)\n", "3. **Bamfile:** File->Load From File: (select the BAM files we just downloaded from Jupyter)\n", "\n", "## Configurations to explore\n", "1. Zoom in until reads are visible\n", "3. Right click -> Color alignments by -> first-of-pair strand\n", "4. Right click->Collapsed\n", "\n", "\n", "\n", "## Look around\n", "A few things to look for:\n", "- read strand relative to annotated gene strand\n", "- intron-spanning reads\n", "- SNPs\n", "- Areas with no reads\n", "- Coverage depth plot\n", "- antisense reads\n", "- non-protein-coding RNAs" ] } ], "metadata": { "kernelspec": { "display_name": "Bash", "language": "bash", "name": "bash" }, "language_info": { "codemirror_mode": "shell", "file_extension": ".sh", "mimetype": "text/x-sh", "name": "bash" } }, "nbformat": 4, "nbformat_minor": 1 }