{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Reference Genome and Annotation\n",
    "\n",
    "* [NCBI Pseudomonas syringae Genome Page](https://www.ncbi.nlm.nih.gov/genome/?term=DC3000)\n",
    "* FASTA: ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/007/805/GCF_000007805.1_ASM780v1/GCF_000007805.1_ASM780v1_genomic.fna.gz\n",
    "* GFF: ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/007/805/GCF_000007805.1_ASM780v1/GCF_000007805.1_ASM780v1_genomic.gff.gz\n",
    "\n",
    "The following can be used to download the reference genome sequence and annotation for our strain of *Pseudomonas syringae*:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "ACCESSION=\"GCF_000007805.1_ASM780v1\"\n",
    "PREFIX=${ACCESSION}_genomic\n",
    "GFF=${PREFIX}.gff\n",
    "FNA=${PREFIX}.fna\n",
    "FA=${PREFIX}.fa\n",
    "GENOME_DIR=XXXXXXXXX\n",
    "\n",
    "FIRST_PART=\"ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/007/805\"\n",
    "for CUR in $GFF $FNA ; do\n",
    "    rsync rsync://${FIRST_PART}/${ACCESSION}/${CUR}.gz ${GENOME_DIR}\n",
    "done"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Adapter\n",
    "The universal adapter and the 3' common portion of the indexed adapter that we used for this year's project are the same as for the 2015 data, so you can use the same adapter file.  We have provided an adapter file with these sequences in `/home/jovyan/work/2017-HTS-materials/Data_Info_and_Results/2017_HTS/info/neb_adapters.fasta`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Data Overview\n",
    "## Raw Data\n",
    "We have 4 dataset this year, each of these is a subdirectory in `/data/HTS_2017_data/raw_data/`:\n",
    "* HTS_2017_pilot: One MiSeq run generated from pool of 6 pilot samples\n",
    "* HTS_2017_miseq_1: One MiSeq run generated from pool of all 48 samples (8 groups x 6 samples per group)\n",
    "* HTS_2017_miseq_2: Second MiSeq from same pool of 48 samples used in HTS_2017_miseq_1\n",
    "* HTS_2017_nextseq:  NextSeq run on pool of 42 samples\n",
    "\n",
    "### Manifest\n",
    "A manifest for the FASTQ files is available at: \n",
    "`/home/jovyan/work/2017-HTS-materials/Data_Info_and_Results/2017_HTS/info/fastq_manifest.csv`\n",
    "\n",
    "### Notes\n",
    "1. NextSeq run does not include samples 31-36\n",
    "2. NextSeq has 4 contiguous lanes, so each sample has four FASTQs, one each L001-L004\n",
    "3. NextSeq data is 75bp single-end reads, so it is not directly comparable to the MiSeq data, which is 50bp single-end\n",
    "\n",
    "## Count Data\n",
    "We are making available pre-generated count data from all samples and sequencing runs for anyone who wants to do comparisons between groups or between runs.  The count data are in subdirectories of                                                                             \n",
    "`/home/jovyan/work/2017-HTS-materials/Data_Info_and_Results/2017_HTS/counts/`:\n",
    "* HTS_2017_pilot: Counts from HTS_2017_pilot MiSeq run\n",
    "* HTS_2017_miseq_1: Counts from HTS_2017_miseq_1 MiSeq run\n",
    "* HTS_2017_both_miseq: Counts generated from concatenation of HTS_2017_miseq_1 and HTS_2017_miseq_2 runs for each sample\n",
    "* HTS_2017_nextseq: Counts generated from concatenation of all four NextSeq lanes for each sample\n",
    "\n",
    "## Metadata\n",
    "A metadata table describing each sample is at                                                                      \n",
    "`/home/jovyan/work/2017-HTS-materials/Data_Info_and_Results/2017_HTS/info/full_metadata.csv`\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Bash",
   "language": "bash",
   "name": "bash"
  },
  "language_info": {
   "codemirror_mode": "shell",
   "file_extension": ".sh",
   "mimetype": "text/x-sh",
   "name": "bash"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}