{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Working with the Unix Shell 2" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "**Exercise 1.1**. Download a compressed data archive from https://www.dropbox.com/s/vivut71p4bkurhw/data.tar.gz (10 points)\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Exercise 1.2**. Regenerate the original data folder from `data.tar.gz`. Change directory into the data folder. List the files in the folder. (10 points)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Exercise 1.3**. Check whether any files have been corrupted using the MD5SUM checksum file, and note the `FILENAME` of the corrupted file. Delete any corrupted files. (10 points)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Exercise 1.4**. Replace the corrupted file with a correct copy from https://www.dropbox.com/s/vf8qcoj07mcq7wn/FILENAME. You will need to replace `FILENAME` with the correct filename as noted earlier. Check that there are no more `md5sum` errors. Go back to the original directory. (10 points)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Exercise 2.1**. (20 points) Write a script `extract.sh` that extracts only the raw sequence letters and quality scores from a FASTQ file. 
For example, if `test.fq` consists of\n", "\n", "```\n", "@071112_SLXA-EAS1_s_7:5:1:817:345\n", "GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACC\n", "+071112_SLXA-EAS1_s_7:5:1:817:345\n", "IIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IG9IC\n", "@071112_SLXA-EAS1_s_7:5:1:801:338\n", "GTTCAGGGATACGACGTTTGTATTTTAAGAATCTGA\n", "+071112_SLXA-EAS1_s_7:5:1:801:338\n", "IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII6IBI\n", "```\n", "\n", "then\n", "\n", "```bash\n", "cat test.fq | bash extract.sh\n", "```\n", "\n", "should print\n", "\n", "```\n", "GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACC\n", "IIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IG9IC\n", "GTTCAGGGATACGACGTTTGTATTTTAAGAATCTGA\n", "IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII6IBI\n", "```\n", "\n", "For all of Exercise 2, create `test.fq` and run each script on it to check that it produces the expected results." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "**Exercise 2.2**. (20 points) Write a script `extract_seq.sh` that extracts only the raw sequence letters from a FASTQ file and combines them into a single string. For example,\n", "\n", "```\n", "cat test.fq | bash extract_seq.sh\n", "```\n", "\n", "should print\n", "\n", "```\n", "GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACCGTTCAGGGATACGACGTTTGTATTTTAAGAATCTGA\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Exercise 2.3**. Write a script `calc_gc.sh` that estimates the GC ratio (the fraction of all bases that are either G or C) from a FASTQ file. 
For example, the GC ratio of `test.fq` is .5138. (20 points)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Bash", "language": "bash", "name": "bash" }, "language_info": { "codemirror_mode": "shell", "file_extension": ".sh", "mimetype": "text/x-sh", "name": "bash" } }, "nbformat": 4, "nbformat_minor": 2 }