{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Unix Text and Arithmetic Quiz" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Text" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**1**. Use a here document with `cat` to create a multi-line file." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "cat > baa.txt <<'EOF'\n", "baa, baa black sheep\n", "have you any wool?\n", "EOF" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "baa, baa black sheep\n", "have you any wool?\n" ] } ], "source": [ "cat baa.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**2**. What is the difference between a plain `label` and a quoted `'label'` in a here document?" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": true }, "outputs": [], "source": [ "cat > plain.txt <<EOF\n", "echo $SHELL\n", "EOF" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "cat plain.txt" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "cat > quoted.txt <<'EOF'\n", "echo $SHELL\n", "EOF" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "echo $SHELL\n" ] } ], "source": [ "cat quoted.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With a plain label, the body of the here document undergoes parameter expansion, command substitution and arithmetic expansion, so `plain.txt` contains the value of `$SHELL` rather than the literal text. Quoting the label (`'EOF'`) disables all expansion, so `quoted.txt` contains `echo $SHELL` exactly as written." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**3**. How would you run a bash command that sends its output and errors to separate files?"
] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "ename": "", "evalue": "1", "output_type": "error", "traceback": [] } ], "source": [ "rmdir foo > output.txt 2> error.txt" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": true }, "outputs": [], "source": [ "cat output.txt" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "rmdir: foo: No such file or directory\n" ] } ], "source": [ "cat error.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**4**. How would you run a bash command that sends its output and errors to the same file? " ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "ename": "", "evalue": "1", "output_type": "error", "traceback": [] } ], "source": [ "rmdir foo &> combined.txt" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "rmdir: foo: No such file or directory\n" ] } ], "source": [ "cat combined.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**5**. Sort the numbers `1 10 100 2 20 200` in decreasing order." ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "200\n", "100\n", "20\n", "10\n", "2\n", "1\n" ] } ], "source": [ "echo 1 10 100 2 20 200 | tr ' ' '\\n' | sort -nr" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**6**. Download the file iris.csv from the URL `https://raw.githubusercontent.com/uiuc-cse/data-fa14/gh-pages/data/iris.csv`. Extract the column headers and sort them in reverse order." 
] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [], "source": [ "wget -q https://raw.githubusercontent.com/uiuc-cse/data-fa14/gh-pages/data/iris.csv" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "species\n", "sepal_width\n", "sepal_length\n", "petal_width\n", "petal_length\n" ] } ], "source": [ "head -1 iris.csv | tr ',' '\\n' | sort -r" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**7**. Find the unique species in the downloaded `iris.csv` file." ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "setosa\n", "versicolor\n", "virginica\n" ] } ], "source": [ "tail -n +2 iris.csv | cut -f 5 -d',' | sort | uniq" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**8**. How would you list all files that end in `.o`, `.c` or `.h`?" ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "collapsed": true }, "outputs": [], "source": [ "touch foo.o foo.c foo.h foo.a foo.b" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "foo.c\tfoo.h\tfoo.o\n" ] } ], "source": [ "ls *.{o,c,h}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**9**. Create `a`, `b` and `c` as subdirectories of the directory given by `pwd` using brace expansion." ] }, { "cell_type": "code", "execution_count": 44, "metadata": { "collapsed": true }, "outputs": [], "source": [ "mkdir \"${PWD}\"/{a,b,c}" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "a/\tb/\tc/\n" ] } ], "source": [ "ls -d */" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Regular expressions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [
"**10**. Do the first 3 exercises of [Regex Golf](https://alf.nu/RegexGolf)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Left as an exercise." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**11**. Find a regular expression that matches, as closely as possible, only valid outputs of `date +%d-%b-%y`.\n", "\n", "For example, the following should not match:\n", "\n", "```\n", "05-Mud-17\n", "5-Mar-17\n", "05-Mar-2017\n", "56-Mar-17\n", "```\n", "\n", "But we will tolerate *mistakes* such as\n", "\n", "```\n", "30-Feb-18\n", "37-Mar-00\n", "```" ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "30-Feb-18\n", "37-Mar-00\n" ] } ], "source": [ "for s in \"05-Mud-17\" \"5-Mar-17\" \"05-Mar-2017\" \"56-Mar-17\" \"30-Feb-18\" \"37-Mar-00\"; do\n", " echo \"$s\" | grep -E '^[0-3][0-9]-(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)-[0-9]{2}$'\n", "done" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Finding stuff" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**12**. Find a regular expression that matches repeated runs of length at least 4 in a DNA sequence, where either a single base or a block of two or more bases repeats. Use the `-o` and `-E` options of `grep` so that only the matching parts of each line are printed. For example, the string\n", "\n", "```\n", "AAATTTTAAAACAAAGCGCGCGCATGC\n", "```\n", "\n", "should find\n", "\n", "```\n", "TTTT\n", "AAAA\n", "GCGCGCGC\n", "```" ] }, { "cell_type": "code", "execution_count": 101, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "TTTT\n", "AAAA\n", "GCGCGCGC\n" ] } ], "source": [ "echo 'AAATTTTAAAACAAAGCGCGCGCATGC' | grep -oE '(.)\\1{3,}|(.{2,})\\2{1,}'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**13**. Find all files larger than 3 KB in the current directory that were created more than 1 day ago (approximated by the inode change time checked by `find -ctime`), excluding anything in `.ipynb_checkpoints`."
] }, { "cell_type": "code", "execution_count": 107, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "./01_UnixBasics.ipynb\n", "./01_UnixBasics_Solutions.ipynb\n", "./EntryQuiz.ipynb\n", "./EntryQuiz_Solutions.ipynb\n" ] } ], "source": [ "find . -type f -ctime +1 -size +3k -not -path './.ipynb_checkpoints/*'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**14**. Find all lines that contain the word `git` (case-insensitive) in files in the current directory that were created more than 3 days ago, excluding anything in `.ipynb_checkpoints`. Lines where `git` is only part of a longer word, such as `GitLab`, should not match." ] }, { "cell_type": "code", "execution_count": 134, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "./EntryQuiz.ipynb: \"- `git`\\n\",\n", "./EntryQuiz.ipynb: \"- `git` => distributed version control system\\n\",\n", "./EntryQuiz_Solutions.ipynb: \"- `git`\\n\",\n", "./EntryQuiz_Solutions.ipynb: \"- `git` => distributed version control system\\n\",\n" ] } ], "source": [ "find . -type f -ctime +3 \\\n", "-not -path './.ipynb_checkpoints/*' \\\n", "| xargs grep -i '\\<git\\>'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Modifying text" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**15**. Find the reverse complement of `AAATTTTAAAACAAAGCGCGCGCATGC`." ] }, { "cell_type": "code", "execution_count": 122, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "GCATGCGCGCGCTTTGTTTTAAAATTT\n" ] } ], "source": [ "echo 'AAATTTTAAAACAAAGCGCGCGCATGC' | tr 'ACTG' 'TGAC' | rev" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**16**. Print the first 10 rows of `iris.csv` whose species name is `versicolor`."
] }, { "cell_type": "code", "execution_count": 131, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "7,3.2,4.7,1.4,versicolor\n", "6.4,3.2,4.5,1.5,versicolor\n", "6.9,3.1,4.9,1.5,versicolor\n", "5.5,2.3,4,1.3,versicolor\n", "6.5,2.8,4.6,1.5,versicolor\n", "5.7,2.8,4.5,1.3,versicolor\n", "6.3,3.3,4.7,1.6,versicolor\n", "4.9,2.4,3.3,1,versicolor\n", "6.6,2.9,4.6,1.3,versicolor\n", "5.2,2.7,3.9,1.4,versicolor\n" ] } ], "source": [ "tail -n +2 iris.csv | sed -n '/versicolor/p' | head -10" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**17**. Create a new file `iris1.csv` without the header row, in which each species name is shortened to its first 2 characters." ] }, { "cell_type": "code", "execution_count": 135, "metadata": {}, "outputs": [], "source": [ "tail -n +2 iris.csv | \\\n", "sed -e 's/versicolor/ve/' -e 's/setosa/se/' -e 's/virginica/vi/' > iris1.csv" ] }, { "cell_type": "code", "execution_count": 136, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "5.1,3.5,1.4,0.2,se\n", "4.9,3,1.4,0.2,se\n", "4.7,3.2,1.3,0.2,se\n", "4.6,3.1,1.5,0.2,se\n", "5,3.6,1.4,0.2,se\n", "5.4,3.9,1.7,0.4,se\n", "4.6,3.4,1.4,0.3,se\n", "5,3.4,1.5,0.2,se\n", "4.4,2.9,1.4,0.2,se\n", "4.9,3.1,1.5,0.1,se\n" ] } ], "source": [ "head iris1.csv" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Arithmetic" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**18**. Evaluate 1 + 1." ] }, { "cell_type": "code", "execution_count": 138, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2\n" ] } ], "source": [ "echo $((1 + 1))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**19**. Find the sum of all `petal_length` values in `iris.csv`."
] }, { "cell_type": "code", "execution_count": 139, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "sepal_length,sepal_width,petal_length,petal_width,species\n" ] } ], "source": [ "head -1 iris.csv" ] }, { "cell_type": "code", "execution_count": 160, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "563.8\n" ] } ], "source": [ "tail -n +2 iris.csv | cut -f3 -d',' | paste -s -d+ - | bc" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Clean up" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**20**. Delete any file or directory created in this session." ] }, { "cell_type": "code", "execution_count": 165, "metadata": { "collapsed": true }, "outputs": [], "source": [ "rmdir a b c\n", "rm foo*\n", "rm iris* baa.txt plain.txt quoted.txt output.txt error.txt combined.txt" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Bash", "language": "bash", "name": "bash" }, "language_info": { "codemirror_mode": "shell", "file_extension": ".sh", "mimetype": "text/x-sh", "name": "bash" } }, "nbformat": 4, "nbformat_minor": 2 }