{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Unix Text and Arithmetic Quiz"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Text"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**1** Use a here document with `cat` to create a a multi-line file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "cat > baa.txt <<'EOF'\n",
    "baa, baa black sheep\n",
    "have you any wool?\n",
    "EOF"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "baa, baa black sheep\n",
      "have you any wool?\n"
     ]
    }
   ],
   "source": [
    "cat baa.txt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**2**. What is the difference between a plain `label` and a quoted `'label'` in a here document?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "cat > plain.txt <<EOF\n",
    "echo $SHELL\n",
    "EOF"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "echo /bin/bash\n"
     ]
    }
   ],
   "source": [
    "cat plain.txt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "cat > quoted.txt <<'EOF'\n",
    "echo $SHELL\n",
    "EOF"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "echo $SHELL\n"
     ]
    }
   ],
   "source": [
    "cat quoted.txt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**3**. How would you run a bash command that sends its output and errors to separate files?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "ename": "",
     "evalue": "1",
     "output_type": "error",
     "traceback": []
    }
   ],
   "source": [
    "rmdir foo > output.txt 2> error.txt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "cat output.txt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "rmdir: foo: No such file or directory\n"
     ]
    }
   ],
   "source": [
    "cat error.txt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**4**. How would you run a bash command that sends its output and errors to the same file? "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "ename": "",
     "evalue": "1",
     "output_type": "error",
     "traceback": []
    }
   ],
   "source": [
    "rmdir foo &> combined.txt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "rmdir: foo: No such file or directory\n"
     ]
    }
   ],
   "source": [
    "cat combined.txt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**5**. Sort the numbers `1 10 100 2 20 200` in decreasing order."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "200\n",
      "100\n",
      "20\n",
      "10\n",
      "2\n",
      "1\n"
     ]
    }
   ],
   "source": [
    "echo 1 10 100 2 20 200 | tr ' ' '\\n'  | sort -nr"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**6**. Download the file iris.csv from the URL `https://raw.githubusercontent.com/uiuc-cse/data-fa14/gh-pages/data/iris.csv`. Extract the column headers and sort them in reverse order."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [],
   "source": [
    "wget -q https://raw.githubusercontent.com/uiuc-cse/data-fa14/gh-pages/data/iris.csv"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "species\n",
      "sepal_width\n",
      "sepal_length\n",
      "petal_width\n",
      "petal_length\n"
     ]
    }
   ],
   "source": [
    "head -1 iris.csv | tr ',' '\\n' | sort -r"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**7**. Find the unique species in the downloaded `iris.csv` file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "setosa\n",
      "versicolor\n",
      "virginica\n"
     ]
    }
   ],
   "source": [
    "tail +2 iris.csv  | cut -f 5 -d',' | sort | uniq"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**8**. How would you list all files that end in `.o`, `.c` or `.h`?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "touch foo.o foo.c foo.h foo.a foo.b"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "foo.c\tfoo.h\tfoo.o\n"
     ]
    }
   ],
   "source": [
    "ls *.{o,c,h}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**9**. Create `a`, `b` and `c`  as sub-directories of the directory given by `pwd` using brace expansion."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "mkdir ${PWD}/{a,b,c}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "a/\tb/\tc/\n"
     ]
    }
   ],
   "source": [
    "ls -d */"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Regular expressions"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**10**. Do the first 3 exercises of [Regeex Golf](https://alf.nu/RegexGolf)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Left as an exercise."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**11**. Find a regular expression to match only valid outputs (as much as possible) of `date +%d-%b-%y`.\n",
    "\n",
    "For example, the following should not match\n",
    "\n",
    "```\n",
    "05-Mud-17\n",
    "5-Mar-17\n",
    "05-Mar-2017\n",
    "56-Mar-17\n",
    "```\n",
    "\n",
    "But we will allow `mistakes` like\n",
    "\n",
    "```\n",
    "30-Feb-18\n",
    "37-Mar-00\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "30-Feb-18\n",
      "37-Mar-00\n"
     ]
    }
   ],
   "source": [
    "for s in \"05-Mud-17\" \"5-Mar-17\" \"05-Mar-2017\" \"56-Mar-17\" \"30-Feb-18\" \"37-Mar-00\"; do\n",
    "    echo $s | grep -E '^[0-3][0-9]-(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)-[0-9]{2}$'\n",
    "done"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Finding stuff"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**12**.  Find a regular expression to find the repeated character runs in DNA of length at least 4. USe the `-oE` arguments to `grep` to only show captured groups. For example, the string\n",
    "\n",
    "```\n",
    "AAATTTTAAAACAAAGCGCGCGCATGC\n",
    "```\n",
    "\n",
    "should find\n",
    "```\n",
    "TTTT\n",
    "AAAA\n",
    "GCGCGCGC\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 101,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "TTTT\n",
      "AAAA\n",
      "GCGCGCGC\n"
     ]
    }
   ],
   "source": [
    "echo 'AAATTTTAAAACAAAGCGCGCGCATGC' | grep -oE '(.)\\1{3,}|(.{2,})\\2{1,}'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**13**. Find all files greater than 3 KB that were created more than 1 day ago in the current directory, excluding anything in `.ipynb_checkpoints`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 107,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "./01_UnixBasics.ipynb\n",
      "./01_UnixBasics_Solutions.ipynb\n",
      "./EntryQuiz.ipynb\n",
      "./EntryQuiz_Solutions.ipynb\n"
     ]
    }
   ],
   "source": [
    "find . -ctime +1 -size +3k -not -path './.ipynb_checkpoints/*' "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**14**. Find all lines that contain the words `git` (case-insensitive) in files in the current directory created more the 3 days ago , excluding anything in `.ipynb_checkpoints`. You should not find lines where `git` is part of a word like `GitLab`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 134,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "./EntryQuiz.ipynb:    \"- `git`\\n\",\n",
      "./EntryQuiz.ipynb:    \"- `git` => distributed version control system\\n\",\n",
      "./EntryQuiz_Solutions.ipynb:    \"- `git`\\n\",\n",
      "./EntryQuiz_Solutions.ipynb:    \"- `git` => distributed version control system\\n\",\n"
     ]
    }
   ],
   "source": [
    "find . -ctime +3 \\\n",
    "-not -path './.ipynb_checkpoints' \\\n",
    "-not -path './.ipynb_checkpoints/*' \\\n",
    "| xargs  grep -i '\\<git\\>'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Modifying text"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**15**. Find the reverse complement of `AAATTTTAAAACAAAGCGCGCGCATGC`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 122,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "GCATGCGCGCGCTTTGTTTTAAAATTT\n"
     ]
    }
   ],
   "source": [
    "echo 'AAATTTTAAAACAAAGCGCGCGCATGC' | tr 'ACTG' 'TGAC' | rev"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**16**. Print the first 10 rows whose species names is `versicolor` in `iris.csv`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 131,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "7,3.2,4.7,1.4,versicolor\n",
      "6.4,3.2,4.5,1.5,versicolor\n",
      "6.9,3.1,4.9,1.5,versicolor\n",
      "5.5,2.3,4,1.3,versicolor\n",
      "6.5,2.8,4.6,1.5,versicolor\n",
      "5.7,2.8,4.5,1.3,versicolor\n",
      "6.3,3.3,4.7,1.6,versicolor\n",
      "4.9,2.4,3.3,1,versicolor\n",
      "6.6,2.9,4.6,1.3,versicolor\n",
      "5.2,2.7,3.9,1.4,versicolor\n"
     ]
    }
   ],
   "source": [
    "tail +2 iris.csv | sed -n '/versicolor/p' | head -10"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**17**. Create a new file `iris1.csv` without a header row, and where each species name just uses the first 2 characters of the original."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 135,
   "metadata": {},
   "outputs": [],
   "source": [
    "tail +2 iris.csv | \\\n",
    "sed -e 's/versicolor/ve/' -e 's/setosa/se/' -e 's/virginica/vi/' > iris1.csv"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 136,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "5.1,3.5,1.4,0.2,se\n",
      "4.9,3,1.4,0.2,se\n",
      "4.7,3.2,1.3,0.2,se\n",
      "4.6,3.1,1.5,0.2,se\n",
      "5,3.6,1.4,0.2,se\n",
      "5.4,3.9,1.7,0.4,se\n",
      "4.6,3.4,1.4,0.3,se\n",
      "5,3.4,1.5,0.2,se\n",
      "4.4,2.9,1.4,0.2,se\n",
      "4.9,3.1,1.5,0.1,se\n"
     ]
    }
   ],
   "source": [
    "head iris1.csv"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Arithmetic"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**18**. Evaluate 1 + 1."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 138,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2\n"
     ]
    }
   ],
   "source": [
    "echo $((1 + 1))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**19** Fidn the sum of all `petal_length` values in `iris.csv`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 139,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "sepal_length,sepal_width,petal_length,petal_width,species\n"
     ]
    }
   ],
   "source": [
    "head -1 iris.csv"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 160,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "563.8\n"
     ]
    }
   ],
   "source": [
    "tail +2 iris.csv | cut -f3 -d','  | paste -s -d+ - | bc"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Clean up"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**20**. Delete any file or directory created in this session."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 165,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "rmdir a b c \n",
    "rm foo*\n",
    "rm iris* baa.txt plain.txt quoted.txt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Bash",
   "language": "bash",
   "name": "bash"
  },
  "language_info": {
   "codemirror_mode": "shell",
   "file_extension": ".sh",
   "mimetype": "text/x-sh",
   "name": "bash"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}