{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# The Unix Shell: Working with Text"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Text editors\n",
    "\n",
    "Usually we create text using a text editor. Standard text editors are `vi` and `emacs`, although you can also use an alternative from this [list](https://en.wikipedia.org/wiki/List_of_text_editors). I personally use [Atom](https://atom.io).\n",
    "\n",
    "### `vi`\n",
    "\n",
    "Regardless of which text editor you choose as your primary tool, it is essential to have at least some experience with `vi`, because some version of it (it is part of the POSIX standard) is installed on almost every Unix system and may be the only editor available on a remote server. Work through this [tutorial](http://www.openvim.com).\n",
    "\n",
    "### `emacs`\n",
    "\n",
    "Many Unix systems also have `emacs` installed, so it is worth trying out the built-in `emacs` tutorial: start Emacs (`emacs`) and type `C-h t`, that is, Ctrl-h followed by t.\n",
    "\n",
    "Note that people who love `vi` often hate `emacs` and vice versa ;-)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Text streams\n",
    "\n",
    "Input and output of most Unix shell programs consist of plain text streams. Text output from one program can be piped into another program or redirected to a file or another stream. The standard streams are `stdin` (standard input, file descriptor 0), `stdout` (standard output, file descriptor 1) and `stderr` (standard error, file descriptor 2). By default, a program reads its input from `stdin` and writes its output to `stdout`; redirection lets us read from and write to files instead."
   ]
  },
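  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a minimal sketch of the three streams (the file name `streams_demo.txt` is only an illustrative choice), the commands below write to `stdout`, read `stdin` from a file with `<`, and send a message to `stderr` with `>&2`:\n",
    "\n",
    "```bash\n",
    "# echo writes to stdout; > sends that stream into a file\n",
    "echo 'a line of text' > streams_demo.txt\n",
    "# < makes the file the stdin of wc; wc -l counts the lines it receives\n",
    "wc -l < streams_demo.txt\n",
    "# >&2 sends this message to stderr (file descriptor 2) instead of stdout\n",
    "echo 'this goes to stderr' >&2\n",
    "rm streams_demo.txt\n",
    "```\n",
    "\n",
    "Because `wc` reads from `stdin` here, it never sees a file name and prints only the line count."
   ]
  },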
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Pipes and redirection"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Creating a text file from the command line\n",
    "\n",
    "Sometimes using a text editor is overkill. For simple file creation, we can just use redirection.\n",
    "\n",
    "A single `>` will create a new file or overwrite an existing one."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "echo \"1 Hello, bash\" > hello.txt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "hello.txt\tstderr.txt\tstdout.txt\n"
     ]
    }
   ],
   "source": [
    "ls *txt"
   ]
  },
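  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because a single `>` silently overwrites an existing file, bash has a `noclobber` option as a safety net. A minimal sketch, using a hypothetical file `clobber_demo.txt`:\n",
    "\n",
    "```bash\n",
    "echo 'first version' > clobber_demo.txt\n",
    "set -o noclobber                            # > now refuses to overwrite existing files\n",
    "echo 'second version' > clobber_demo.txt || echo 'refused to overwrite'\n",
    "echo 'forced version' >| clobber_demo.txt   # >| overrides noclobber\n",
    "cat clobber_demo.txt                        # prints: forced version\n",
    "set +o noclobber                            # restore the default behavior\n",
    "rm clobber_demo.txt\n",
    "```\n",
    "\n",
    "The refused redirection also prints an error message on `stderr`. `noclobber` does not affect `>>`, which still appends as usual."
   ]
  },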
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
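    "### Appending\n",
    "\n",
    "A double `>>` appends to the end of an existing file instead of overwriting it (and creates the file if it does not exist)."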
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "echo \"2 Hello, again\" >> hello.txt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
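    "#### Special non-printing characters\n",
    "\n",
    "With the `-e` option, bash's `echo` interprets backslash escapes such as `\\n` (newline) and `\\t` (tab)."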
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "echo -e \"3 Hello\\n4 again\" >> hello.txt"
   ]
  },
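  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`echo -e` is a bash feature: other shells may print the `-e` literally or always interpret escapes. `printf` interprets the escapes in its format string in any POSIX shell, so it is the more portable way to write the same two lines. A sketch, with `printf_demo.txt` as an illustrative file name:\n",
    "\n",
    "```bash\n",
    "# Unlike echo, printf adds no newline of its own,\n",
    "# so every \\n (including the last one) is written explicitly\n",
    "printf '3 Hello\\n4 again\\n' > printf_demo.txt\n",
    "cat printf_demo.txt\n",
    "rm printf_demo.txt\n",
    "```"
   ]
  },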
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### From file to `stdout`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1 Hello, bash\n",
      "2 Hello, again\n",
      "3 Hello\n",
      "4 again\n"
     ]
    }
   ],
   "source": [
    "cat hello.txt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
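    "### Here docs\n",
    "\n",
    "We can also create multi-line text streams with *here docs*. A here doc starts with `<<` followed by a delimiter word of our choosing (`EOF` by convention); the lines that follow are fed to the command's `stdin` until a line consisting of just that delimiter."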
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "cat > test1.txt << EOF\n",
    "One, two buckle my shoe\n",
    "Three, four lock the door\n",
    "EOF"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "One, two buckle my shoe\n",
      "Three, four lock the door\n"
     ]
    }
   ],
   "source": [
    "cat test1.txt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Onx, twx bxcklx my shxx\n",
      "Thrxx, fxxr lxck thx dxxr\n"
     ]
    }
   ],
   "source": [
    "tr aeiou xxxxx << SOME_DELIMITER\n",
    "One, two buckle my shoe\n",
    "Three, four lock the door\n",
    "SOME_DELIMITER"
   ]
  },
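  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "By default the body of a here doc undergoes variable and command expansion, as in a double-quoted string. Quoting the delimiter turns this off. A small sketch (the variable `name` is arbitrary):\n",
    "\n",
    "```bash\n",
    "name='Duke'\n",
    "# Unquoted delimiter: $name is replaced by its value\n",
    "cat << EOF\n",
    "Hello, $name\n",
    "EOF\n",
    "# Quoted delimiter: the body is passed through literally\n",
    "cat << 'EOF'\n",
    "Hello, $name\n",
    "EOF\n",
    "```\n",
    "\n",
    "The first `cat` prints `Hello, Duke` and the second prints `Hello, $name` unchanged, which is useful when the text itself contains `$` signs, for example a script being written to a file."
   ]
  },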
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Piping to `cut` to extract characters 2 to 5 of each line"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " Hel\n",
      " Hel\n",
      " Hel\n",
      " aga\n"
     ]
    }
   ],
   "source": [
    "cat hello.txt | cut -c 2-5"
   ]
  },
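  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`cut -c` selects character positions. To select delimiter-separated columns (fields) instead, `-d` sets the delimiter and `-f` lists the fields to keep. A sketch on two lines like those in `hello.txt` (`fields_demo.txt` is an illustrative name):\n",
    "\n",
    "```bash\n",
    "printf '1 Hello, bash\\n2 Hello, again\\n' > fields_demo.txt\n",
    "# Split each line on spaces and keep field 2: prints 'Hello,' twice\n",
    "cut -d ' ' -f 2 fields_demo.txt\n",
    "# Keep fields 1 and 3, rejoined with the delimiter: '1 bash' and '2 again'\n",
    "cut -d ' ' -f 1,3 fields_demo.txt\n",
    "rm fields_demo.txt\n",
    "```\n",
    "\n",
    "`cut` treats every single space as a separator, so two consecutive spaces produce an empty field; squeezing them first with `tr -s ' '` avoids this."
   ]
  },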
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Building a chain of pipes\n",
    "\n",
    "`wc -lc` reports the number of lines and bytes (for plain ASCII text, the byte count equals the character count).\n",
    "\n",
    "Note that the byte count is 20, i.e. 5 per line rather than 4, because every line that `cut` outputs still ends with a newline character."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "       4      20\n"
     ]
    }
   ],
   "source": [
    "cat hello.txt | cut -c 2-5 | wc -lc  "
   ]
  },
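  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Longer chains follow the same pattern, each program reading the previous one's `stdout` as its `stdin`. As an illustration (the sample sentence and the use of `sort` and `uniq` are our own choices), this pipeline counts how often each word occurs:\n",
    "\n",
    "```bash\n",
    "# tr -s ' ' '\\n'  puts each word on its own line\n",
    "# sort            groups identical words (uniq only merges adjacent duplicates)\n",
    "# uniq -c         collapses each group and prefixes its count\n",
    "# sort -rn        lists the most frequent words first\n",
    "printf 'the cat saw the dog\\nthe dog saw a cat\\n' | tr -s ' ' '\\n' | sort | uniq -c | sort -rn\n",
    "```\n",
    "\n",
    "The first output line is `the`, with a count of 3."
   ]
  },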
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Capturing error messages"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The redirection operator `>` is shorthand for `1>`, that is, it redirects `stdout` (file descriptor 1). Similarly, `2>` redirects `stderr`. In bash, `&>` redirects both `stdout` and `stderr`, which is useful if, for example, you want to send all output to the same log file for later inspection."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "mkdir: foo/bar: No such file or directory\n"
     ]
    }
   ],
   "source": [
    "mkdir foo/bar/baz > 'stdout.txt' | cat"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### As there is nothing on `stdout`, the file is empty"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "cat 'stdout.txt'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### We need to use `2>` to capture the output from `stderr`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "mkdir foo/bar/baz 2> 'stderr.txt' | cat"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "mkdir: foo/bar: No such file or directory\n"
     ]
    }
   ],
   "source": [
    "cat 'stderr.txt' "
   ]
  },
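  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`&>` is bash shorthand; the portable spelling is `> file 2>&1`, which first points `stdout` at the file and then makes `stderr` (2) a copy of `stdout` (1). Because redirections are processed from left to right, the order matters. A sketch in which `ls` is given one existing and one missing path so that it writes to both streams (`exists.txt` and `both.log` are illustrative names):\n",
    "\n",
    "```bash\n",
    "touch exists.txt\n",
    "ls exists.txt no_such_file &> both.log       # bash: both streams to the file\n",
    "ls exists.txt no_such_file > both.log 2>&1   # portable equivalent\n",
    "cat both.log                                 # the listing and the error message\n",
    "# Wrong order: stderr is copied from stdout while stdout still points\n",
    "# at the terminal, so the error message is not captured in the file\n",
    "ls exists.txt no_such_file 2>&1 > both.log\n",
    "# Send error messages to /dev/null to discard them\n",
    "ls exists.txt no_such_file 2> /dev/null\n",
    "rm exists.txt both.log\n",
    "```"
   ]
  },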
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Example: getting the 2nd and 3rd lines of `hello.txt`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2 Hello, again\n",
      "3 Hello\n"
     ]
    }
   ],
   "source": [
    "cat hello.txt | head -n 3 | tail -n 2"
   ]
  },
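  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`head -n 3` keeps lines 1 to 3, and `tail -n 2` keeps the last two of those. In general, lines m to n are `head -n n | tail -n (n - m + 1)`; `sed` can also print a line range directly. A sketch, with `m`, `n` and `range_demo.txt` as illustrative names:\n",
    "\n",
    "```bash\n",
    "printf '1 Hello, bash\\n2 Hello, again\\n3 Hello\\n4 again\\n' > range_demo.txt\n",
    "m=2; n=3\n",
    "# head keeps the first n lines; tail keeps the last n - m + 1 of those\n",
    "head -n \"$n\" range_demo.txt | tail -n $((n - m + 1))\n",
    "# sed -n turns off automatic printing; the p command prints lines m to n\n",
    "sed -n \"${m},${n}p\" range_demo.txt\n",
    "rm range_demo.txt\n",
    "```\n",
    "\n",
    "Both commands print `2 Hello, again` and `3 Hello`."
   ]
  },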
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
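    "### Character substitution with `tr` (transliteration)\n",
    "\n",
    "`tr SET1 SET2` reads `stdin` and replaces each character in SET1 with the character at the same position in SET2. Ranges such as `a-z` stand for all the characters in between."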
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Switching case"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tHIS IS dUKE\n"
     ]
    }
   ],
   "source": [
    "echo \"This is Duke\" | tr a-zA-Z A-Za-z"
   ]
  },
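  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Besides translating characters, `tr` can delete them with `-d` and squeeze runs of a repeated character into one with `-s`. POSIX character classes such as `[:lower:]` and `[:upper:]` can be used in place of ranges like `a-z`. A short sketch:\n",
    "\n",
    "```bash\n",
    "# Character classes instead of ranges: prints THIS IS DUKE\n",
    "echo 'This is Duke' | tr '[:lower:]' '[:upper:]'\n",
    "# -d deletes every listed character: prints Ths s Dk\n",
    "echo 'This is Duke' | tr -d aeiou\n",
    "# -s squeezes runs of a repeated character into one: prints too many spaces\n",
    "echo 'too    many     spaces' | tr -s ' '\n",
    "```"
   ]
  },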
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
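    "#### Finding the reverse complement of a DNA string\n",
    "\n",
    "`tr ACTG TGAC` replaces each base with its complement (A with T, C with G, and vice versa), and `rev` reverses the resulting line."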
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "TGTAATC\n"
     ]
    }
   ],
   "source": [
    "echo 'GATTACA' | tr ACTG TGAC | rev"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Caesar cipher encoding and decoding"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Vjku ku Fwmg\n"
     ]
    }
   ],
   "source": [
    "echo \"This is Duke\" | tr a-zA-Z c-zabC-ZAB"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "This is Duke\n"
     ]
    }
   ],
   "source": [
    "echo \"Vjku ku Fwmg\" | tr c-zabC-ZAB a-zA-Z "
   ]
  },
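  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A shift of 13 (ROT13) is a special case: because the alphabet has 26 letters, applying the same `tr` mapping twice restores the original text, so a single command both encodes and decodes. A sketch:\n",
    "\n",
    "```bash\n",
    "# Rotate each letter 13 places: prints Guvf vf Qhxr\n",
    "echo 'This is Duke' | tr 'A-Za-z' 'N-ZA-Mn-za-m'\n",
    "# Applying the same mapping twice restores the input: prints This is Duke\n",
    "echo 'This is Duke' | tr 'A-Za-z' 'N-ZA-Mn-za-m' | tr 'A-Za-z' 'N-ZA-Mn-za-m'\n",
    "```"
   ]
  },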
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "## Clean up"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "rm *txt"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Bash",
   "language": "bash",
   "name": "bash"
  },
  "language_info": {
   "codemirror_mode": "shell",
   "file_extension": ".sh",
   "mimetype": "text/x-sh",
   "name": "bash"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}