{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# The Unix Shell: Writing Shell Scripts\n",
    "\n",
    "Shell commands constitute a programming language, and command-line programs known as shell scripts can be written to perform complex tasks.\n",
    "\n",
    "This notebook provides only a brief overview: shell scripts have many traps and pitfalls for the unwary, and we generally prefer languages with more consistent syntax, such as Python or R, for complex tasks. However, shell scripts are used extensively in domains such as the preprocessing of genomics data, so they are a useful tool to know about."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "## Assigning variables\n",
    "\n",
    "We assign variables using `=` and recall them by using `$`. It is customary to spell shell variable names in ALL_CAPS."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Hello Joe\n",
      "Hello Joe\n"
     ]
    }
   ],
   "source": [
    "NAME='Joe'\n",
    "echo \"Hello $NAME\"\n",
    "echo \"Hello ${NAME}\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Single and double parentheses\n",
    "\n",
    "The main difference between the use of '' and \"\" is that variable expansion only occurs with double parentheses. For plain text, they are equivalent."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 222,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "${NAME}\n"
     ]
    }
   ],
   "source": [
    "echo '${NAME}'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 223,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Joe\n"
     ]
    }
   ],
   "source": [
    "echo \"${NAME}\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Use of braces\n",
    "\n",
    "Wrapping the variable name in braces, as in `${NAME}`, unambiguously specifies the variable of interest. I suggest you always use them as a defensive programming technique."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Hello Joel\n"
     ]
    }
   ],
   "source": [
    "echo \"Hello ${NAME}l\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Without braces, the shell looks for a variable named `NAMEl`, which is not defined, and so expands to an empty string!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Hello \n"
     ]
    }
   ],
   "source": [
    "echo \"Hello $NAMEl\""
   ]
  },
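  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One of the quirks of shell scripts is already present: there cannot be spaces before or after the `=` in an assignment. With a space after the `=`, the shell treats `NAME2=` as a temporary empty assignment and tries to run `Joe` as a command; with a space before it, the shell tries to run the variable name itself as a command."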
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "bash: Joe: command not found\n",
      "Hello \n"
     ]
    }
   ],
   "source": [
    "NAME2= 'Joe'\n",
    "echo \"Hello ${NAME2}\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "bash: NAME3: command not found\n",
      "Hello \n"
     ]
    }
   ],
   "source": [
    "NAME3 ='Joe'\n",
    "echo \"Hello ${NAME3}\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Assigning command output to variables"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "/Users/cliburn/_teach/bios-821/lessons\n"
     ]
    }
   ],
   "source": [
    "pwd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "/Users/cliburn/_teach/bios-821\n",
      "lessons\n"
     ]
    }
   ],
   "source": [
    "CUR_DIR=$(pwd)\n",
    "dirname \"${CUR_DIR}\"\n",
    "basename \"${CUR_DIR}\""
   ]
  },
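  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `$(...)` form used above is known as command substitution: the shell runs the command and replaces the expression with its output. As a minimal sketch (the variable name `SHOUT` is made up for illustration), the output of a whole pipeline can be captured in the same way:\n",
    "\n",
    "```bash\n",
    "NAME='Joe'\n",
    "\n",
    "# Capture the output of a pipeline: tr converts lower case to upper case\n",
    "SHOUT=$(echo \"${NAME}\" | tr 'a-z' 'A-Z')\n",
    "\n",
    "echo \"Hello ${SHOUT}\"   # prints: Hello JOE\n",
    "```"
   ]
  },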
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Working with numbers\n",
    "\n",
    "**Careful**: Note the use of DOUBLE parentheses to trigger evaluation of a mathematical expression."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 226,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "10\n"
     ]
    }
   ],
   "source": [
    "NUM=$((1+2+3+4))\n",
    "echo ${NUM}"
   ]
  },
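  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Two details of `$(( ))` are worth knowing: it works on integers only, and variables inside it can be written without the leading `$`. A small sketch, with `A` and `B` as illustrative names:\n",
    "\n",
    "```bash\n",
    "A=7\n",
    "B=2\n",
    "\n",
    "echo $((A / B))       # integer division, prints 3\n",
    "echo $((A % B))       # remainder, prints 1\n",
    "echo $((A * B + 1))   # prints 15\n",
    "```\n",
    "\n",
    "For non-integer arithmetic, a separate tool such as `bc` is needed."
   ]
  },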
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### `seq` generates a range of numbers"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1\n",
      "2\n",
      "3\n"
     ]
    }
   ],
   "source": [
    "seq 3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2\n",
      "3\n",
      "4\n",
      "5\n"
     ]
    }
   ],
   "source": [
    "seq 2 5"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "5\n",
      "7\n",
      "9\n"
     ]
    }
   ],
   "source": [
    "seq 5 2 9"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Attempt to find sum of first 5 numbers"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "26288 1\n"
     ]
    }
   ],
   "source": [
    "seq 5 | sum"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "cksum(1), sum(1)         - display file checksums and block counts\n",
      "sum(n)                   - Calculate a sum(1) compatible checksum\n"
     ]
    }
   ],
   "source": [
    "whatis sum "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Doing this is surprisingly tricky\n",
    "\n",
    "The `sum` command computes a checksum rather than adding numbers. This is another reason to use Python or R, where `sum` is just `sum`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We first use process substitution, `<(seq 5)`, to treat the output of `seq 5` as a file, then pass it to `paste`, which joins the lines of the file into a single line (`-s`) separated by the `+` delimiter (`-d+`)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "paste(1)                 - merge corresponding or subsequent lines of files\n"
     ]
    }
   ],
   "source": [
    "whatis paste"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1+2+3+4+5\n"
     ]
    }
   ],
   "source": [
    "paste -s -d+ <(seq 5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This string is then passed to `bc`, which evaluates strings as mathematical expressions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "bc(1)                    - An arbitrary precision calculator language\n"
     ]
    }
   ],
   "source": [
    "whatis bc"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "15\n"
     ]
    }
   ],
   "source": [
    "paste -s -d+ <(seq 5) | bc"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Writing functions\n",
    "\n",
    "Functions in bash have this structure:\n",
    "\n",
    "```bash\n",
    "function_name () {\n",
    "    commands\n",
    "}\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Function arguments\n",
    "\n",
    "Function arguments are retrieved via special symbols known as positional parameters."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "metadata": {
    "collapsed": true
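   },
   "outputs": [],
   "source": [
    "f () {\n",
    "    echo \"$0\"   # name of the shell or script, not of the function\n",
    "    echo \"$1\"   # first argument\n",
    "    echo \"$2\"   # second argument\n",
    "    echo \"$@\"   # all arguments\n",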
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "/bin/bash\n",
      "one\n",
      "two\n",
      "one two three\n"
     ]
    }
   ],
   "source": [
    "f one two three"
   ]
  },
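  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As the output above shows, `$0` holds the name of the shell (or script) rather than the name of the function. A minimal sketch of two other commonly used forms follows: `$#` for the number of arguments, and a quoted `\"$@\"` to keep arguments that contain spaces intact. The function name `show_args` is hypothetical.\n",
    "\n",
    "```bash\n",
    "show_args () {\n",
    "    echo \"count: $#\"\n",
    "    # Quoting \"$@\" preserves each argument exactly as it was passed\n",
    "    for ARG in \"$@\"; do\n",
    "        echo \"arg: ${ARG}\"\n",
    "    done\n",
    "}\n",
    "\n",
    "show_args one \"two three\"\n",
    "# prints:\n",
    "# count: 2\n",
    "# arg: one\n",
    "# arg: two three\n",
    "```"
   ]
  },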
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Writing a sum function"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "f_sum () {\n",
    "    paste -s -d+ \"$1\" | bc\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "15\n"
     ]
    }
   ],
   "source": [
    "f_sum <(seq 5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Function to extract first line of a set of files"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "hello.txt\tstderr.txt\tstdout.txt\ttest1.txt\n"
     ]
    }
   ],
   "source": [
    "ls *.txt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 96,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "headers () {\n",
    "    for FILE in \"$@\"; do\n",
    "        echo -n \"${FILE}:   \"\n",
    "        head -n 1 \"${FILE}\"\n",
    "    done\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 99,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "hello.txt:   1 Hello, bash\n",
      "stderr.txt:   mkdir: foo/bar: No such file or directory\n",
      "stdout.txt:   test1.txt:   One, two buckle my shoe\n"
     ]
    }
   ],
   "source": [
    "headers *.txt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Branching"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Using if to check for file existence"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 126,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1 Hello, bash\n",
      "2 Hello, again\n",
      "3 Hello\n",
      "4 again\n"
     ]
    }
   ],
   "source": [
    "if [ -f hello.txt ]; then\n",
    "    cat hello.txt\n",
    "else\n",
    "    echo \"No such file\"\n",
    "fi"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Downloading remote files"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "WGET(1)                            GNU Wget                            WGET(1)\n",
      "\n",
      "\n",
      "\n",
      "NAME\n",
      "       Wget - The non-interactive network downloader.\n",
      "\n",
      "SYNOPSIS\n",
      "       wget [option]... [URL]...\n",
      "\n",
      "DESCRIPTION\n",
      "       GNU Wget is a free utility for non-interactive download of files from\n",
      "       the Web.  It supports HTTP, HTTPS, and FTP protocols, as well as\n",
      "       retrieval through HTTP proxies.\n",
      "\n",
      "       Wget is non-interactive, meaning that it can work in the background,\n",
      "       while the user is not logged on.  This allows you to start a retrieval\n",
      "       and disconnect from the system, letting Wget finish the work.  By\n",
      "       contrast, most of the Web browsers require constant user's presence,\n",
      "       which can be a great hindrance when transferring a lot of data.\n"
     ]
    }
   ],
   "source": [
    "man wget | head -n 20"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "A data frame with 2000 observations on the following 8 variables.\n",
      "  rank\n",
      "      the ranking of the company.\n",
      "  name\n",
      "      the name of the company.\n",
      "  country\n",
      "      a factor giving the country the company is situated in.\n",
      "  category\n",
      "      a factor describing the products the company produces.\n",
      "  sales\n",
      "      the amount of sales of the company in billion USD.\n",
      "  profits\n",
      "      the profit of the company in billion USD.\n",
      "  assets\n",
      "      the assets of the company in billion USD.\n",
      "  marketvalue\n",
      "      the market value of the company in billion USD.\n"
     ]
    }
   ],
   "source": [
    "wget -qO- https://vincentarelbundock.github.io/Rdatasets/doc/HSAUR/Forbes2000.html \\\n",
    "    | html2text | head -n 27  | tail -n 17"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "if [ ! -f \"data/forbes.csv\" ]; then\n",
    "    wget https://vincentarelbundock.github.io/Rdatasets/csv/HSAUR/Forbes2000.csv \\\n",
    "    -O data/forbes.csv\n",
    "fi"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Conditional evaluation with `test`\n",
    "\n",
    "The `[ -f hello.txt ]` syntax is equivalent to `test -f hello.txt`, where `test` is a shell command with a large range of operators and flags that you can view in the man page."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 209,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "TEST(1)                   BSD General Commands Manual                  TEST(1)\n",
      "\n",
      "NAME\n",
      "     test, [ -- condition evaluation utility\n",
      "\n",
      "SYNOPSIS\n",
      "     test expression\n",
      "     [ expression ]\n",
      "\n",
      "DESCRIPTION\n",
      "     The test utility evaluates the expression and, if it evaluates to true,\n",
      "     returns a zero (true) exit status; otherwise it returns 1 (false).  If\n",
      "     there is no expression, test also returns 1 (false).\n",
      "\n",
      "     All operators and flags are separate arguments to the test utility.\n",
      "\n",
      "     The following primaries are used to construct expression:\n",
      "\n",
      "     -b file       True if file exists and is a block special file.\n",
      "\n",
      "     -c file       True if file exists and is a character special file.\n",
      "\n",
      "     -d file       True if file exists and is a directory.\n",
      "\n",
      "     -e file       True if file exists (regardless of type).\n",
      "\n",
      "     -f file       True if file exists and is a regular file.\n",
      "\n",
      "     -g file       True if file exists and its set group ID flag is set.\n"
     ]
    }
   ],
   "source": [
    "man test | head -n 30"
   ]
  },
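  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As the man page excerpt says, `test` reports its result through the exit status: zero for true and non-zero for false. The short sketch below, using made-up messages, shows that `test` and `[ ]` behave the same and that `if` simply branches on that status.\n",
    "\n",
    "```bash\n",
    "# Both forms succeed because the root directory exists\n",
    "test -d / && echo \"test: / is a directory\"\n",
    "[ -d / ] && echo \"[ ]: / is a directory\"\n",
    "\n",
    "# A false test has a non-zero exit status, available in $?\n",
    "[ 3 -lt 2 ] || echo \"exit status: $?\"   # prints: exit status: 1\n",
    "\n",
    "# if runs the then-branch when the exit status is zero\n",
    "if [ 3 -gt 2 ]; then\n",
    "    echo \"3 is greater than 2\"\n",
    "fi\n",
    "```"
   ]
  },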
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Looping"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### For loop"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 210,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "hello.txt\n",
      "stderr.txt\n",
      "stdout.txt\n",
      "test1.txt\n"
     ]
    }
   ],
   "source": [
    "for FILE in *.txt; do\n",
    "    echo \"${FILE}\"\n",
    "done"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### While loop"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 221,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "10\n",
      "9\n",
      "8\n",
      "7\n",
      "6\n",
      "5\n",
      "4\n",
      "3\n",
      "2\n",
      "1\n"
     ]
    }
   ],
   "source": [
    "COUNTER=10\n",
    "while [ $COUNTER -gt 0 ]; do\n",
    "    echo $COUNTER\n",
    "    COUNTER=$(($COUNTER - 1))\n",
    "done"
   ]
  },
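  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Careful**: Inside `[ ]`, `<` and `>` are redirection operators rather than comparisons. For example, `[ $COUNTER > 0 ]` writes to a file named `0` and is true for any non-empty value, which leads to an infinite loop. For integers, use `-lt` for less than, `-gt` for greater than, `-eq` for equality and `-ne` for inequality. The operators `==` and `!=` compare strings, which also works for the simple integer check in the next example."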
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 225,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "10\n",
      "9\n",
      "8\n",
      "7\n",
      "6\n",
      "5\n",
      "4\n",
      "3\n",
      "2\n",
      "1\n"
     ]
    }
   ],
   "source": [
    "COUNTER=10\n",
    "while [ $COUNTER != 0 ]; do\n",
    "    echo $COUNTER\n",
    "    COUNTER=$(($COUNTER - 1))\n",
    "done"
   ]
  },
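  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Loops and arithmetic expansion together give another way to compute the sum of the first 5 numbers that was obtained earlier with `paste` and `bc`. This is only a sketch; the variable names `TOTAL` and `I` are illustrative.\n",
    "\n",
    "```bash\n",
    "TOTAL=0\n",
    "for I in $(seq 5); do\n",
    "    # Add the current number to the running total\n",
    "    TOTAL=$((TOTAL + I))\n",
    "done\n",
    "echo \"${TOTAL}\"   # prints 15\n",
    "```"
   ]
  },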
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Shell script"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "From now on, we will write shell scripts in an editor for convenience. For a syntax-highlighted display, I use `pygmentize`, a Python program that is not part of the standard library and that you can install with\n",
    "\n",
    "```\n",
    "pip install pygments\n",
    "```\n",
    "\n",
    "but you can also just use `cat` to display the file contents.\n",
    "\n",
    "A shell script is traditionally given the extension `.sh`. There are a few things to note:\n",
    "\n",
    "1. To make the script standalone, you need to add `#!/path/to/shell` as the first line. Otherwise you need to call the script with `bash /path/to/script` instead of just `/path/to/script`.\n",
    "2. To make the script executable, change the file permissions with `chmod +x /path/to/script`.\n",
    "3. Script arguments are accessed in the same way as function arguments, i.e. `$1`, `$2`, `$@` etc. Another useful variable is `$#`, which gives the number of command line arguments."
   ]
  },
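  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The three points above can be sketched in one self-contained example. It assumes that bash lives at `/bin/bash` and that files created by `mktemp` may be executed on your system; the script contents and the names `SCRIPT` and `OUTPUT` are made up for illustration.\n",
    "\n",
    "```bash\n",
    "# Write a small script to a temporary file whose name is generated by mktemp\n",
    "SCRIPT=$(mktemp)\n",
    "cat > \"${SCRIPT}\" <<'EOF'\n",
    "#!/bin/bash\n",
    "# Refuse to run without at least one argument\n",
    "if [ \"$#\" -lt 1 ]; then\n",
    "    echo \"usage: $0 NAME...\" >&2\n",
    "    exit 1\n",
    "fi\n",
    "echo \"Got $# argument(s); the first is $1\"\n",
    "EOF\n",
    "\n",
    "chmod +x \"${SCRIPT}\"               # make the script executable\n",
    "OUTPUT=$(\"${SCRIPT}\" Joe Ann)      # run it directly, relying on the #! line\n",
    "echo \"${OUTPUT}\"                   # prints: Got 2 argument(s); the first is Joe\n",
    "rm -f \"${SCRIPT}\"                  # clean up the temporary file\n",
    "```"
   ]
  },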
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Find the path to bash"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 132,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "/bin/bash\n"
     ]
    }
   ],
   "source": [
    "which bash"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Display script"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 199,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[37m#!/bin/bash\u001b[39;49;00m\n",
      "\n",
      "\u001b[34mif\u001b[39;49;00m [ -f \u001b[31m$1\u001b[39;49;00m ]; \u001b[34mthen\u001b[39;49;00m\n",
      "    cat \u001b[31m$1\u001b[39;49;00m\n",
      "\u001b[34melse\u001b[39;49;00m\n",
      "    \u001b[36mecho\u001b[39;49;00m \u001b[33m\"\u001b[39;49;00m\u001b[33mNo such file: \u001b[39;49;00m\u001b[31m$1\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\n",
      "\u001b[34mfi\u001b[39;49;00m\n"
     ]
    }
   ],
   "source": [
    "pygmentize -g scripts/cat_if_exists.sh"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Give executable permission"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 200,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "chmod +x scripts/cat_if_exists.sh"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 202,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1 Hello, bash\n",
      "2 Hello, again\n",
      "3 Hello\n",
      "4 again\n"
     ]
    }
   ],
   "source": [
    "scripts/cat_if_exists.sh hello.txt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 204,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "No such file: goodbye.txt\n"
     ]
    }
   ],
   "source": [
    "scripts/cat_if_exists.sh goodbye.txt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Reading a file line by line\n",
    "\n",
    "We will write a script to extract headers from a FASTA Nucleic Acid (FNA) file. Headers in FASTA format are lines that begin with the `>` character."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 235,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      ">random sequence 1 consisting of 1000 bases.\n",
      "acggacaaacggttgatgtggttcttcgcaggatgcgccaaagtgtttacaaggctggta\n",
      "aactgagaatgtgcttgttccccgtctcacgcaaagatatgaggcgtaagagaccgacat\n",
      "attccctcctccataggtctttttgattattgatcactgcttcgccacccttagcgtggt\n",
      "gtctttcatagtctcaccgttaaacggcgacgttcgtgaacctgctcagtccctaaactc\n",
      "gataacaatcgggctgtgttggaagctagtattatcggcattcaggtagtagtcccccgg\n",
      "actagcacggtccgggtctggttgcacatacatggtagcgaaattccgctcctccagccc\n",
      "agaataaaggtagaagaccaatgcccgggtaaaaaactcaacgagtaggtcccacgatta\n",
      "tctgagtggtgaactatgctgaggacgacaatatcatcggagtgttcactagggtgcggg\n",
      "gttgactataagtgtagtctgatcatagagactccgcatattcggctacgctctataact\n",
      "aatttgacgaatgctgcgaacgcacctgcgtatcgcttccttctaacctcaggcggtcat\n",
      "tatcatgtcaaacaacaagagtaggtttatggcatcgacacgcatgactgcgtaacgagt\n",
      "cacacgccagacgtctaagcagtgcaatgccagcgtctatgaagctcttaattagcgggt\n",
      "ttacacttgcattgagtgaaatgtgccaagagcctactacaacccgcagccggcatatgg\n",
      "gatcaagcgaggcaatttgatgcgcccccaaagcacgcgaaaaaagagcttggacccgga\n",
      "agaaaacgatgttctgggtccgtcaagcctgcgtacagcttatccaacttttaagtggac\n",
      "gtgtccgcagacaagcacacagggagggctcgccaaaaaaattgctgtatctagtacaag\n",
      "gtagctaatagctccggaccgaccacctttccggactgcc\n",
      "\n",
      ">random sequence 2 consisting of 1000 bases.\n",
      "tgcgcattctcctatacatatgacgatctggtaccatgcgatagcggtcgccgagataat\n",
      "ataccaaaagacatatgtcttctccgcaccctgttcctcctaccagccacaggctctgca\n",
      "gcctctctcactccccgatcgagaaagattgggggttaacaataacactttttacgtcgg\n"
     ]
    }
   ],
   "source": [
    "head -n 23 data/example.fna"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 232,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[37m#!/bin/bash\u001b[39;49;00m\n",
      "\n",
      "\u001b[34mwhile\u001b[39;49;00m \u001b[36mread\u001b[39;49;00m LINE\n",
      "  \u001b[34mdo\u001b[39;49;00m\n",
      "      \u001b[34mif\u001b[39;49;00m [ \u001b[33m\"\u001b[39;49;00m\u001b[33m${\u001b[39;49;00m\u001b[31mLINE\u001b[39;49;00m:\u001b[31m0\u001b[39;49;00m:\u001b[31m1\u001b[39;49;00m\u001b[33m}\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m == \u001b[33m'>'\u001b[39;49;00m ]; \u001b[34mthen\u001b[39;49;00m\n",
      "          \u001b[36mecho\u001b[39;49;00m \u001b[31m$LINE\u001b[39;49;00m\n",
      "      \u001b[34mfi\u001b[39;49;00m\n",
      "  \u001b[34mdone\u001b[39;49;00m \n"
     ]
    }
   ],
   "source": [
    "pygmentize scripts/extract_headers.sh"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Careful**: You need to put all variables in the test condition within double quotes. Otherwise, an empty or undefined variable (e.g. from an empty line) vanishes after expansion and leaves `[ == '>' ]`, which raises a syntax error."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 227,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "chmod +x scripts/extract_headers.sh"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 231,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      ">random sequence 1 consisting of 1000 bases.\n",
      ">random sequence 2 consisting of 1000 bases.\n",
      ">random sequence 3 consisting of 1000 bases.\n",
      ">random sequence 4 consisting of 1000 bases.\n",
      ">random sequence 5 consisting of 1000 bases.\n",
      ">random sequence 6 consisting of 1000 bases.\n",
      ">random sequence 7 consisting of 1000 bases.\n",
      ">random sequence 8 consisting of 1000 bases.\n",
      ">random sequence 9 consisting of 1000 bases.\n",
      ">random sequence 10 consisting of 1000 bases.\n"
     ]
    }
   ],
   "source": [
    "cat data/example.fna | scripts/extract_headers.sh"
   ]
  },
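  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For comparison, the same extraction can usually be done without an explicit loop. The sketch below is an alternative to the script above, not a replacement for it: it uses `grep` with the pattern `^>` to keep only lines that begin with `>`. The sample input and the variable name `HEADERS` are made up for illustration.\n",
    "\n",
    "```bash\n",
    "# Sample FASTA-like input, generated inline so the example is self-contained\n",
    "HEADERS=$(printf '>seq 1\\nacgt\\n\\n>seq 2\\nttga\\n' | grep '^>')\n",
    "\n",
    "echo \"${HEADERS}\"\n",
    "# prints:\n",
    "# >seq 1\n",
    "# >seq 2\n",
    "```\n",
    "\n",
    "On the real file, the equivalent command would be `grep '^>' data/example.fna`."
   ]
  },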
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Bash",
   "language": "bash",
   "name": "bash"
  },
  "language_info": {
   "codemirror_mode": "shell",
   "file_extension": ".sh",
   "mimetype": "text/x-sh",
   "name": "bash"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}