{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# The Unix Shell: Working with Text" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Text editors\n", "\n", "Usually we create text using a text editor. The standard Unix text editors are `vi` and `emacs`, although you can also use an alternative from this [list](https://en.wikipedia.org/wiki/List_of_text_editors). I personally use [Atom](https://atom.io).\n", "\n", "### `vi`\n", "\n", "Whichever text editor you choose as your primary tool, it is essential to have at least some experience using `vi`, because it is available on virtually every Unix system and may be the only editor available on a remote server. Work through this [tutorial](http://www.openvim.com).\n", "\n", "### `emacs`\n", "\n", "Many Unix systems also have `emacs` installed, so it is worth working through its built-in tutorial as well: start Emacs (`emacs`) and type `C-h t`, that is, Ctrl-h followed by t.\n", "\n", "Note that people who love `vi` often hate `emacs` and vice versa ;-)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Text streams\n", "\n", "The input and output of most Unix shell programs consist of plain text streams. Text output from a program can be piped into another program or redirected to a file or another stream. The three standard streams are `stdin` (standard input, file descriptor 0), `stdout` (standard output, file descriptor 1) and `stderr` (standard error, file descriptor 2). By default, a program reads its input from `stdin` and writes its output to `stdout`; we can also redirect these streams to and from files." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Pipes and redirection" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Creating a text file from the command line\n", "\n", "Sometimes using a text editor is overkill. For simple file creation, we can just use redirection.\n", "\n", "A single `>` will create a new file or overwrite an existing one." 
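] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Because `>` truncates the target file before writing, any previous contents are silently lost, whereas `>>` (see Appending below) adds to the end of the file. A minimal sketch of the difference, using a hypothetical scratch file `demo.txt` that the cell deletes afterwards:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# demo.txt is a hypothetical scratch file used only in this cell\n", "echo 'first line' > demo.txt    # > creates demo.txt\n", "echo 'second line' >> demo.txt  # >> appends a second line\n", "echo 'replacement' > demo.txt   # > truncates: both earlier lines are discarded\n", "cat demo.txt                    # prints only: replacement\n", "rm demo.txt"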
] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "echo \"1 Hello, bash\" > hello.txt" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "hello.txt\tstderr.txt\tstdout.txt\n" ] } ], "source": [ "ls *txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Appending\n", "\n", "A double `>>` appends to the end of an existing file (creating the file if it does not exist)." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true }, "outputs": [], "source": [ "echo \"2 Hello, again\" >> hello.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Special non-printing characters\n", "\n", "With the `-e` option, `echo` interprets backslash escapes such as `\\n` (newline), so a single command can append two lines." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": true }, "outputs": [], "source": [ "echo -e \"3 Hello\\n4 again\" >> hello.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### From file to `stdout`" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1 Hello, bash\n", "2 Hello, again\n", "3 Hello\n", "4 again\n" ] } ], "source": [ "cat hello.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Here docs\n", "\n", "We can also create multi-line text streams with here docs. A here doc starts with `<<` followed by a delimiter word of our choosing (`EOF` is a common convention); every following line, up to a line containing only that word, is fed to the command's `stdin`." 
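] }, { "cell_type": "markdown", "metadata": {}, "source": [ "By default, the lines of a here doc undergo parameter expansion, so a `$name` reference is replaced by the variable's value; quoting the delimiter word (for example `'EOF'`) passes the lines through literally. A minimal sketch, where `name` is a hypothetical variable introduced only for this illustration:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "name='bash'  # hypothetical variable, used only in this cell\n", "\n", "# Unquoted delimiter: the variable is expanded, printing: Hello, bash\n", "cat << EOF\n", "Hello, $name\n", "EOF\n", "\n", "# Quoted delimiter: the body is passed through literally, printing: Hello, $name\n", "cat << 'EOF'\n", "Hello, $name\n", "EOF"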
] }, { "cell_type": "code", "execution_count": 22, "metadata": { "collapsed": true }, "outputs": [], "source": [ "cat > test1.txt << EOF\n", "One, two buckle my shoe\n", "Three, four lock the door\n", "EOF" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "One, two buckle my shoe\n", "Three, four lock the door\n" ] } ], "source": [ "cat test1.txt" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Onx, twx bxcklx my shxx\n", "Thrxx, fxxr lxck thx dxxr\n" ] } ], "source": [ "tr aeiou xxxxx << SOME_DELIMITER\n", "One, two buckle my shoe\n", "Three, four lock the door\n", "SOME_DELIMITER" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Pipe to `cut` to extract characters 2 to 5 of each line" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " Hel\n", " Hel\n", " Hel\n", " aga\n" ] } ], "source": [ "cat hello.txt | cut -c 2-5" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Building a chain of pipes\n", "\n", "`wc -lc` reports the number of lines and bytes (for plain ASCII text, the byte count equals the character count).\n", "\n", "Note that the byte count is 5 per line, not 4, because `cut` ends each output line with a newline character." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 4 20\n" ] } ], "source": [ "cat hello.txt | cut -c 2-5 | wc -lc " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Capturing error messages" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The redirection operator `>` is shorthand for `1>`, that is, it redirects `stdout` (file descriptor 1). Similarly, `2>` redirects `stderr` (file descriptor 2). `&>` redirects both `stdout` and `stderr` (a bash extension; the portable equivalent is `> file 2>&1`), which is useful if, for example, you want to send all output to the same log file for later inspection." ] },
{ "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "mkdir: foo/bar: No such file or directory\n" ] } ], "source": [ "mkdir foo/bar/baz > 'stdout.txt' | cat" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### As there is nothing on `stdout`, the file is empty" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": true }, "outputs": [], "source": [ "cat 'stdout.txt'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### We need to use `2>` to capture the output from `stderr`" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": true }, "outputs": [], "source": [ "mkdir foo/bar/baz 2> 'stderr.txt' | cat" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "mkdir: foo/bar: No such file or directory\n" ] } ], "source": [ "cat 'stderr.txt' " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Example: getting the 2nd and 3rd lines of `hello.txt`\n", "\n", "`head -n 3` keeps the first three lines, and `tail -n 2` then keeps the last two of those." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2 Hello, again\n", "3 Hello\n" ] } ], "source": [ "cat hello.txt | head -n 3 | tail -n 2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Character substitution with `tr` (transliteration)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Switch case." 
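] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`tr` replaces each character in its first argument with the character at the same position in its second argument. A minimal related sketch: converting to upper case only, written with the POSIX character classes `[:lower:]` and `[:upper:]`, which avoid depending on how the current locale interprets a range such as `a-z`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# [:lower:] and [:upper:] are POSIX character classes; tr maps lower-case letters to their upper-case counterparts\n", "echo 'This is Duke' | tr '[:lower:]' '[:upper:]'  # prints: THIS IS DUKE"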
] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tHIS IS dUKE\n" ] } ], "source": [ "echo \"This is Duke\" | tr a-zA-Z A-Za-z" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Find reverse complement of DNA string." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "TGTAATC\n" ] } ], "source": [ "echo 'GATTACA' | tr ACTG TGAC | rev" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Caesar cipher encoding and decoding" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Vjku ku Fwmg\n" ] } ], "source": [ "echo \"This is Duke\" | tr a-zA-Z c-zabC-ZAB" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "This is Duke\n" ] } ], "source": [ "echo \"Vjku ku Fwmg\" | tr c-zabC-ZAB a-zA-Z " ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "## Clean up" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "rm *txt" ] } ], "metadata": { "kernelspec": { "display_name": "Bash", "language": "bash", "name": "bash" }, "language_info": { "codemirror_mode": "shell", "file_extension": ".sh", "mimetype": "text/x-sh", "name": "bash" } }, "nbformat": 4, "nbformat_minor": 2 }