Introduction to Unix and bash

What is an operating system?

Most of you probably know that Unix (Linux), like Windows and MacOS, is an operating system (OS). An operating system is essentially the interface between applications (such as Word or your browser) and the computer itself (devices, memory, etc.).

We usually interact with the OS using a Graphical User Interface (GUI).

Those are the ‘windows’ that made Microsoft into the giant it is today. We open windows into the OS (formerly known as DOS). We point and click to find files, open programs, etc. This has the advantage that it is natural for the user. A disadvantage is that pointing and clicking isn’t exactly ‘reproducible’. We don’t have a record of everything we have done - and that is something we really want when analyzing data.

Another way - the command line

Unix, Windows and MacOS all have command line interfaces. We can access them via ‘terminal’ windows. The Jupyter environment offers two different ways to use the command line (in our case, we will use the bash shell - more on that later). There is a terminal window under the tab on the left-hand side, but there is also a bash kernel, so that you can write commands in a notebook code cell and have them interpreted as Unix commands. You will learn to use both in this course.

The unix command line interpreter is an interactive program

The interpreter parses the strings you enter and calls the appropriate executable program. There are different versions of this program, of which ‘bash’ is one. ‘sh’ was the original version, written in the 1970s by Ken Thompson. It was updated over the years and finally completely re-written in 1979 by Stephen Bourne. In 1989, a free replacement for the Bourne ‘sh’ was released - and it was named ‘bash’ - for Bourne Again SHell. There are several other flavors, but we will be using bash.

sh? Why the hush?

sh is short for ‘shell’. The name fits because you can think of these programs as hosting an environment (variables can be saved and referred to later, among other things). But shells can run any program, including other shells. So there is a notion of ‘inheritance’ from the environment - each subsequently called shell is contained in the one that called it.
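To make the idea of inheritance concrete, here is a minimal sketch. The variable names COURSE_NAME and local_note are made up for this illustration. It assumes bash is on your path, and shows that a child shell only receives variables that the parent has exported.

```shell
# Hypothetical variable names, chosen only for this illustration.
local_note="visible only in this shell"   # ordinary shell variable: stays in this shell
export COURSE_NAME="intro-unix"           # exported: copied into the environment of child programs

# Start a child shell and ask it to print both variables.
# The single quotes stop the parent shell from expanding them first.
child_output=$(bash -c 'echo "COURSE_NAME=[$COURSE_NAME] local_note=[$local_note]"')
echo "$child_output"
# prints: COURSE_NAME=[intro-unix] local_note=[]
```

The child sees the exported variable but not the ordinary one, which is why the second pair of brackets is empty.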

The main features of a shell

The basic features a shell program has are some basic programming constructs (loops, conditionals, etc.) and ways to link programs together (pipes and redirections). Before we examine those, let’s get a little experience working in the command line. We’ll learn how to:

- print the current directory
- list the contents
- make a new directory
- copy and rename files
- examine the content of (text) files
- use tab completion
- use meta characters

Exercises - basic commands

From the terminal screen

  1. Print the current directory
  2. Make a directory called ‘mydir’
  3. Change directory to ‘mydir’
  4. Copy this notebook to mydir.
  5. Make another copy of this notebook named ‘copy’
  6. List the contents of the directory
  7. List the contents of the directory in long form with permissions and modification times
  8. Sort the list by time.
  9. Sort the list in reverse order.
  10. List all files in the parent directory.
  11. List all files in the parent directory with extension .ipynb
  12. List all files in the parent directory that have an ‘_’ as the second character in the name.

Command glossary

  • ls (list)
  • mkdir (make directory)
  • cp (copy file)
  • mv (move or rename file)
  • pwd (print working directory)
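Here is a short sketch of how the glossary commands fit together. All file and directory names are hypothetical, and everything happens in a scratch directory made with mktemp so that nothing real is touched. It also uses cd (change directory) and cat (print a file), which are not in the glossary above.

```shell
# All names below are hypothetical; the work happens in a throwaway directory.
workdir=$(mktemp -d)          # create a scratch directory and remember its path
cd "$workdir"                 # change into it

pwd                           # print the working directory
mkdir practice                # make a new directory
echo "some text" > notes.txt  # create a small text file to work with
cp notes.txt practice/        # copy the file into the new directory
mv notes.txt renamed.txt      # rename the original
ls                            # list the contents of the current directory
ls -l practice                # long listing: permissions, size, modification time
cat renamed.txt               # examine the content of a text file
```

Note that the cd leaves you inside the scratch directory; `cd -` takes you back to where you started.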

meta characters

  • * matches any string of characters (including none)
  • ? matches exactly one character
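A quick way to see what a pattern matches is to hand it to echo, which simply prints whatever names the shell substitutes for it. The sketch below uses hypothetical empty files in a scratch directory.

```shell
workdir=$(mktemp -d)                      # scratch directory
cd "$workdir"
touch a_1.txt a_2.txt b_1.txt notes.md    # hypothetical empty files

echo *.txt       # every name ending in .txt   -> a_1.txt a_2.txt b_1.txt
echo a_?.txt     # a_, one character, .txt     -> a_1.txt a_2.txt
echo ?????.md    # exactly five characters, .md -> notes.md
echo *           # every (non-hidden) name     -> a_1.txt a_2.txt b_1.txt notes.md
```

The shell expands the pattern before the command runs, so the same patterns work with ls, cp, mv and any other command.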

Programming constructs in bash

loops

  • for loops

    Use for iterating over a fixed number of items (may be unknown at time of coding)

    for i in $( ls ); do echo "item: $i"; done

  • while loops

    Use for iterating until a certain condition is met

    while true; do echo 'hello'; done    # runs until interrupted with Ctrl-C

  • until loops (these really don’t do any more than for or while; until simply repeats as long as its condition is false)
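The three loop forms can be sketched side by side, written over several lines, which is how they usually appear in practice. The sample names and counters are hypothetical.

```shell
# for: iterate over a fixed list of items (hypothetical sample names)
for sample in control treated_1 treated_2; do
    echo "processing $sample"
done

# while: repeat as long as the test succeeds
count=3
while [ "$count" -gt 0 ]; do
    echo "while: $count left"
    count=$((count - 1))        # arithmetic expansion: subtract one
done

# until: repeat as long as the test fails (the mirror image of while)
tries=0
until [ "$tries" -ge 3 ]; do
    tries=$((tries + 1))
    echo "until: try $tries"
done
```

When a loop is written on one line instead, each line break inside it becomes a semicolon, which is why `; done` needs its semicolon.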

Conditionals (sometimes called ‘flow control’)

  • Simple if-then

    if [ "foo" = "foo" ]; then echo expression evaluated as true; fi

  • if-then-else

    if [ "foo" = "foo" ]; then echo expression evaluated as true; else echo expression evaluated as false; fi
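Conditionals become more useful when the test looks at something that can change. The sketch below goes slightly beyond the text: it uses elif and the file tests -s and -e, and the file name data.txt is hypothetical (the file is created first so the test has something to find).

```shell
# Hypothetical file, created in a scratch directory for this illustration.
workdir=$(mktemp -d)
target="$workdir/data.txt"
echo "one line" > "$target"

if [ -s "$target" ]; then        # -s: the file exists and is not empty
    status="has content"
elif [ -e "$target" ]; then      # -e: the file exists (so here it must be empty)
    status="empty"
else
    status="missing"
fi
echo "data.txt: $status"

count=3
if [ "$count" -gt 2 ]; then      # -gt compares integers; = compares strings
    echo "count is greater than 2"
fi
```

The spaces inside the square brackets matter: `[` is itself a command, and the items after it are its arguments.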

Exercises for loops and conditionals

  1. Write a for loop to repeat ‘hello world’ 10 times. Print the iteration number so the output looks like:
    1. hello world
    2. hello world
    3. hello world etc.
  2. Modify the above to add the word ‘again’ as many times as the number of completed iterations:
    1. hello world
    2. hello world again
    3. hello world again again etc.
  3. Modify 2 to stop after 3 iterations, after printing ‘enough already’.
  4. Modify 3 to use a while loop.

Linking programs together - pipes and I/O redirection

I/O streams

There are three standard I/O streams in Unix:

  • stdin or standard input. The default is the parent process - usually the keyboard
  • stdout or standard output. The default is the parent process - usually the terminal screen
  • stderr or standard error. Also defaults to the parent, and usually the terminal screen. The important point is that errors and output may be treated separately.
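To see that output and errors really are separate streams, here is a minimal sketch. The function report is hypothetical and exists only to write one line to each stream; the sketch also borrows some redirection syntax to pick the streams apart.

```shell
# Hypothetical function that writes one line to each output stream.
report() {
    echo "result ready"                    # ordinary output goes to stdout
    echo "warning: input was small" >&2    # >&2 sends this line to stderr instead
}

report                                # on a terminal, both lines appear on screen

captured=$(report 2>/dev/null)        # keep stdout, throw stderr away
echo "stdout only: $captured"

errors=$(report 2>&1 >/dev/null)      # keep stderr, throw stdout away
echo "stderr only: $errors"
```

On the screen the two streams look identical, which is why it is easy to forget that they can be captured or discarded independently.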

I/O redirections using |, >, <

A program’s input and output can be redirected from the standard streams in many powerful ways:

  • We can store a program’s output in a file to be saved (>). We can separately save errors (2>).
  • We can direct a program to take input from a file rather than the keyboard (<).
  • We can direct a program’s output to be another program’s input (|). This allows linking of several programs to form a workflow.
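The three operators can be sketched together as follows. The file names are hypothetical and the work happens in a scratch directory; printf, sort and head are standard Unix programs used here only as convenient examples.

```shell
workdir=$(mktemp -d)      # scratch directory
cd "$workdir"

# > : save stdout in a file (an existing file is overwritten)
printf 'pear\napple\nfig\n' > fruit.txt

# < : take stdin from a file instead of the keyboard
sort < fruit.txt > sorted.txt

# | : feed one program's stdout to the next program's stdin
first=$(sort fruit.txt | head -n 1)
echo "first alphabetically: $first"

# 2> : save stderr separately from stdout.
# ls fails on the missing name, so "|| true" lets the example carry on.
ls fruit.txt no_such_file > listing.txt 2> errors.txt || true
```

Afterwards listing.txt holds the name that was found and errors.txt holds the complaint about the one that was not.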

Exercises

  1. The ps -aef command shows all processes currently running. Use | and the program grep to find the processes you are running (your user name is ‘jovyan’)
  2. Create a list with the filenames in the current directory using ls and >. Use cat to see the contents.
  3. What happens if you repeat the above using >>?

Putting it all together - scripting

The next lesson will teach you to write what are called ‘scripts’. These are lists of commands saved in a file that can be executed. This makes a workflow reproducible. All of the steps are saved and can be run again in exactly the same manner.
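As a small preview, and only as a sketch, here is what saving commands in a file and running them can look like. The script name list_files.sh is hypothetical, and the `<<'EOF'` construct (a ‘here document’) is just one convenient way to write several lines into a file from the command line.

```shell
workdir=$(mktemp -d)      # scratch directory
cd "$workdir"

# Write a two-command script into a file (hypothetical name).
cat > list_files.sh <<'EOF'
#!/bin/bash
# Show which directory is being listed, then save the listing.
pwd
ls > listing.txt
EOF

bash list_files.sh        # run the saved commands
bash list_files.sh        # running it again repeats exactly the same steps
```

Because the steps live in a file, anyone can rerun them later and get the same workflow.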