Introduction to Unix and bash¶
What is an operating system?¶
Most of you probably know that Unix (Linux), like Windows and macOS, is an operating system (OS). An operating system is essentially the interface between applications (such as Word or your browser) and the computer itself (devices, memory, etc.).
We usually interact with the OS using a Graphical User Interface (GUI).¶
Those are the ‘windows’ that made Microsoft into the giant it is today. We open windows into the OS (which, in early versions of Windows, was DOS). We point and click to find files, open programs, etc. This has the advantage that it feels natural to the user. A disadvantage is that pointing and clicking isn’t exactly ‘reproducible’. We don’t have a record of everything we have done - and that is something we really want when analyzing data.
Another way - the command line¶
Unix, Windows and macOS all have command line interfaces. We can access them via ‘terminal’ windows. The Jupyter environment offers two different ways to use the command line (in our case, we will use the bash shell - more on that later). There is a terminal window under the tab on the left-hand side, but there is also a bash kernel, so that you can write commands in a notebook code cell and have them interpreted as Unix commands. You will learn to use both in this course.
The Unix command line interpreter is an interactive program¶
The interpreter parses the strings you enter and calls the appropriate executable program. There are different versions of this program, of which ‘bash’ is one. ‘sh’ was the original version, written in the 1970s by Ken Thompson. It was updated over the years and finally completely rewritten in 1979 by Stephen Bourne. In 1989, another shell was built on the Bourne ‘sh’ and named ‘bash’, for Bourne Again SHell. There are several other flavors, but we will be using bash.
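To make the "parse, then call a program" step a little more concrete, here is a minimal sketch that asks the shell how it would resolve two command names. The variable names `ls_path` and `cd_kind` are made up for this illustration, and the exact paths and wording printed will differ from system to system (and will look different if `ls` is aliased in your session).

```shell
# command -v reports how the shell would resolve a command name.
ls_path=$(command -v ls)    # an external program: normally prints its full path
cd_kind=$(command -v cd)    # a shell builtin: prints just the name
echo "ls resolves to: $ls_path"
echo "cd resolves to: $cd_kind"

# type gives a more readable description of the same lookup.
type ls
type cd
```

Commands such as `ls` are separate executable files that the shell finds and starts, while a few, such as `cd`, are handled inside the shell itself.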
sh? Why the hush?¶
sh is short for ‘shell’. The reason for this terminology is that you can think of this program as hosting an environment (variables can be saved and referred to later, among other things). But shells can run any program, including other shells. So there is a notion of ‘inheritance’ of the environment - each shell started from another is contained in it and inherits its environment.
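The idea of inheritance can be tried out directly. The sketch below assumes `bash` is installed; the variable names `local_note` and `SHARED_NOTE` are hypothetical. It shows that a child shell only sees variables that the parent has exported into its environment.

```shell
# A plain shell variable stays inside the current shell.
local_note="only here"
# An exported variable becomes part of the environment handed to child processes.
export SHARED_NOTE="passed on"

# Start a child shell and ask it to print both variables.
# The single quotes stop the parent shell from expanding them first.
child_output=$(bash -c 'echo "local_note=[$local_note] SHARED_NOTE=[$SHARED_NOTE]"')
echo "$child_output"
# prints: local_note=[] SHARED_NOTE=[passed on]
```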
The main features of a shell¶
The basic features of a shell program are some simple programming constructs (loops, conditionals, etc.) and ways to link programs together (pipes and redirections). Before we examine those, let’s get a little experience working in the command line. We’ll learn how to:
- print the current directory
- list the contents
- make a new directory
- copy and rename files
- examine the content of (text) files
- use tab completion
- use meta characters
Exercises - basic commands¶
From the terminal screen
- Print the current directory
- Make a directory called ‘mydir’
- Change directory to ‘mydir’
- Copy this notebook to mydir.
- Make another copy of this notebook named ‘copy’
- List the contents of the directory
- List the contents of the directory in long form with permissions and modification times
- Sort the list by time.
- Sort the list in reverse order.
- List all files in the parent directory.
- List all files in the parent directory with extension .ipynb
- List all files in the parent directory that have an ‘_’ as the second character in the name.
Command glossary¶
- ls (list)
- mkdir (make directory)
- cp (copy file)
- mv (move or rename file)
- pwd (print working directory)
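As a rough sketch of how these commands fit together, the session below works in a throwaway directory created with `mktemp -d`, so nothing in your own files is touched. The file name `notes.txt` is invented for the example, and `cd` (change directory) and `cat` (print a file's contents) are used alongside the commands in the glossary.

```shell
# Work in a scratch directory so the example cannot overwrite real files.
scratch=$(mktemp -d)
cd "$scratch"

pwd                            # print the working directory
mkdir mydir                    # make a new directory
echo "some text" > notes.txt   # create a small text file to play with
cp notes.txt mydir/            # copy the file into mydir
mv notes.txt renamed.txt       # rename the original file
ls                             # list the contents: mydir and renamed.txt
ls -l mydir                    # long form, with permissions and modification times
cat renamed.txt                # examine the content of the text file
```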
meta characters¶
- * matches any sequence of characters
- ? matches a single character
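A small sketch of both meta characters in action, using made-up file names in a scratch directory. The shell replaces each pattern with the matching file names before the command (here `echo`) ever runs.

```shell
# Create a few empty files to match against (names are arbitrary).
scratch=$(mktemp -d)
cd "$scratch"
touch a_one.txt b_two.txt notes.txt data.csv

echo *.txt       # every name ending in .txt: a_one.txt b_two.txt notes.txt
echo ????.csv    # exactly four characters, then .csv: data.csv

# One arbitrary character, an underscore, then anything.
second_underscore=$(echo ?_*)
echo "$second_underscore"    # a_one.txt b_two.txt
```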
Programming constructs in bash¶
loops¶
for loops
Use these to iterate over a fixed list of items (the number of items may be unknown at the time of coding)
for i in $( ls ); do
    echo "item: $i"
done
while loops
Use these to iterate until a certain condition is met
while true; do
    echo 'hello'    # repeats forever; press Ctrl-C to stop
done
until loops (these really don’t do any more than for or while)
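The three loop forms can be compared side by side. This is only a sketch with arbitrary values: the `for` loop walks an explicit list, while the `while` and `until` loops use a counter so that they stop on their own.

```shell
# for: iterate over an explicit list of words.
for fruit in apple banana cherry; do
    echo "fruit: $fruit"
done

# while: keep going as long as the test succeeds.
count=1
while [ "$count" -le 3 ]; do
    echo "while pass $count"
    count=$((count + 1))    # arithmetic expansion adds one to the counter
done

# until: keep going as long as the test fails (the mirror image of while).
n=3
until [ "$n" -eq 0 ]; do
    echo "countdown $n"
    n=$((n - 1))
done
```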
Conditionals (sometimes called ‘flow control’)¶
- Simple if-then
if [ "foo" = "foo" ]; then
    echo expression evaluated as true
fi
- if-then-else
if [ "foo" = "foo" ]; then
    echo expression evaluated as true
else
    echo expression evaluated as false
fi
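Beyond comparing two strings, the `[ ... ]` test can compare numbers and check files. The sketch below uses an arbitrary value for `count` and also shows `elif`, which is not covered above but follows the same pattern.

```shell
count=7    # arbitrary value chosen for illustration

# Numeric comparisons use -gt, -lt, -eq and friends; elif adds further branches.
if [ "$count" -gt 5 ]; then
    size="big"
elif [ "$count" -gt 2 ]; then
    size="medium"
else
    size="small"
fi
echo "$count is $size"    # prints: 7 is big

# File tests: -d is true when the path is an existing directory.
if [ -d "$HOME" ]; then
    echo "home directory found"
else
    echo "no home directory"
fi
```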
Exercises for loops and conditionals¶
1. Write a for loop to repeat ‘hello world’ 10 times. Print the iteration number so the output looks like:
   1. hello world
   2. hello world
   3. hello world etc.
2. Modify the above to add the word ‘again’ once for each iteration already completed:
   1. hello world
   2. hello world again
   3. hello world again again etc.
3. Modify 2 to stop after 3 iterations, after printing ‘enough already’.
4. Modify 3 to use a while loop.
Linking programs together - pipes and I/O redirection¶
I/O streams¶
There are three standard I/O streams in Unix:
- stdin or standard input. The default is the parent process - usually the keyboard.
- stdout or standard output. The default is the parent process - usually the terminal screen.
- stderr or standard error. Also defaults to the parent, and usually the terminal screen. The important point is that errors and output may be treated separately.
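To see that output and errors really are separate streams, here is a minimal sketch. The function `report` is hypothetical and exists only to write one line to each stream; `>&2` sends a line to stderr and `2>/dev/null` throws stderr away.

```shell
# A tiny function that writes one line to stdout and one to stderr.
report() {
    echo "this goes to stdout"
    echo "this goes to stderr" >&2
}

# Command substitution captures stdout only; the stderr line still
# goes wherever the shell's own stderr points, usually the terminal.
captured=$(report)
echo "captured: $captured"

# Discarding stderr leaves the stdout line untouched.
report 2>/dev/null
```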
I/O redirections using |, > and <¶
A program’s input and output can be redirected from the standard streams in many powerful ways:
- We can store a program’s output in a file to be saved (>). We can separately save errors (2>).
- We can direct a program to take input from a file rather than the keyboard (<).
- We can direct a program’s output to be another program’s input (|). This allows linking of several programs to form a workflow.
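The redirections listed above can be sketched in a few lines. The file names (`fruit.txt`, `listing.txt`, `errors.txt`, `no_such_file`) are invented for the example, and everything happens in a scratch directory. The exact wording of the error message depends on your system.

```shell
scratch=$(mktemp -d)
cd "$scratch"

# > sends stdout to a file instead of the screen.
printf 'pear\napple\nfig\n' > fruit.txt

# < makes a program read its stdin from a file instead of the keyboard.
sort < fruit.txt

# | feeds the output of one program into the input of the next.
first=$(sort < fruit.txt | head -n 1)
echo "first after sorting: $first"    # prints: first after sorting: apple

# > and 2> together save normal output and errors in separate files.
ls fruit.txt no_such_file > listing.txt 2> errors.txt
cat listing.txt    # fruit.txt
cat errors.txt     # the complaint about no_such_file
```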
Exercises¶
- The ps -aef command shows all processes currently running. Use | and the program grep to find the processes you are running (your user name is ‘jovyan’).
- Create a list with the filenames in the current directory using ls and >. Use cat to see the contents.
- What happens if you repeat the above using >>?
Putting it all together - scripting¶
The next lesson will teach you to write what are called ‘scripts’. These are lists of commands saved in a file that can be executed. This makes a workflow reproducible. All of the steps are saved and can be run again in exactly the same manner.
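As a small preview, and only a sketch, the lines below save two commands in a file and then run that file with bash. The name `hello.sh` is hypothetical, and the `<<'EOF' ... EOF` construct (a "here document") is just one convenient way to write several lines into a file; it has not been covered in this lesson.

```shell
scratch=$(mktemp -d)
cd "$scratch"

# Write a short script into a file. Everything between the EOF markers
# is copied into hello.sh exactly as written.
cat > hello.sh <<'EOF'
#!/bin/bash
echo "hello from a script"
EOF

# Run the saved commands by handing the file to bash.
script_output=$(bash hello.sh)
echo "$script_output"    # prints: hello from a script
```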