The Unix Shell: Writing Shell Scripts

The shell commands constitute a programming language, and command line programs known as shell scripts can be written to perform complex tasks.

This section provides only a brief overview. Shell scripts have many traps and pitfalls for the unwary, and we generally prefer to use languages such as Python or R, which have more consistent syntax, for complex tasks. However, shell scripts are used extensively in domains such as the preprocessing of genomics data, so shell scripting is a useful tool to know about.

Pipe error hack

In [ ]:
cleanup () {
    :
}

trap "cleanup" SIGPIPE

Warn on uninitialized variable

You will typically want to set this. I will leave it unset for this session to show what can happen if you don’t.

set -u
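
As a rough illustration of what this option changes, the sketch below runs the same expansion with and without `set -u`. The variable name `UNDEFINED_DEMO_VAR` is made up for the example, and the option is only switched on inside a subshell so that it does not affect the rest of the session.

```shell
# Hypothetical variable name; make sure it really is unset.
unset UNDEFINED_DEMO_VAR

# Without -u (the default), an unset variable silently expands to nothing.
PLAIN=$(set +u; echo "Value: [${UNDEFINED_DEMO_VAR}]")
echo "${PLAIN}"

# With -u, the same expansion is an error and the subshell stops
# with a non-zero exit status.
STATUS=0
( set -u; echo "Value: [${UNDEFINED_DEMO_VAR}]" ) 2>/dev/null || STATUS=$?
echo "exit status with set -u: ${STATUS}"

# ${VAR:-fallback} supplies a default and is safe even under set -u.
FALLBACK=$(set -u; echo "${UNDEFINED_DEMO_VAR:-default}")
echo "${FALLBACK}"
```

The `${VAR:-fallback}` form is a common way to keep `set -u` enabled while still allowing optional variables.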

Assigning variables

We assign variables using = and recall them by using $. It is customary to spell shell variable names in ALL_CAPS.

In [ ]:
NAME='Joe'
echo "Hello $NAME"
echo "Hello ${NAME}"

Single and double quotes

The main difference between single quotes ('') and double quotes ("") is that variable expansion only occurs within double quotes. For plain text, they are equivalent.

In [ ]:
echo '${NAME}'
In [ ]:
echo "${NAME}"

Use of curly braces

Use of curly braces unambiguously specifies the variable of interest. I suggest you always use them as a defensive programming technique.

In [ ]:
echo "Hello ${NAME}l"

$NAMEl is not defined, and so expands to an empty string!

In [ ]:
echo "Hello $NAMEl"

One of the quirks of shell scripts is already present - there cannot be spaces before or after the = in an assignment.

In [ ]:
NAME2= 'Joe'
echo "Hello ${NAME2}"

The previous instruction sets NAME2 to the empty string, but only for the duration of the command that follows, and then tries to execute 'Joe' as a command.

In [ ]:
NAME3 ='Joe'
echo "Hello ${NAME3}"

The previous instruction runs the command NAME3 with =Joe as its argument.
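
The behavior with a space after the = follows from a general shell rule: an assignment placed directly in front of a command applies only to that command. The following is a small sketch of that rule; `DEMO_VAR` is a made-up name used only for the illustration.

```shell
# Hypothetical variable name; start from a clean state.
unset DEMO_VAR

# The assignment is visible to the child command only ...
CHILD_SAW=$(DEMO_VAR=hello bash -c 'echo "${DEMO_VAR}"')
echo "child saw: ${CHILD_SAW}"

# ... and the variable is still unset in the current shell afterwards.
PARENT_SEES="${DEMO_VAR:-<unset>}"
echo "parent sees: ${PARENT_SEES}"
```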

Assigning command output to variables

In [ ]:
pwd
In [ ]:
CUR_DIR=$(pwd)
dirname "${CUR_DIR}"
basename "${CUR_DIR}"
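
Command substitution can capture the output of any command, and quoting the expansion matters once a path contains spaces. A minimal sketch, using a made-up path that does not need to exist (dirname and basename only manipulate the string):

```shell
# Hypothetical path containing spaces.
DEMO_PATH="/tmp/my project/data file.txt"

PARENT=$(dirname "${DEMO_PATH}")        # everything before the last /
LEAF=$(basename "${DEMO_PATH}")         # the last path component
STEM=$(basename "${DEMO_PATH}" .txt)    # last component with the suffix removed

echo "${PARENT}"
echo "${LEAF}"
echo "${STEM}"
```

Without the double quotes, the path would be split into several arguments at each space.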

Working with numbers

Careful: Note the use of double parentheses to trigger evaluation of a mathematical expression.

In [5]:
NUM=0
((NUM++))
echo $NUM
1
In [ ]:
NUM=$((1+2+3+4))
echo ${NUM}
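
A few more arithmetic forms, as a sketch assuming bash. Note that shell arithmetic works on integers only, so division truncates.

```shell
QUOTIENT=$((7 / 2))      # integer division truncates towards zero
REMAINDER=$((7 % 2))     # modulo
POWER=$((2 ** 10))       # exponentiation

COUNT=5
((COUNT += 3))           # in-place update; no $ needed inside (( ))

echo "${QUOTIENT} ${REMAINDER} ${POWER} ${COUNT}"
```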

seq generates a range of numbers

In [ ]:
seq 3
In [ ]:
seq 2 5
In [ ]:
seq 5 2 9
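
The output of seq is often fed to a loop through command substitution. A short sketch that sums the numbers produced by the last call above:

```shell
SUM=0
# seq 5 2 9 produces 5, 7 and 9 (start, step, end).
for I in $(seq 5 2 9); do
    SUM=$((SUM + I))
done
echo "${SUM}"
```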

Branching

Using if to check for file existence

Note that the test condition is enclosed in square brackets, and that the spaces just inside the brackets are required.

In [ ]:
if [ -f hello.txt ]; then
    cat hello.txt
else
    echo "No such file"
fi
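
The same pattern extends to several cases with elif. The sketch below uses a hypothetical helper function, `describe_path`, and a temporary directory so that it does not depend on any existing files.

```shell
# Hypothetical helper: classify a path using file tests.
describe_path () {
    if [ -d "$1" ]; then
        echo "directory"
    elif [ -f "$1" ]; then
        echo "regular file"
    else
        echo "missing"
    fi
}

TMP_DIR=$(mktemp -d)
touch "${TMP_DIR}/hello.txt"

KIND_DIR=$(describe_path "${TMP_DIR}")
KIND_FILE=$(describe_path "${TMP_DIR}/hello.txt")
KIND_NONE=$(describe_path "${TMP_DIR}/no_such_file.txt")
echo "${KIND_DIR} / ${KIND_FILE} / ${KIND_NONE}"

rm -rf "${TMP_DIR}"
```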

Downloading remote files

In [31]:
man wget | head -n 20
WGET(1)                            GNU Wget                            WGET(1)

NAME
       Wget - The non-interactive network downloader.

SYNOPSIS
       wget [option]... [URL]...

DESCRIPTION
       GNU Wget is a free utility for non-interactive download of files from
       the Web.  It supports HTTP, HTTPS, and FTP protocols, as well as
       retrieval through HTTP proxies.

       Wget is non-interactive, meaning that it can work in the background,
       while the user is not logged on.  This allows you to start a retrieval
       and disconnect from the system, letting Wget finish the work.  By
       contrast, most of the Web browsers require constant user's presence,
       which can be a great hindrance when transferring a lot of data.

       Wget can follow links in HTML, XHTML, and CSS pages, to create local

Some useful wget flags

  • -O redirects to named path
  • -O- redirects to standard output (so it can be piped)
  • -q suppresses messages
In [ ]:
if [ ! -f "data/forbes.csv" ]; then
    wget https://vincentarelbundock.github.io/Rdatasets/csv/HSAUR/Forbes2000.csv \
    -O data/forbes.csv
fi
In [ ]:
wget -qO- https://vincentarelbundock.github.io/Rdatasets/doc/HSAUR/Forbes2000.html \
    | html2text | head -n 27  | tail -n 17

Conditional evaluation with test

The [ -f hello.txt ] syntax is equivalent to test -f hello.txt, where test is a shell command with a large range of operators and flags that you can view in the man page.

TEST(1)                   BSD General Commands Manual                  TEST(1)

NAME
     test, [ -- condition evaluation utility

SYNOPSIS
     test expression
     [ expression ]

DESCRIPTION
     The test utility evaluates the expression and, if it evaluates to true,
     returns a zero (true) exit status; otherwise it returns 1 (false).  If
     there is no expression, test also returns 1 (false).

     All operators and flags are separate arguments to the test utility.

     The following primaries are used to construct expression:

     -b file       True if file exists and is a block special file.

     -c file       True if file exists and is a character special file.

     -d file       True if file exists and is a directory.

     -e file       True if file exists (regardless of type).

     -f file       True if file exists and is a regular file.

     -g file       True if file exists and its set group ID flag is set.
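
Because test and [ are ordinary commands, their result is simply an exit status, which is what if inspects. A small sketch that captures the status directly; the path `/nonexistent_demo_path` is assumed not to exist.

```shell
# Exit status 0 means true: / is always a directory.
ROOT_STATUS=0
test -d / || ROOT_STATUS=$?

# A non-zero exit status means false.
MISSING_STATUS=0
[ -f /nonexistent_demo_path ] || MISSING_STATUS=$?

echo "test -d / returned ${ROOT_STATUS}"
echo "[ -f /nonexistent_demo_path ] returned ${MISSING_STATUS}"
```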

Looping

For loop

In [ ]:
for FILE in *ipynb; do
    echo "$FILE"
done
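
A for loop can iterate directly over a glob pattern, which keeps file names containing spaces intact, whereas looping over the output of ls splits such names into separate words. A sketch using a temporary directory and made-up file names:

```shell
TMP_DIR=$(mktemp -d)
touch "${TMP_DIR}/first notebook.ipynb" "${TMP_DIR}/second.ipynb"

COUNT=0
# The glob expands to one word per file, even when a name has spaces.
for FILE in "${TMP_DIR}"/*.ipynb; do
    COUNT=$((COUNT + 1))
    basename "${FILE}"
done
echo "files seen: ${COUNT}"

rm -rf "${TMP_DIR}"
```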

Traditional C-style for loop

We make use of the double parentheses to evaluate the counter arithmetic.

In [2]:
for ((i=0; i<5; i++)); do
    echo $i
done
0
1
2
3
4

Double square brackets provide enhanced test functionality

  • You can use && instead of -a
  • You can use || instead of -o
  • You can use =~ to match regular expression patterns

Example of regular expression matching in test condition

Note that the string should be quoted and the regular expression should not be quoted.

In [30]:
for FILE in *; do
    if [[ "${FILE}" =~ ^Bash.*sh ]]; then
        echo "$FILE"
    fi
done
Bash_Exercise_1_Solutions.sh
Bash_Exercise_2_Solutoins.sh
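
When a pattern contains parentheses, bash stores the matched groups in the BASH_REMATCH array. A minimal sketch, assuming bash and using a made-up file name; keeping the pattern in a variable and expanding it unquoted avoids quoting problems.

```shell
# Hypothetical file name for the illustration.
FILE_NAME="Bash_Exercise_2_Solutions.sh"
PATTERN='^Bash_Exercise_([0-9]+)_.*\.sh$'

EXERCISE_NUM=""
if [[ "${FILE_NAME}" =~ ${PATTERN} ]]; then
    # BASH_REMATCH[0] is the whole match, [1] the first group.
    EXERCISE_NUM="${BASH_REMATCH[1]}"
fi
echo "exercise number: ${EXERCISE_NUM}"
```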

While loop

In [ ]:
COUNTER=10
while [ $COUNTER -gt 0 ]; do
    echo $COUNTER
    COUNTER=$(($COUNTER - 1))
done

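Careful: Inside single square brackets, < and > are redirection operators rather than comparisons. For example, [ $COUNTER > 0 ] writes to a file named 0 and always succeeds, which leads to an infinite loop. Use -lt for less than and -gt for greater than. For equality and inequality, -eq and -ne compare numbers, while == and != compare strings.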

In [ ]:
COUNTER=10
while [ $COUNTER != 0 ]; do
    echo $COUNTER
    COUNTER=$(($COUNTER - 1))
done
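
The redirection pitfall mentioned above can be observed safely outside a loop. In this sketch the comparison is run inside a temporary directory, so the stray file created by the redirection is easy to spot and clean up.

```shell
TMP_DIR=$(mktemp -d)

# [ 1 > 2 ] is really "[ 1 ]" with output redirected to a file named 2.
# A non-empty string is true, so the test succeeds.
REDIRECT_RESULT=$(cd "${TMP_DIR}" && if [ 1 > 2 ]; then echo "true"; else echo "false"; fi)

# The redirection created a file called 2.
CREATED_FILE="no"
[ -e "${TMP_DIR}/2" ] && CREATED_FILE="yes"

# The numeric operator gives the intended answer.
if [ 1 -gt 2 ]; then NUMERIC_RESULT="true"; else NUMERIC_RESULT="false"; fi

echo "[ 1 > 2 ]   -> ${REDIRECT_RESULT} (file created: ${CREATED_FILE})"
echo "[ 1 -gt 2 ] -> ${NUMERIC_RESULT}"

rm -rf "${TMP_DIR}"
```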

Brace expansion

Brace expansions create lists of strings, typically used in loops.

In [1]:
for NUM in {000..005}; do
    echo mkdir EXPT-${NUM}
done
mkdir EXPT-000
mkdir EXPT-001
mkdir EXPT-002
mkdir EXPT-003
mkdir EXPT-004
mkdir EXPT-005

But brace expansions can be used outside loops as well.

In [38]:
echo foo.{c,cpp,h,hp}
foo.c foo.cpp foo.h foo.hp
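
Two further properties of brace expansion are worth knowing, sketched here assuming bash: adjacent braces combine, and a range cannot be built from a variable because brace expansion happens before variable expansion.

```shell
# Adjacent brace lists produce every combination.
COMBOS=$(echo {a,b}{1,2})
echo "${COMBOS}"

# A variable inside a range is not expanded in time, so the braces
# are left in place and only the variable is substituted afterwards.
N=3
LITERAL=$(echo {1..$N})
echo "${LITERAL}"

# seq is one alternative when the end point is held in a variable.
FROM_SEQ=$(echo $(seq 1 "${N}"))
echo "${FROM_SEQ}"
```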

Shell script

From now on, we will write the shell script using an editor for convenience.

A shell script is traditionally given the extension .sh. There are a few things to note:

  1. To make the script standalone, you need to add #!/path/to/shell in the first line. Otherwise you need to call the script with bash /path/to/script instead of just /path/to/script.
  2. To make the script executable, change the file permissions to executable with chmod +x /path/to/script
  3. Script arguments behave like function arguments: they are available as $1, $2 and so on, with $@ expanding to all of them. Another useful variable is $#, which gives the number of command line arguments.
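
The three points above can be sketched end to end as follows. The script name `greet.sh` and its contents are invented for the illustration, and the shebang assumes bash lives at /bin/bash. The script is written to a temporary directory and removed afterwards.

```shell
TMP_DIR=$(mktemp -d)
SCRIPT="${TMP_DIR}/greet.sh"

# Quoting 'EOF' keeps $1, $# and $@ literal while the file is written.
cat > "${SCRIPT}" <<'EOF'
#!/bin/bash
echo "first: $1"
echo "count: $#"
for ARG in "$@"; do
    echo "arg: ${ARG}"
done
EOF

# Make the script executable, then call it by its path.
chmod +x "${SCRIPT}"
OUTPUT=$("${SCRIPT}" "Joe Bloggs" extra)
echo "${OUTPUT}"

rm -rf "${TMP_DIR}"
```

Quoting "$@" inside the script keeps an argument such as "Joe Bloggs" as a single item rather than splitting it at the space.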
In [ ]:
which bash
In [ ]:
cat scripts/my_first_script.sh

In [ ]:
chmod +x scripts/my_first_script.sh
In [ ]:
scripts/my_first_script.sh

Passing arguments to shell scripts

In [52]:
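# Quoting 'EOF' stops the shell from expanding $# and $ARG while writing the file.
cat <<'EOF' > my_second_script.sh
#!/bin/bash

# Print the number of arguments, then each argument on its own line.
echo $#
for ARG in "$@"; do
    echo "$ARG"
done
EOF
In [53]:
chmod +x my_second_script.sh
In [54]:
./my_second_script.sh 1 2 3
3
1
2
3
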
In [51]:
which bash
/bin/bash