Unix Text and Arithmetic Quiz

Text

1. Use a here document with cat to create a multi-line file.

In [1]:
cat > baa.txt <<'EOF'
baa, baa black sheep
have you any wool?
EOF
In [2]:
cat baa.txt
baa, baa black sheep
have you any wool?

2. What is the difference between a plain label and a quoted 'label' in a here document?

In [5]:
cat > plain.txt <<EOF
echo $SHELL
EOF
In [6]:
cat plain.txt
echo /bin/bash
In [7]:
cat > quoted.txt <<'EOF'
echo $SHELL
EOF
In [8]:
cat quoted.txt
echo $SHELL
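
With a plain label the shell expands variables, command substitutions and arithmetic inside the here document before cat reads it. With a quoted label the body is passed through literally. A minimal sketch of the same behavior, using a hypothetical variable named animal that is not part of the quiz:

```shell
animal="sheep"   # hypothetical variable, used only for this illustration

# Unquoted label: $animal and $((...)) are expanded by the shell.
expanded=$(cat <<EOF
black $animal, $((1 + 1)) bags
EOF
)

# Quoted label: the body is kept exactly as typed.
literal=$(cat <<'EOF'
black $animal, $((1 + 1)) bags
EOF
)

echo "$expanded"
echo "$literal"
```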

3. How would you run a bash command that sends its output and errors to separate files?

In [19]:
rmdir foo > output.txt 2> error.txt

In [20]:
cat output.txt
In [21]:
cat error.txt
rmdir: foo: No such file or directory

4. How would you run a bash command that sends its output and errors to the same file?

In [22]:
rmdir foo &> combined.txt

In [23]:
cat combined.txt
rmdir: foo: No such file or directory
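
The &> operator is a bash shorthand. The portable spelling is > file 2>&1, where stdout must be redirected first and stderr pointed at it second. A small sketch, assuming a hypothetical function noisy that writes one line to each stream, and a scratch directory from mktemp so the working directory is left alone:

```shell
# Hypothetical helper: one line on stdout, one on stderr.
noisy() {
    echo "to stdout"
    echo "to stderr" >&2
}

tmp=$(mktemp -d)   # scratch directory for the demonstration files

# Separate files, as in question 3.
noisy > "$tmp/out.txt" 2> "$tmp/err.txt"

# Same file, as in question 4: redirect stdout, then send stderr to the same place.
noisy > "$tmp/both.txt" 2>&1

cat "$tmp/out.txt" "$tmp/err.txt"
wc -l < "$tmp/both.txt"
```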

5. Sort the numbers 1 10 100 2 20 200 in decreasing order.

In [27]:
echo 1 10 100 2 20 200 | tr ' ' '\n'  | sort -nr
200
100
20
10
2
1
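
The -n flag matters here: without it sort compares the lines as text rather than by value. A short sketch contrasting the two, with variable names chosen only for this example:

```shell
# Without -n, lines are compared as text, so "2" ranks above "100".
as_text=$(printf '%s\n' 1 10 100 2 20 200 | sort -r | paste -s -d ' ' -)

# With -n, lines are compared by numeric value.
as_numbers=$(printf '%s\n' 1 10 100 2 20 200 | sort -nr | paste -s -d ' ' -)

echo "$as_text"
echo "$as_numbers"
```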

6. Download the file iris.csv from the URL https://raw.githubusercontent.com/uiuc-cse/data-fa14/gh-pages/data/iris.csv. Extract the column headers and sort them in reverse order.

In [29]:
wget -q https://raw.githubusercontent.com/uiuc-cse/data-fa14/gh-pages/data/iris.csv
In [32]:
head -1 iris.csv | tr ',' '\n' | sort -r
species
sepal_width
sepal_length
petal_width
petal_length

7. Find the unique species in the downloaded iris.csv file.

In [38]:
tail -n +2 iris.csv | cut -f 5 -d',' | sort | uniq
setosa
versicolor
virginica

8. How would you list all files that end in .o, .c or .h?

In [42]:
touch foo.o foo.c foo.h foo.a foo.b
In [43]:
ls *.{o,c,h}
foo.c   foo.h   foo.o

9. Create a, b and c as sub-directories of the directory given by pwd using brace expansion.

In [44]:
mkdir "${PWD}"/{a,b,c}
In [48]:
ls -d */
a/      b/      c/

Regular expressions

10. Do the first 3 exercises of Regex Golf.

Left as an exercise.

11. Find a regular expression to match only valid outputs (as much as possible) of date +%d-%b-%y.

For example, the following should not match

05-Mud-17
5-Mar-17
05-Mar-2017
56-Mar-17

But we will allow mistakes like

30-Feb-18
37-Mar-00
In [59]:
for s in "05-Mud-17" "5-Mar-17" "05-Mar-2017" "56-Mar-17" "30-Feb-18" "37-Mar-00"; do
    echo "$s" | grep -E '^[0-3][0-9]-(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)-[0-9]{2}$'
done
30-Feb-18
37-Mar-00
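
If the day field should also be limited to 01 through 31, the leading character classes can be replaced by an alternation. This is only a sketch of one possible tightening: it still accepts impossible dates such as 30-Feb-18, and the variable name date_re is made up for this example.

```shell
# Day limited to 01-31; month names and two-digit year as in the pattern above.
date_re='^(0[1-9]|[12][0-9]|3[01])-(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)-[0-9]{2}$'

for s in "05-Mar-17" "31-Dec-99" "00-Jan-17" "37-Mar-00" "30-Feb-18"; do
    if echo "$s" | grep -qE "$date_re"; then
        echo "match:    $s"
    else
        echo "no match: $s"
    fi
done
```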

Finding stuff

12. Find a regular expression to find the repeated character runs in DNA of length at least 4. Use the -oE arguments to grep to show only the matched parts of each line. For example, the string

AAATTTTAAAACAAAGCGCGCGCATGC

should find

TTTT
AAAA
GCGCGCGC
In [101]:
echo 'AAATTTTAAAACAAAGCGCGCGCATGC' | grep -oE '(.)\1{3,}|(.{2,})\2{1,}'
TTTT
AAAA
GCGCGCGC

13. Find all files greater than 3 KB that were created more than 1 day ago in the current directory, excluding anything in .ipynb_checkpoints.

In [107]:
find . -ctime +1 -size +3k -not -path './.ipynb_checkpoints/*'
./01_UnixBasics.ipynb
./01_UnixBasics_Solutions.ipynb
./EntryQuiz.ipynb
./EntryQuiz_Solutions.ipynb

14. Find all lines that contain the word git (case-insensitive) in files in the current directory created more than 3 days ago, excluding anything in .ipynb_checkpoints. You should not find lines where git is part of a word like GitLab.

In [134]:
find . -type f -ctime +3 \
-not -path './.ipynb_checkpoints' \
-not -path './.ipynb_checkpoints/*' \
| xargs grep -i '\<git\>'
./EntryQuiz.ipynb:    "- `git`\n",
./EntryQuiz.ipynb:    "- `git` => distributed version control system\n",
./EntryQuiz_Solutions.ipynb:    "- `git`\n",
./EntryQuiz_Solutions.ipynb:    "- `git` => distributed version control system\n",
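
Piping find into xargs splits file names on whitespace, so a name containing a space would be mishandled. One alternative, sketched here, lets find run grep itself with -exec ... {} + and restricts the search to regular files. The scratch directory and file names are invented for the demonstration. The -ctime test is left out because freshly created files would never satisfy it (-ctime compares the inode change time, which is not strictly a creation time).

```shell
demo=$(mktemp -d)   # hypothetical scratch directory
mkdir "$demo/.ipynb_checkpoints"
printf 'use git daily\nGitLab is a host\n' > "$demo/my notes.txt"
printf 'git in a checkpoint\n' > "$demo/.ipynb_checkpoints/old.txt"

# -type f skips directories; -exec ... {} + hands file names to grep intact.
# -H prints the file name even when only one file is searched.
hits=$(find "$demo" -type f \
    -not -path "$demo/.ipynb_checkpoints/*" \
    -exec grep -iH '\<git\>' {} +)

echo "$hits"
```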

Modifying text

15. Find the reverse complement of AAATTTTAAAACAAAGCGCGCGCATGC.

In [122]:
echo 'AAATTTTAAAACAAAGCGCGCGCATGC' | tr 'ACTG' 'TGAC' | rev
GCATGCGCGCGCTTTGTTTTAAAATTT
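
A quick sanity check on this pipeline: taking the reverse complement twice should give back the original sequence. A sketch that wraps the same tr and rev calls in a hypothetical function named revcomp:

```shell
# Hypothetical wrapper around the pipeline above.
revcomp() {
    echo "$1" | tr 'ACTG' 'TGAC' | rev
}

dna='AAATTTTAAAACAAAGCGCGCGCATGC'
once=$(revcomp "$dna")
twice=$(revcomp "$once")   # should equal $dna again

echo "$once"
echo "$twice"
```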

16. Print the first 10 rows whose species name is versicolor in iris.csv.

In [131]:
tail -n +2 iris.csv | sed -n '/versicolor/p' | head -10
7,3.2,4.7,1.4,versicolor
6.4,3.2,4.5,1.5,versicolor
6.9,3.1,4.9,1.5,versicolor
5.5,2.3,4,1.3,versicolor
6.5,2.8,4.6,1.5,versicolor
5.7,2.8,4.5,1.3,versicolor
6.3,3.3,4.7,1.6,versicolor
4.9,2.4,3.3,1,versicolor
6.6,2.9,4.6,1.3,versicolor
5.2,2.7,3.9,1.4,versicolor

17. Create a new file iris1.csv without a header row, and where each species name just uses the first 2 characters of the original.

In [135]:
tail -n +2 iris.csv | \
sed -e 's/versicolor/ve/' -e 's/setosa/se/' -e 's/virginica/vi/' > iris1.csv
In [136]:
head iris1.csv
5.1,3.5,1.4,0.2,se
4.9,3,1.4,0.2,se
4.7,3.2,1.3,0.2,se
4.6,3.1,1.5,0.2,se
5,3.6,1.4,0.2,se
5.4,3.9,1.7,0.4,se
4.6,3.4,1.4,0.3,se
5,3.4,1.5,0.2,se
4.4,2.9,1.4,0.2,se
4.9,3.1,1.5,0.1,se
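
The three sed substitutions hard-code the species names. If the aim is simply to keep the first 2 characters of the fifth field, awk's substr does that for any value. A sketch on a few illustrative rows in the iris.csv layout (not the full file), so it runs without the download:

```shell
# Illustrative sample: header plus three data rows in the iris.csv layout.
sample='sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
7,3.2,4.7,1.4,versicolor
6.3,3.3,6,2.5,virginica'

# Drop the header, then truncate field 5 to its first two characters.
shortened=$(printf '%s\n' "$sample" | tail -n +2 |
    awk -F, 'BEGIN { OFS = "," } { $5 = substr($5, 1, 2); print }')

echo "$shortened"
```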

Arithmetic

18. Evaluate 1 + 1.

In [138]:
echo $((1 + 1))
2
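
Note that $(( )) does integer arithmetic only, so division truncates. A brief sketch of the difference, using awk for the floating-point case (bc, as in the next question, is another option):

```shell
int_div=$((7 / 2))      # integer division: the fractional part is dropped
remainder=$((7 % 2))    # remainder of the same division
real_div=$(awk 'BEGIN { printf "%.1f", 7 / 2 }')   # floating point via awk

echo "$int_div $remainder $real_div"
```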

19. Find the sum of all petal_length values in iris.csv.

In [139]:
head -1 iris.csv
sepal_length,sepal_width,petal_length,petal_width,species
In [160]:
tail -n +2 iris.csv | cut -f3 -d',' | paste -s -d+ - | bc
563.8
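
The paste and bc pipeline builds one long expression and evaluates it. An alternative sketch accumulates the column directly in awk. It is shown on three inline rows rather than the real file, so the total it prints is for the sample only:

```shell
# Illustrative sample: header plus three data rows in the iris.csv layout.
rows='sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
4.9,3,1.4,0.2,setosa
7,3.2,4.7,1.4,versicolor'

# NR > 1 skips the header; column 3 is petal_length.
total=$(printf '%s\n' "$rows" |
    awk -F, 'NR > 1 { sum += $3 } END { printf "%.1f\n", sum }')

echo "$total"
```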

Clean up

20. Delete any file or directory created in this session.

In [165]:
rmdir a b c
rm foo*
rm iris* baa.txt plain.txt quoted.txt output.txt error.txt combined.txt