Unix Text and Arithmetic Quiz
Text
1. Use a here document with cat to create a multi-line file.
In [1]:
cat > baa.txt <<'EOF'
baa, baa black sheep
have you any wool?
EOF
In [2]:
cat baa.txt
baa, baa black sheep
have you any wool?
2. What is the difference between a plain label and a quoted 'label' in a here document?
In [5]:
cat > plain.txt <<EOF
echo $SHELL
EOF
In [6]:
cat plain.txt
echo /bin/bash
In [7]:
cat > quoted.txt <<'EOF'
echo $SHELL
EOF
In [8]:
cat quoted.txt
echo $SHELL
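In short: with an unquoted label the body undergoes parameter expansion, command substitution and arithmetic expansion, while quoting any part of the label (as in 'EOF') passes the body through literally. A small sketch comparing the two on one line that uses all three expansions; the variable name and function names are made up for illustration:

```shell
#!/usr/bin/env bash
# Illustrative variable; any value works.
name="world"

# Unquoted label: $name, $(...) and $((...)) are expanded in the body.
show_unquoted() {
    cat <<EOF
hello $name, upper: $(echo "$name" | tr a-z A-Z), sum: $((2 + 3))
EOF
}

# Quoted label: the body is passed through exactly as written.
show_quoted() {
    cat <<'EOF'
hello $name, upper: $(echo "$name" | tr a-z A-Z), sum: $((2 + 3))
EOF
}

unquoted=$(show_unquoted)
quoted=$(show_quoted)
echo "$unquoted"   # hello world, upper: WORLD, sum: 5
echo "$quoted"     # hello $name, upper: $(echo "$name" | tr a-z A-Z), sum: $((2 + 3))
```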
3. How would you run a bash command that sends its output and errors to separate files?
In [19]:
rmdir foo > output.txt 2> error.txt
In [20]:
cat output.txt
In [21]:
cat error.txt
rmdir: foo: No such file or directory
4. How would you run a bash command that sends its output and errors to the same file?
In [22]:
rmdir foo &> combined.txt
In [23]:
cat combined.txt
rmdir: foo: No such file or directory
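&> is a bash shorthand; in plain POSIX sh the same effect is written > file 2>&1. Redirections are applied left to right, so the order matters. A sketch using a hypothetical emit function in place of rmdir, so that both streams always carry output, run inside a temporary directory:

```shell
#!/usr/bin/env bash
# Hypothetical helper: one line on stdout, one on stderr.
emit() {
    echo "to stdout"
    echo "to stderr" >&2
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Separate files: fd 1 goes to out.txt, fd 2 goes to err.txt.
emit > "$tmp/out.txt" 2> "$tmp/err.txt"

# Same file, portable form: point fd 1 at the file, then make fd 2 a copy of fd 1.
emit > "$tmp/both.txt" 2>&1

# Reversed order: fd 2 copies fd 1 while fd 1 is still the original stdout,
# so only stdout reaches the file and stderr still appears on screen.
emit 2>&1 > "$tmp/wrong.txt"

cat "$tmp/both.txt"    # to stdout, then to stderr
cat "$tmp/wrong.txt"   # to stdout only
```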
5. Sort the numbers 1 10 100 2 20 200 in decreasing order.
In [27]:
echo 1 10 100 2 20 200 | tr ' ' '\n' | sort -nr
200
100
20
10
2
1
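The -n flag is what makes this work: without it sort compares the values as text, character by character, so 2 lands after 100. A quick comparison (LC_ALL=C pins a byte-wise collation for the text sort):

```shell
#!/usr/bin/env bash
nums="1 10 100 2 20 200"

# $nums is left unquoted on purpose so it splits into six arguments;
# printf reuses its format for each one, printing one number per line.
as_text=$(printf '%s\n' $nums | LC_ALL=C sort -r)   # lexicographic, descending
as_numbers=$(printf '%s\n' $nums | sort -nr)       # numeric, descending

# Unquoted echo joins the lines with spaces for a compact display.
echo $as_text      # 200 20 2 100 10 1
echo $as_numbers   # 200 100 20 10 2 1
```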
6. Download the file iris.csv from the URL
https://raw.githubusercontent.com/uiuc-cse/data-fa14/gh-pages/data/iris.csv
Then extract the column headers and sort them in reverse order.
In [29]:
wget -q https://raw.githubusercontent.com/uiuc-cse/data-fa14/gh-pages/data/iris.csv
In [32]:
head -1 iris.csv | tr ',' '\n' | sort -r
species
sepal_width
sepal_length
petal_width
petal_length
7. Find the unique species in the downloaded iris.csv file.
In [38]:
tail -n +2 iris.csv | cut -f 5 -d',' | sort | uniq
setosa
versicolor
virginica
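uniq only collapses adjacent duplicate lines, which is why the answer sorts first; sort -u does both in one step, and uniq -c adds a count per name. A sketch on a small made-up stand-in for iris.csv, so it runs without the download:

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Made-up stand-in for iris.csv: a header plus five rows.
cat > "$tmp/sample.csv" <<'EOF'
sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
7,3.2,4.7,1.4,versicolor
4.9,3,1.4,0.2,setosa
6.3,3.3,6,2.5,virginica
6.4,3.2,4.5,1.5,versicolor
EOF

# Without sorting, uniq removes nothing here: no duplicates are adjacent.
unsorted=$(tail -n +2 "$tmp/sample.csv" | cut -d, -f5 | uniq)

# sort -u sorts and drops duplicates in one step.
species=$(tail -n +2 "$tmp/sample.csv" | cut -d, -f5 | sort -u)

# uniq -c prefixes each distinct name with how often it occurs.
counts=$(tail -n +2 "$tmp/sample.csv" | cut -d, -f5 | sort | uniq -c)

echo "$species"
echo "$counts"
```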
8. How would you list all files that end in .o, .c, or .h?
In [42]:
touch foo.o foo.c foo.h foo.a foo.b
In [43]:
ls *.{o,c,h}
foo.c foo.h foo.o
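Brace expansion happens before filename globbing, so *.{o,c,h} becomes three separate patterns, and by default bash passes a pattern that matches nothing through literally (ls then reports an error for it). A bracket expression, *.[och], is a single pattern and avoids this. A sketch in a temporary directory that deliberately has no .c file:

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

touch foo.o foo.h        # deliberately no .c file

# Braces expand first, giving *.o *.c *.h; the unmatched *.c stays literal.
braces=$(echo *.{o,c,h})

# One bracket-expression pattern: only real matches come back.
bracket=$(echo *.[och])

echo "$braces"    # foo.o *.c foo.h
echo "$bracket"   # foo.h foo.o
```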
9. Create a, b, and c as subdirectories of the directory given by pwd, using brace expansion.
In [44]:
mkdir "$PWD"/{a,b,c}
In [48]:
ls -d */
a/ b/ c/
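The shell performs brace expansion before mkdir runs, so prefixing the command with echo previews the generated arguments, and several brace groups multiply out. A sketch in a temporary directory (the src and doc names are made up):

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# echo shows the three paths the shell generates; nothing is created.
preview=$(echo "$tmp"/{a,b,c})
echo "$preview"

# Two brace groups expand to 3 x 2 = 6 paths; -p also creates a, b and c.
mkdir -p "$tmp"/{a,b,c}/{src,doc}
find "$tmp" -mindepth 2 -type d | sort
count=$(find "$tmp" -mindepth 2 -type d | wc -l)
```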
Regular expressions
10. Do the first 3 exercises of Regex Golf.
Left as an exercise.
11. Find a regular expression that matches only valid outputs of date +%d-%b-%y (as far as possible).
For example, the following should not match:
05-Mud-17
5-Mar-17
05-Mar-2017
56-Mar-17
But we will allow mistakes like:
30-Feb-18
37-Mar-00
In [59]:
for s in "05-Mud-17" "5-Mar-17" "05-Mar-2017" "56-Mar-17" "30-Feb-18" "37-Mar-00"; do
    echo "$s" | grep -E '^[0-3][0-9]-(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)-[0-9]{2}$'
done
30-Feb-18
37-Mar-00
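%b prints English month abbreviations only in the C (or an English) locale, so a check against today's date should pin LC_ALL=C. A sketch that stores the answer's pattern in a variable, checks today's date against it, and tries one possible tightening (not part of the original answer) that restricts the day to 01-31; this rejects 37-Mar-00 and 00-Jan-20 but still accepts 30-Feb-18:

```shell
#!/usr/bin/env bash
months='(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)'
# Pattern from the answer above: any two-digit day starting with 0-3.
loose="^[0-3][0-9]-${months}-[0-9]{2}\$"
# Hypothetical tighter day field: 01-09, 10-29, 30 or 31.
tight="^(0[1-9]|[12][0-9]|3[01])-${months}-[0-9]{2}\$"

matches() {
    # Succeeds if the string $2 matches the extended regex $1.
    printf '%s\n' "$2" | grep -Eq "$1"
}

# %b gives English month abbreviations in the C locale.
today=$(LC_ALL=C date +%d-%b-%y)

for s in "$today" 30-Feb-18 37-Mar-00 00-Jan-20; do
    loose_ok=no
    tight_ok=no
    matches "$loose" "$s" && loose_ok=yes
    matches "$tight" "$s" && tight_ok=yes
    echo "$s loose=$loose_ok tight=$tight_ok"
done
```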
Finding stuff
12. Find a regular expression that finds repeated runs of length at least 4 in DNA, where the repeated unit may be a single base or several bases. Use the -oE arguments to grep so that only the matching parts of each line are shown. For example, the string
AAATTTTAAAACAAAGCGCGCGCATGC
should find
TTTT
AAAA
GCGCGCGC
In [101]:
echo 'AAATTTTAAAACAAAGCGCGCGCATGC' | grep -oE '(.)\1{3,}|(.{2,})\2{1,}'
TTTT
AAAA
GCGCGCGC
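The combined pattern has two alternatives, each with its own capture group, which is why the second back-reference is \2. Back-references are not part of POSIX extended regular expressions, but GNU and BSD grep accept them with -E. Running each alternative on its own shows what it contributes:

```shell
#!/usr/bin/env bash
dna='AAATTTTAAAACAAAGCGCGCGCATGC'

# Alternative 1: one base, then the same base (\1) at least 3 more times.
single=$(echo "$dna" | grep -oE '(.)\1{3,}')

# Alternative 2: a unit of 2 or more bases, then the same unit once or more.
# In its own regex the group is number 1, not 2 as in the combined pattern.
units=$(echo "$dna" | grep -oE '(.{2,})\1{1,}')

echo $single   # TTTT AAAA
echo $units    # TTTT AAAA GCGCGCGC
```

The second alternative alone already finds every even-length single-base run, but the first is what matches an odd-length run in full: for AAAAA the second alternative can only match AAAA (the unit AA twice).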
13. Find all files larger than 3 KB in the current directory that were changed more than 1 day ago (find has no portable creation-time test; -ctime checks the inode change time), excluding anything in .ipynb_checkpoints.
In [107]:
find . -type f -ctime +1 -size +3k -not -path './.ipynb_checkpoints/*'
./01_UnixBasics.ipynb
./01_UnixBasics_Solutions.ipynb
./EntryQuiz.ipynb
./EntryQuiz_Solutions.ipynb
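Two details of this answer are easy to miss. find counts whole 24-hour periods with the fraction dropped, so -ctime +1 means at least 2 days ago, and GNU find rounds sizes up to whole units, so -size +3k means "more than 3 KiB after rounding up". Also, any command that touches a file's metadata resets its ctime, so ctime cannot be backdated for a test; the sketch below therefore shows the same day and size rules with -mtime, which POSIX touch -t can set. File names are hypothetical:

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# 3000 bytes rounds up to 3 KiB (not > 3k); 3073 bytes rounds up to 4 KiB.
head -c 3000 /dev/zero > "$tmp/small.bin"
head -c 3073 /dev/zero > "$tmp/big.bin"

# POSIX touch -t [[CC]YY]MMDDhhmm moves the modification time into the past.
touch -t 202001010000 "$tmp/small.bin" "$tmp/big.bin"

# Old and larger than 3 KiB: only big.bin qualifies.
old_big=$(find "$tmp" -type f -mtime +1 -size +3k)

# touch just updated the inode change time, so nothing is old by -ctime.
old_by_ctime=$(find "$tmp" -type f -ctime +1)

echo "$old_big"
echo "${old_by_ctime:-<none>}"
```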
14. Find all lines that contain the word git (case-insensitive) in files in the current directory changed more than 3 days ago (as tested by -ctime), excluding anything in .ipynb_checkpoints. You should not find lines where git is only part of a longer word, such as GitLab.
In [134]:
find . -type f -ctime +3 \
    -not -path './.ipynb_checkpoints/*' \
    -print0 | xargs -0 grep -i '\<git\>'
./EntryQuiz.ipynb: "- `git`\n",
./EntryQuiz.ipynb: "- `git` => distributed version control system\n",
./EntryQuiz_Solutions.ipynb: "- `git`\n",
./EntryQuiz_Solutions.ipynb: "- `git` => distributed version control system\n",
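\< and \> are word-boundary anchors understood by GNU and BSD grep, and grep -w gives the same whole-word behaviour here. Pairing find -print0 with xargs -0 keeps file names that contain spaces intact. A sketch on made-up lines and file names:

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Hypothetical sample lines
printf '%s\n' 'use git to commit' 'hosted on GitLab' 'Git is distributed' 'see github.com' \
    > "$tmp/notes.txt"

any=$(grep -ic 'git' "$tmp/notes.txt")        # substring match: all 4 lines
word=$(grep -ic '\<git\>' "$tmp/notes.txt")   # whole word only: 2 lines
w_flag=$(grep -icw 'git' "$tmp/notes.txt")    # -w gives the same 2 lines

# A file name containing a space: -print0 / xargs -0 pass it as one argument.
printf 'git here\n' > "$tmp/my notes.txt"
files=$(find "$tmp" -type f -print0 | xargs -0 grep -il '\<git\>' | wc -l)

echo "$any $word $w_flag $files"
```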
Modifying text
15. Find the reverse complement of AAATTTTAAAACAAAGCGCGCGCATGC.
In [122]:
echo 'AAATTTTAAAACAAAGCGCGCGCATGC' | tr 'ACTG' 'TGAC' | rev
GCATGCGCGCGCTTTGTTTTAAAATTT
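tr maps each character of its first set to the character at the same position in its second set, and rev (shipped with util-linux on Linux and with BSD/macOS, though not part of POSIX) reverses each line. A small reusable sketch; the function name revcomp is made up, and also mapping lowercase bases is an assumption about the input rather than part of the exercise:

```shell
#!/usr/bin/env bash
revcomp() {
    # Complement A<->T and C<->G (both cases), then reverse the line.
    printf '%s\n' "$1" | tr 'ACGTacgt' 'TGCAtgca' | rev
}

revcomp AAATTTTAAAACAAAGCGCGCGCATGC   # GCATGCGCGCGCTTTGTTTTAAAATTT
revcomp acgTT                         # AAcgt
```

Applying the function twice returns the original sequence, which makes a cheap sanity check.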
16. Print the first 10 rows of iris.csv whose species name is versicolor.
In [131]:
tail -n +2 iris.csv | sed -n '/versicolor/p' | head -n 10
7,3.2,4.7,1.4,versicolor
6.4,3.2,4.5,1.5,versicolor
6.9,3.1,4.9,1.5,versicolor
5.5,2.3,4,1.3,versicolor
6.5,2.8,4.6,1.5,versicolor
5.7,2.8,4.5,1.3,versicolor
6.3,3.3,4.7,1.6,versicolor
4.9,2.4,3.3,1,versicolor
6.6,2.9,4.6,1.3,versicolor
5.2,2.7,3.9,1.4,versicolor
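The pattern /versicolor/ matches the word anywhere on the line. Anchoring it to the last field, or comparing the fifth field exactly with awk, is stricter. A sketch on a few sample rows in the iris.csv layout:

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Small sample in the iris.csv layout (not the real data file).
cat > "$tmp/sample.csv" <<'EOF'
sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
7,3.2,4.7,1.4,versicolor
6.4,3.2,4.5,1.5,versicolor
6.3,3.3,6,2.5,virginica
EOF

# sed: match only when versicolor is the whole last field.
by_sed=$(tail -n +2 "$tmp/sample.csv" | sed -n '/,versicolor$/p' | head -n 10)

# awk: compare field 5 exactly; NR > 1 skips the header line.
by_awk=$(awk -F, 'NR > 1 && $5 == "versicolor"' "$tmp/sample.csv" | head -n 10)

echo "$by_sed"
```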
17. Create a new file iris1.csv without a header row, in which each species name is shortened to its first 2 characters.
In [135]:
tail -n +2 iris.csv | \
    sed -e 's/versicolor/ve/' -e 's/setosa/se/' -e 's/virginica/vi/' > iris1.csv
In [136]:
head iris1.csv
5.1,3.5,1.4,0.2,se
4.9,3,1.4,0.2,se
4.7,3.2,1.3,0.2,se
4.6,3.1,1.5,0.2,se
5,3.6,1.4,0.2,se
5.4,3.9,1.7,0.4,se
4.6,3.4,1.4,0.3,se
5,3.4,1.5,0.2,se
4.4,2.9,1.4,0.2,se
4.9,3.1,1.5,0.1,se
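The answer names each species explicitly. If the species list were not known in advance, one alternative is to truncate the last field generically; a sketch with sed -E and with awk on a few sample rows:

```shell
#!/usr/bin/env bash
# Sample rows in the iris.csv layout (header already removed).
rows='5.1,3.5,1.4,0.2,setosa
7,3.2,4.7,1.4,versicolor
6.3,3.3,6,2.5,virginica'

# sed: keep the first two letters of the last field, drop the rest.
by_sed=$(printf '%s\n' "$rows" | sed -E 's/,([a-z]{2})[a-z]*$/,\1/')

# awk: cut field 5 to its first two characters; OFS keeps commas on output.
by_awk=$(printf '%s\n' "$rows" | awk -F, -v OFS=, '{ $5 = substr($5, 1, 2); print }')

echo "$by_sed"
```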
Arithmetic
18. Evaluate 1 + 1.
In [138]:
echo $((1 + 1))
2
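$(( )) does integer arithmetic only in bash (something like $((1.5 + 1)) is a syntax error), so fractional results need an external tool such as bc or awk. A quick comparison:

```shell
#!/usr/bin/env bash
int_sum=$((1 + 1))                          # 2
int_div=$((7 / 2))                          # 3: $(( )) truncates to an integer
bc_default=$(echo '7 / 2' | bc)             # 3: bc's default scale is 0
bc_scaled=$(echo 'scale=2; 7 / 2' | bc)     # 3.50
bc_sum=$(echo '1.5 + 1.25' | bc)            # 2.75
awk_div=$(awk 'BEGIN { print 7 / 2 }')      # 3.5: awk uses floating point

echo "$int_sum $int_div $bc_default $bc_scaled $bc_sum $awk_div"
```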
19. Find the sum of all petal_length values in iris.csv.
In [139]:
head -1 iris.csv
sepal_length,sepal_width,petal_length,petal_width,species
In [160]:
tail -n +2 iris.csv | cut -f3 -d',' | paste -s -d+ - | bc
563.8
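paste -s -d+ - joins the whole column onto one line separated by +, and bc evaluates that expression; awk can keep a running total instead. A sketch comparing both on a made-up three-row sample:

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Made-up three-row sample in the iris.csv layout.
cat > "$tmp/sample.csv" <<'EOF'
sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
7,3.2,4.7,1.4,versicolor
6.3,3.3,6,2.5,virginica
EOF

# paste -s joins the column into one line separated by '+': 1.4+4.7+6
sum_expr=$(tail -n +2 "$tmp/sample.csv" | cut -d, -f3 | paste -s -d+ -)
via_bc=$(echo "$sum_expr" | bc)

# awk adds field 3 as it reads; NR > 1 skips the header row.
via_awk=$(awk -F, 'NR > 1 { s += $3 } END { print s }' "$tmp/sample.csv")

echo "$sum_expr = $via_bc (bc), $via_awk (awk)"
```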
Clean up
20. Delete any file or directory created in this session.
In [165]:
rmdir a b c
rm foo*
rm iris* baa.txt plain.txt quoted.txt
rm output.txt error.txt combined.txt
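Cleaning up by hand means remembering every file the session made. An alternative pattern is to do scratch work in a directory from mktemp -d and register its removal with trap, so it is deleted however the script exits. A sketch reusing file names from this quiz:

```shell
#!/usr/bin/env bash
# Scratch directory; the trap removes it (and everything in it) on exit.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work" || exit 1

mkdir a b c
touch foo.o foo.c foo.h foo.a foo.b

# Preview what a glob will match before deleting with it.
echo rm foo*
rm foo*
rmdir a b c
```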