The Unix Shell: Finding Stuff

Flexible ways to find files of interest.

Using locate

Many *nix systems maintain a database that can be searched with locate.

In [1]:
man locate | head -n 20

LOCATE(1)                 BSD General Commands Manual                LOCATE(1)

NAME
     locate -- find filenames quickly

SYNOPSIS
     locate [-0Scims] [-l limit] [-d database] pattern ...

DESCRIPTION
     The locate program searches a database for all pathnames which match the
     specified pattern.  The database is recomputed periodically (usually
     weekly or daily), and contains the pathnames of all files which are pub-
     licly accessible.

     Shell globbing and quoting characters (``*'', ``?'', ``\'', ``['' and
     ``]'') may be used in pattern, although they will have to be escaped from
     the shell.  Preceding any character with a backslash (``\'') eliminates
     any special meaning which it may have.  The matching differs in that no
     characters must be matched explicitly, including slashes (``/'').
In [7]:
locate -i "unix shell"
/Users/cliburn/_teach/HTS_SummerCourse_2017/Materials/ComputationBootCampNotebooks/Wk1_Day3_PM/Unix Shell-Solutions.ipynb
/Users/cliburn/_teach/HTS_SummerCourse_2017/Materials/ComputationBootCampNotebooks/Wk1_Day3_PM/Unix Shell.ipynb
/Users/cliburn/_teach/data-science-foundations-2017/lessons/lesson01/.ipynb_checkpoints/01 The Unix Shell-Solutions-checkpoint.ipynb
/Users/cliburn/_teach/data-science-foundations-2017/lessons/lesson01/.ipynb_checkpoints/01 The Unix Shell-checkpoint.ipynb
/Users/cliburn/tmp/HTS_Summer_Course_2016/Materials/ComputationBootCampNotebooks/Wk1_Day3_PM/.ipynb_checkpoints/Unix Shell-Copy1-checkpoint.ipynb
/Users/cliburn/tmp/HTS_Summer_Course_2016/Materials/ComputationBootCampNotebooks/Wk1_Day3_PM/.ipynb_checkpoints/Unix Shell-Solutions-checkpoint.ipynb
/Users/cliburn/tmp/HTS_Summer_Course_2016/Materials/ComputationBootCampNotebooks/Wk1_Day3_PM/.ipynb_checkpoints/Unix Shell-checkpoint.ipynb
/Users/cliburn/tmp/HTS_Summer_Course_2016/Materials/ComputationBootCampNotebooks/Wk1_Day3_PM/Unix Shell-Solutions.ipynb
/Users/cliburn/tmp/HTS_Summer_Course_2016/Materials/ComputationBootCampNotebooks/Wk1_Day3_PM/Unix Shell.ipynb

Using grep

grep is used to find regular expression patterns within files. We will cover regular expressions in a subsequent lecture, but here are the basics.

. matches any single character
+ matches one or more of the preceding item (needs grep -E)
* matches zero or more of the preceding item
^ matches at start of line
$ matches at end of line
[abc] matches a or b or c
(cat|dog) matches cat or dog (needs grep -E)
[A-Z] matches any single upper case letter
[0-9] matches any single digit
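
As a minimal sketch of the patterns listed above, the following builds a throwaway copy of the four-line sample file and tries anchors, alternation, and repetition with grep -E. The file name sample.txt and the variable names are assumptions made for illustration only.

```shell
# Work in a temporary directory so no real files are touched.
workdir=$(mktemp -d)
printf '%s\n' '1 Hello, bash' '2 Hello, again' '3 Hello' '4 again' > "$workdir/sample.txt"

# ^ and $ anchor the pattern to the start and end of the line.
anchored=$(grep -E '^[0-9] Hello$' "$workdir/sample.txt")

# a|b is alternation; -c counts the lines that match either branch.
alternation=$(grep -E -c 'bash|again' "$workdir/sample.txt")

# . is any single character; + repeats the preceding item one or more times.
dots=$(grep -E -c '^.+again$' "$workdir/sample.txt")

echo "$anchored"
echo "$alternation"
echo "$dots"
rm -r "$workdir"
```

The -E flag selects extended regular expressions, in which + and | have their special meaning without backslashes.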
In [106]:
cat hello.txt
1 Hello, bash
2 Hello, again
3 Hello
4 again

Searching a file

In [108]:
grep "Hello" hello.txt
1 Hello, bash
2 Hello, again
3 Hello

Recursive searching

In [111]:
grep -r "Hello" ./*txt
./hello.txt:1 Hello, bash
./hello.txt:2 Hello, again
./hello.txt:3 Hello
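
The glob ./*txt only hands grep the text files in the current directory, so no subdirectory is actually visited in the cell above. A small sketch of recursion into a subdirectory, using a hypothetical temporary directory layout (a.txt, b.txt, sub/c.txt), might look like this:

```shell
# Build a small tree in a temporary directory.
workdir=$(mktemp -d)
mkdir "$workdir/sub"
printf 'Hello, bash\n' > "$workdir/a.txt"
printf 'nothing here\n' > "$workdir/b.txt"
printf 'Hello again\n' > "$workdir/sub/c.txt"

# Passing a directory (here .) lets -r descend into everything below it.
# sort makes the order of the results predictable.
matches=$(cd "$workdir" && grep -r "Hello" . | sort)

echo "$matches"
rm -r "$workdir"
```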

Searching for words

In [112]:
grep "ash" *.txt
hello.txt:1 Hello, bash
In [113]:
grep -w "ash" *.txt

Counting matching lines

In [114]:
grep -c "Hello" *.txt
hello.txt:3
stderr.txt:0
stdout.txt:0
test1.txt:0

And with color!

In [115]:
grep --color "Hello" *.txt
hello.txt:1 Hello, bash
hello.txt:2 Hello, again
hello.txt:3 Hello

Get filenames only

We can use grep -l to list only the names of files whose contents match a regular expression.

In [116]:
grep -l "Hello" *.txt
hello.txt

Using find

While grep finds files by their contents, the find command locates files based on file properties such as name, depth in the directory tree, and modification time. We will show a few examples.

In [12]:
man find | head -n 20

FIND(1)                   BSD General Commands Manual                  FIND(1)

NAME
     find -- walk a file hierarchy

SYNOPSIS
     find [-H | -L | -P] [-EXdsx] [-f path] path ... [expression]
     find [-H | -L | -P] [-EXdsx] -f path [path ...] [expression]

DESCRIPTION
     The find utility recursively descends the directory tree for each path
     listed, evaluating an expression (composed of the ``primaries'' and
     ``operands'' listed below) in terms of each file in the tree.

     The options are as follows:

     -E      Interpret regular expressions followed by -regex and -iregex pri-
             maries as extended (modern) regular expressions rather than basic
             regular expressions (BRE's).  The re_format(7) manual page fully
In [9]:
ls -R
01_The_Unix_Shell.ipynb
01_The_Unix_Shell_Solutions.ipynb
The Unix Shell - File and Directory Management.ipynb
The Unix Shell - Finding Stuff.ipynb
The Unix Shell - Getting Help.ipynb
The Unix Shell - Shell Scripts.ipynb
The Unix Shell - Working with Text.ipynb
data
hello.txt
one two three
scripts
stderr.txt
stdout.txt
test1.txt

./data:
iris.csv        iris24.csv

./scripts:
avg.sh

Find by filename

In [37]:
find . -name "iris*"
./data/iris.csv
./data/iris24.csv

Find is case sensitive by default

In [27]:
find . -name "*unix*ipynb"

Use -iname for case-insensitive search

In [28]:
find . -iname "*unix*ipynb"
./.ipynb_checkpoints/01_The_Unix_Shell_Solutions-checkpoint.ipynb
./.ipynb_checkpoints/The Unix Shell - File and Directory Management-checkpoint.ipynb
./.ipynb_checkpoints/The Unix Shell - Finding Stuff-checkpoint.ipynb
./.ipynb_checkpoints/The Unix Shell - Getting Help-checkpoint.ipynb
./.ipynb_checkpoints/The Unix Shell - Shell Scripts-checkpoint.ipynb
./.ipynb_checkpoints/The Unix Shell - Working with Text-checkpoint.ipynb
./01_The_Unix_Shell.ipynb
./01_The_Unix_Shell_Solutions.ipynb
./The Unix Shell - File and Directory Management.ipynb
./The Unix Shell - Finding Stuff.ipynb
./The Unix Shell - Getting Help.ipynb
./The Unix Shell - Shell Scripts.ipynb
./The Unix Shell - Working with Text.ipynb

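Limiting recursion depth

Note that "*[csv|txt]" is a bracket expression: it matches any name ending in one of the characters c, s, v, |, t or x, which is why ./.ipynb_checkpoints and ./scripts appear in the listings. To match only the two extensions, combine two tests with -o, as in \( -name "*.csv" -o -name "*.txt" \). GNU find also prints a warning unless -maxdepth comes before tests such as -name.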

In [103]:
find . -name "*[csv|txt]"
./.ipynb_checkpoints
./data/iris.csv
./data/iris24.csv
./hello.txt
./scripts
./stderr.txt
./stdout.txt
./test1.txt
In [104]:
find . -name "*[csv|txt]" -maxdepth 1
./.ipynb_checkpoints
./hello.txt
./scripts
./stderr.txt
./stdout.txt
./test1.txt
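
A sketch of an extension-only search, assuming a hypothetical temporary tree that loosely mirrors the data and scripts directories above. It groups two -name tests with -o and then repeats the search with -maxdepth 1:

```shell
# Build a small tree in a temporary directory.
workdir=$(mktemp -d)
mkdir "$workdir/data" "$workdir/scripts"
touch "$workdir/hello.txt" "$workdir/data/iris.csv" "$workdir/scripts/avg.sh"

# \( ... -o ... \) groups two -name tests with a logical OR.
all=$(cd "$workdir" && find . \( -name "*.csv" -o -name "*.txt" \) | sort)

# -maxdepth 1 stops find from descending below the starting directory.
top=$(cd "$workdir" && find . -maxdepth 1 \( -name "*.csv" -o -name "*.txt" \) | sort)

echo "$all"
echo "$top"
rm -r "$workdir"
```

The parentheses are escaped so that the shell passes them to find unchanged.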

Find by time

Notebooks whose status changed more than 1 day ago (-ctime tests the inode change time, not the creation time)

In [19]:
find . -name "*ipynb" -ctime +1
./01_The_Unix_Shell.ipynb

Notebooks modified within the last day

In [20]:
find . -name "*ipynb" -mtime -1
./.ipynb_checkpoints/01_The_Unix_Shell_Solutions-checkpoint.ipynb
./.ipynb_checkpoints/The Unix Shell - File and Directory Management-checkpoint.ipynb
./.ipynb_checkpoints/The Unix Shell - Finding Stuff-checkpoint.ipynb
./.ipynb_checkpoints/The Unix Shell - Getting Help-checkpoint.ipynb
./.ipynb_checkpoints/The Unix Shell - Shell Scripts-checkpoint.ipynb
./.ipynb_checkpoints/The Unix Shell - Working with Text-checkpoint.ipynb
./01_The_Unix_Shell_Solutions.ipynb
./The Unix Shell - File and Directory Management.ipynb
./The Unix Shell - Finding Stuff.ipynb
./The Unix Shell - Getting Help.ipynb
./The Unix Shell - Shell Scripts.ipynb
./The Unix Shell - Working with Text.ipynb

Files modified in the past 15 minutes

In [24]:
find . -name "*ipynb" -mmin -15
./The Unix Shell - Finding Stuff.ipynb
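
To try the time tests without waiting, one option is to backdate a file with touch -t. The sketch below uses hypothetical file names old.txt and new.txt in a temporary directory.

```shell
workdir=$(mktemp -d)

# touch -t sets an explicit modification time (here 1 January 2020, 00:00).
touch -t 202001010000 "$workdir/old.txt"
# A plain touch stamps the file with the current time.
touch "$workdir/new.txt"

# -mtime +1: modified more than one day ago
# (how partial days are rounded differs between find implementations).
old=$(cd "$workdir" && find . -name "*.txt" -mtime +1)

# -mmin -15: modified less than 15 minutes ago.
recent=$(cd "$workdir" && find . -name "*.txt" -mmin -15)

echo "$old"
echo "$recent"
rm -r "$workdir"
```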

Delete text files modified within the last minute

In [57]:
touch Delete_Me.txt
In [58]:
ls *txt
Delete_Me.txt   hello.txt       stdout.txt
Delete_Me2.txt  stderr.txt      test1.txt
In [59]:
find . -name "*txt" -mmin -1
./Delete_Me.txt
./Delete_Me2.txt
In [60]:
find . -name "*txt" -mmin -1 -delete
In [61]:
ls *txt
hello.txt       stderr.txt      stdout.txt      test1.txt

An alternative, more common way is to pipe to rm

xargs reads the file names from standard input and passes them to rm as arguments, in batches if the list is long.

In [73]:
touch Delete_Me2.txt
In [74]:
ls *txt
Delete_Me2.txt  hello.txt       stderr.txt      stdout.txt      test1.txt
In [76]:
find . -name "*txt" -mmin -1 | xargs rm
In [77]:
ls *txt
hello.txt       stderr.txt      stdout.txt      test1.txt
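
A third option, not used in this notebook, is find's own -exec action. The sketch below assumes hypothetical file names in a temporary directory and deletes the text files there without a pipe:

```shell
workdir=$(mktemp -d)
touch "$workdir/keep.csv" "$workdir/Delete Me.txt" "$workdir/Delete_Me2.txt"

# {} + appends as many file names as fit onto one rm command line.
# Each name is passed as a single argument, spaces included.
find "$workdir" -name "*.txt" -exec rm {} +

remaining=$(ls "$workdir")
echo "$remaining"
rm -r "$workdir"
```

Because find hands the names to rm directly, a file name containing spaces is never split into fragments.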

Find files containing a phrase

find can be combined with grep to first select a subset of files of interest and then look for lines matching a regular expression within those files.

In [67]:
find . -name "*ipynb"
./.ipynb_checkpoints/01_The_Unix_Shell_Solutions-checkpoint.ipynb
./.ipynb_checkpoints/The Unix Shell - File and Directory Management-checkpoint.ipynb
./.ipynb_checkpoints/The Unix Shell - Finding Stuff-checkpoint.ipynb
./.ipynb_checkpoints/The Unix Shell - Getting Help-checkpoint.ipynb
./.ipynb_checkpoints/The Unix Shell - Shell Scripts-checkpoint.ipynb
./.ipynb_checkpoints/The Unix Shell - Working with Text-checkpoint.ipynb
./01_The_Unix_Shell.ipynb
./01_The_Unix_Shell_Solutions.ipynb
./The Unix Shell - File and Directory Management.ipynb
./The Unix Shell - Finding Stuff.ipynb
./The Unix Shell - Getting Help.ipynb
./The Unix Shell - Shell Scripts.ipynb
./The Unix Shell - Working with Text.ipynb
In [89]:
find . -name "*txt" | xargs grep "again"
./hello.txt:2 Hello, again
./hello.txt:4 again

If filenames can contain spaces, we need to do more work

By default, xargs splits its input on whitespace, so we will get error messages because fragments of filenames are passed to grep.

In [98]:
find . -name "*Unix*" -not -path "*.ipynb_checkpoints/*" | xargs grep -i "Cipher"
grep: ./The: No such file or directory
grep: Unix: No such file or directory
grep: Shell: No such file or directory
grep: File: No such file or directory
grep: and: No such file or directory
grep: Directory: No such file or directory
grep: Management.ipynb: No such file or directory
grep: ./The: No such file or directory
grep: Unix: No such file or directory
grep: Shell: No such file or directory
grep: (standard input): Bad file descriptor
grep: Finding: No such file or directory
grep: Stuff.ipynb: No such file or directory
grep: ./The: No such file or directory
grep: Unix: No such file or directory
grep: Shell: No such file or directory
grep: (standard input): Bad file descriptor
grep: Getting: No such file or directory
grep: Help.ipynb: No such file or directory
grep: ./The: No such file or directory
grep: Unix: No such file or directory
grep: Shell: No such file or directory
grep: (standard input): Bad file descriptor
grep: Shell: No such file or directory
grep: Scripts.ipynb: No such file or directory
grep: ./The: No such file or directory
grep: Unix: No such file or directory
grep: Shell: No such file or directory
grep: (standard input): Bad file descriptor
grep: Working: No such file or directory
grep: with: No such file or directory
grep: Text.ipynb: No such file or directory

To get around this, use the -print0 argument to find paired with the -0 argument to xargs to change the delimiter to the NUL character.

In [97]:
find . -name "*Unix*" -not -path "*.ipynb_checkpoints/*" -print0 | xargs -0 grep -i "Cipher"
./The Unix Shell - Getting Help.ipynb:      "blowfish(n)              - Implementation of the Blowfish block cipher\n"
./The Unix Shell - Getting Help.ipynb:      "blowfish(n)              - Implementation of the Blowfish block cipher\n"
./The Unix Shell - Getting Help.ipynb:      "blowfish(n)                  Blowfish Block Cipher                 blowfish(n)\n",
./The Unix Shell - Getting Help.ipynb:      "       blowfish - Implementation of the Blowfish block cipher\n",
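
The same pattern can be tried safely in a scratch directory. This sketch assumes two hypothetical files whose names contain spaces and asks grep only for the names of matching files:

```shell
workdir=$(mktemp -d)
printf 'block cipher\n' > "$workdir/Getting Help.txt"
printf 'nothing relevant\n' > "$workdir/Shell Scripts.txt"

# -print0 ends each name with a NUL byte and xargs -0 splits on NUL,
# so a path containing spaces reaches grep as one argument.
hits=$(cd "$workdir" && find . -name "*.txt" -print0 | xargs -0 grep -il "cipher")

echo "$hits"
rm -r "$workdir"
```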