The Unix Shell: Finding Stuff

Flexible ways to find files of interest.

Using locate

Many *nix systems maintain a database of path names that can be searched with locate. This is not available on the Docker container you are using.

LOCATE(1)                 BSD General Commands Manual                LOCATE(1)

NAME
     locate -- find filenames quickly

SYNOPSIS
     locate [-0Scims] [-l limit] [-d database] pattern ...

DESCRIPTION
     The locate program searches a database for all pathnames which match the
     specified pattern.  The database is recomputed periodically (usually
     weekly or daily), and contains the pathnames of all files which are pub-
     licly accessible.

     Shell globbing and quoting characters (``*'', ``?'', ``\'', ``['' and
     ``]'') may be used in pattern, although they will have to be escaped from
     the shell.  Preceding any character with a backslash (``\'') eliminates
     any special meaning which it may have.  The matching differs in that no
     characters must be matched explicitly, including slashes (``/'').

Using grep

grep is used to find regular expression patterns within files. We have covered regular expressions in a previous lecture, but here are the basics as a reminder.

. represents one of any character
+ represents one or more of the preceding pattern
* represents zero or more of the preceding pattern
^ matches at start of line
$ matches at end of line
[abc] matches a or b or c
(cat|dog) matches cat or dog
[A-Z] matches any one upper-case letter
[0-9] matches any one digit

The -E flag to grep enables extended regular expressions, so +, ?, |, ( and ) act as special characters without a preceding backslash.
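As a quick check of the rules above, here is a minimal sketch that runs a few patterns through grep -E. The file pets.txt and its contents are invented for the demonstration and are not part of the course data.

```shell
# Work in a throwaway directory so nothing in the course folder is touched.
tmpdir=$(mktemp -d)
printf '%s\n' 'cat' 'dog' 'caat' 'ct' 'Bird 42' > "$tmpdir/pets.txt"

# + means one or more of the preceding "a": matches cat and caat.
plus_matches=$(grep -E '^ca+t$' "$tmpdir/pets.txt")

# * also allows zero occurrences, so ct matches as well.
star_matches=$(grep -E '^ca*t$' "$tmpdir/pets.txt")

# Alternation inside a group, anchored to the whole line.
alt_matches=$(grep -E '^(cat|dog)$' "$tmpdir/pets.txt")

# Bracket expressions: an upper-case letter followed later by a digit.
class_matches=$(grep -E '[A-Z].*[0-9]' "$tmpdir/pets.txt")

printf '%s\n' "$plus_matches" "---" "$star_matches" "---" "$alt_matches" "---" "$class_matches"
rm -r "$tmpdir"
```

Without -E, the + and the parentheses in these patterns would be treated as literal characters unless escaped.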

In [1]:
cat hello.txt
1 Hello, bash
2 Hello, again
3 Hello
4 again

Searching a file

In [2]:
grep "Hello" hello.txt
1 Hello, bash
2 Hello, again
3 Hello

Recursive searching

In [3]:
grep -r "Hello" ./*txt
./hello.txt:1 Hello, bash
./hello.txt:2 Hello, again
./hello.txt:3 Hello
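The -r flag only has something to recurse into when it is given a directory. The sketch below illustrates this with a made-up notes directory; the directory and file names are assumptions for the demonstration only.

```shell
tmpdir=$(mktemp -d)
mkdir -p "$tmpdir/notes/archive"
printf 'Hello, top\n' > "$tmpdir/notes/top.txt"
printf 'Hello, nested\n' > "$tmpdir/notes/archive/old.txt"

# Given a directory, grep -r descends into every subdirectory below it.
# The sort is only there to make the order of the output predictable.
recursive_hits=$(cd "$tmpdir" && grep -r "Hello" notes | LC_ALL=C sort)
printf '%s\n' "$recursive_hits"
rm -r "$tmpdir"
```

Each hit is prefixed with the path of the file it came from, which is how matches in nested files can be told apart.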

Searching for words

In [4]:
grep "ash" *.txt
1 Hello, bash
In [5]:
grep -w "ash" *.txt
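The second command prints nothing because -w only accepts a match that is not part of a longer word, and in hello.txt the letters "ash" only occur inside "bash". A small sketch with invented input lines shows the difference:

```shell
# Three made-up lines: "ash" inside a word, on its own, and as a prefix.
lines=$(printf '%s\n' 'Hello, bash' 'an ash tree' 'ashes')

# Plain search: every line contains the substring "ash".
substring_count=$(printf '%s\n' "$lines" | grep -c "ash")

# Whole-word search: only the line where "ash" stands alone is kept.
whole_word=$(printf '%s\n' "$lines" | grep -w "ash")

echo "lines containing ash: $substring_count"
echo "whole-word match: $whole_word"
```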

Counting matching lines

In [6]:
grep -c "Hello" *.txt
3
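Note that -c reports the number of lines that match, not the number of matches. If individual occurrences are wanted, one common approach is to print each match on its own line with -o and count those lines. The sketch below assumes a grep that supports -o (GNU and BSD grep both do) and uses invented input:

```shell
# Two occurrences on the first line, one on the second, none on the third.
sample=$(printf '%s\n' 'Hello Hello' 'Hello' 'again')

# -c counts lines with at least one match.
line_count=$(printf '%s\n' "$sample" | grep -c "Hello")

# -o prints every match separately; wc -l then counts them.
# tr strips the padding that some wc implementations add.
match_count=$(printf '%s\n' "$sample" | grep -o "Hello" | wc -l | tr -d ' ')

echo "matching lines: $line_count, total matches: $match_count"
```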

And with color!

In [7]:
grep --color "Hello" *.txt
1 Hello, bash
2 Hello, again
3 Hello

Get filenames only

We can use grep -l to list only the names of files whose contents match a regular expression.

In [8]:
grep -l "Hello" *.txt
hello.txt

Find only directories

In [4]:
ls -d */
data/           scripts/

Filtering ls -l output with grep

In [5]:
ls -l
total 320
-rw-r--r--   1 cliburn  staff    120 Jul 26 10:02 MD5_CHECKSUM
-rw-r--r--   1 cliburn  staff  46843 Jul 26 10:10 The_Unix_Shell_01___File_and_Directory_Management.ipynb
-rw-r--r--   1 cliburn  staff   6930 Jul 26 09:51 The_Unix_Shell_02___Working_with_Text.ipynb
-rw-r--r--   1 cliburn  staff  15644 Jul 26 09:22 The_Unix_Shell_03___Regular_Expresssions.ipynb
-rw-r--r--   1 cliburn  staff  13409 Jul 26 09:20 The_Unix_Shell_04___Finding_Stuff.ipynb
-rw-r--r--   1 cliburn  staff  22120 Jul 26 09:25 The_Unix_Shell_05___Shell_Scripts.ipynb
-rw-r--r--   1 cliburn  staff   1106 Jul 26 10:06 The_Unix_Shell___Exercises.ipynb
-rw-r--r--   1 cliburn  staff      6 Jul 26 10:01 a.txt
-rw-r--r--   1 cliburn  staff      6 Jul 26 10:02 b.txt
-rw-r--r--   1 cliburn  staff      6 Jul 26 10:01 c.txt
drwxr-xr-x  12 cliburn  staff    408 Jul 26 09:11 data
-rw-r--r--   1 cliburn  staff     46 Jul 26 09:55 goodbye.md5
-rw-r--r--   1 cliburn  staff     45 Jul 26 09:55 goodbye.txt
-rw-r--r--   1 cliburn  staff     45 Jul 26 09:52 hell.txt
-rw-r--r--   1 cliburn  staff     44 Jul 26 09:49 hello.md5
-rw-r--r--   1 cliburn  staff     45 Jul 26 10:00 hello.txt
drwxr-xr-x   5 cliburn  staff    170 Jul 26 09:25 scripts
-rw-r--r--   1 cliburn  staff     42 Jul 26 09:31 stderr.txt
-rw-r--r--   1 cliburn  staff      0 Jul 26 09:31 stdout.txt
-rw-r--r--   1 cliburn  staff     44 Jul 26 10:00 test.md5
In [6]:
ls -l | grep -E '^d'
drwxr-xr-x  12 cliburn  staff    408 Jul 26 09:11 data
drwxr-xr-x   5 cliburn  staff    170 Jul 26 09:25 scripts

Using the invert-match option -v to list everything except directories

In [26]:
ls -l | grep -Ev '^d'
total 328
-rw-r--r--   1 cliburn  staff    120 Jul 26 10:02 MD5_CHECKSUM
-rw-r--r--   1 cliburn  staff  46843 Jul 26 10:10 The_Unix_Shell_01___File_and_Directory_Management.ipynb
-rw-r--r--   1 cliburn  staff   6930 Jul 26 09:51 The_Unix_Shell_02___Working_with_Text.ipynb
-rw-r--r--   1 cliburn  staff  15644 Jul 26 09:22 The_Unix_Shell_03___Regular_Expresssions.ipynb
-rw-r--r--   1 cliburn  staff  16745 Jul 26 10:16 The_Unix_Shell_04___Finding_Stuff.ipynb
-rw-r--r--   1 cliburn  staff  22120 Jul 26 09:25 The_Unix_Shell_05___Shell_Scripts.ipynb
-rw-r--r--   1 cliburn  staff   1106 Jul 26 10:06 The_Unix_Shell___Exercises.ipynb
-rw-r--r--   1 cliburn  staff      6 Jul 26 10:01 a.txt
-rw-r--r--   1 cliburn  staff      6 Jul 26 10:02 b.txt
-rw-r--r--   1 cliburn  staff      6 Jul 26 10:01 c.txt
-rw-r--r--   1 cliburn  staff     46 Jul 26 09:55 goodbye.md5
-rw-r--r--   1 cliburn  staff     45 Jul 26 09:55 goodbye.txt
-rw-r--r--   1 cliburn  staff     45 Jul 26 09:52 hell.txt
-rw-r--r--   1 cliburn  staff     44 Jul 26 09:49 hello.md5
-rw-r--r--   1 cliburn  staff     45 Jul 26 10:00 hello.txt
-rw-r--r--   1 cliburn  staff     42 Jul 26 09:31 stderr.txt
-rw-r--r--   1 cliburn  staff      0 Jul 26 09:31 stdout.txt
-rw-r--r--   1 cliburn  staff     44 Jul 26 10:00 test.md5
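Matching on the first character of the ls -l output works, but it depends on the listing format and also lets the "total" line through. As a hedged alternative, the sketch below uses the -type test of find, which is introduced in the next section. It assumes a find that supports -mindepth and -maxdepth (GNU and BSD find both do), and the directory and file names are made up for the demonstration.

```shell
tmpdir=$(mktemp -d)
mkdir "$tmpdir/data" "$tmpdir/scripts"
touch "$tmpdir/hello.txt"

# -type d keeps directories; -mindepth 1 leaves out "." itself.
dirs_only=$(cd "$tmpdir" && find . -mindepth 1 -maxdepth 1 -type d | LC_ALL=C sort)

# -type f keeps regular files only.
files_only=$(cd "$tmpdir" && find . -maxdepth 1 -type f)

printf 'directories:\n%s\nfiles:\n%s\n' "$dirs_only" "$files_only"
rm -r "$tmpdir"
```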

Using find

While grep searches the contents of files for a regular expression, the find command locates files based on properties such as name, type, and modification time. We will show a few examples.

FIND(1)                   BSD General Commands Manual                  FIND(1)

NAME
     find -- walk a file hierarchy

SYNOPSIS
     find [-H | -L | -P] [-EXdsx] [-f path] path ... [expression]
     find [-H | -L | -P] [-EXdsx] -f path [path ...] [expression]

DESCRIPTION
     The find utility recursively descends the directory tree for each path
     listed, evaluating an expression (composed of the ``primaries'' and
     ``operands'' listed below) in terms of each file in the tree.

     The options are as follows:

     -E      Interpret regular expressions followed by -regex and -iregex pri-
             maries as extended (modern) regular expressions rather than basic
             regular expressions (BRE's).  The re_format(7) manual page fully
In [9]:
ls -R
The_Unix_Shell_01___File_and_Directory_Management.ipynb
The_Unix_Shell_03___Working_with_Text.ipynb
The_Unix_Shell_04___Regular_Expresssions.ipynb
The_Unix_Shell_05___Finding_Stuff.ipynb
The_Unix_Shell_06___Shell_Scripts.ipynb
data
hello.txt
scripts

./data:
X.txt                   example.fna             iris24.csv
Y.txt                   food_and_groups.csv     titanic.csv
Y1.txt                  forbes.csv
Y2.txt                  iris.csv

./scripts:
avg.sh                  extract_headers.sh
cat_if_exists.sh        rename.py

Find by filename

In [10]:
find . -name "iris*"
./data/iris.csv
./data/iris24.csv
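It is safest to quote the pattern given to -name. If it is left unquoted and a file in the current directory happens to match, the shell expands the pattern before find ever sees it. The sketch below reproduces this in a scratch directory; the file iris_notes.txt is an invented name used only to trigger the expansion.

```shell
tmpdir=$(mktemp -d)
mkdir "$tmpdir/data"
touch "$tmpdir/iris_notes.txt" "$tmpdir/data/iris.csv" "$tmpdir/data/iris24.csv"

# Unquoted: the shell rewrites iris* to iris_notes.txt first,
# so find searches for that one exact name.
unquoted=$(cd "$tmpdir" && find . -name iris* | LC_ALL=C sort)

# Quoted: find receives the pattern and applies it at every level.
quoted=$(cd "$tmpdir" && find . -name "iris*" | LC_ALL=C sort)

printf 'unquoted:\n%s\nquoted:\n%s\n' "$unquoted" "$quoted"
rm -r "$tmpdir"
```

If two or more files in the current directory matched, the unquoted form would fail outright, because find would receive extra arguments it cannot interpret.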

Find is case sensitive by default

In [11]:
find . -name "*unix*ipynb"

Use -iname for case-insensitive search

In [12]:
find . -iname "*unix*ipynb"
./.ipynb_checkpoints/The_Unix_Shell_01___File_and_Directory_Management-checkpoint.ipynb
./.ipynb_checkpoints/The_Unix_Shell_03___Working_with_Text-checkpoint.ipynb
./.ipynb_checkpoints/The_Unix_Shell_04___Regular_Expresssions-checkpoint.ipynb
./.ipynb_checkpoints/The_Unix_Shell_05___Finding_Stuff-checkpoint.ipynb
./.ipynb_checkpoints/The_Unix_Shell_06___Shell_Scripts-checkpoint.ipynb
./The_Unix_Shell_01___File_and_Directory_Management.ipynb
./The_Unix_Shell_03___Working_with_Text.ipynb
./The_Unix_Shell_04___Regular_Expresssions.ipynb
./The_Unix_Shell_05___Finding_Stuff.ipynb
./The_Unix_Shell_06___Shell_Scripts.ipynb

Limiting recursion depth

In [14]:
find . \( -name "*.csv" -o -name "*.txt" \)
./data/food_and_groups.csv
./data/forbes.csv
./data/iris.csv
./data/iris24.csv
./data/titanic.csv
./data/X.txt
./data/Y.txt
./data/Y1.txt
./data/Y2.txt
./hello.txt
In [15]:
find . -maxdepth 1 \( -name "*.csv" -o -name "*.txt" \)
./hello.txt
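A note on patterns for -name: square brackets denote a single-character class, not a choice between words. A pattern such as *[csv|txt] therefore accepts any name ending in c, s, v, |, t or x, including directories like scripts. Selecting by extension takes one -name test per extension, joined with -o. The sketch below contrasts the two in a scratch directory with invented file names, assuming a find that supports -maxdepth.

```shell
tmpdir=$(mktemp -d)
mkdir -p "$tmpdir/scripts" "$tmpdir/data"
touch "$tmpdir/hello.txt" "$tmpdir/data/iris.csv" "$tmpdir/scripts/avg.sh"

# Character class: the directory "scripts" matches because it ends in s.
class_hits=$(cd "$tmpdir" && find . -name "*[csv|txt]" | LC_ALL=C sort)

# One test per extension; the escaped parentheses group the alternatives.
ext_hits=$(cd "$tmpdir" && find . \( -name "*.csv" -o -name "*.txt" \) | LC_ALL=C sort)

# -maxdepth 1 keeps find from descending below the starting directory.
top_hits=$(cd "$tmpdir" && find . -maxdepth 1 \( -name "*.csv" -o -name "*.txt" \))

printf 'class:\n%s\nextensions:\n%s\ntop level:\n%s\n' "$class_hits" "$ext_hits" "$top_hits"
rm -r "$tmpdir"
```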

Find by time

Notebooks whose inode status changed more than 1 day ago (-ctime tracks changes to the file's metadata or contents, not its creation)

In [16]:
find . -name "*ipynb" -ctime +1

Notebooks modified within the last day

In [17]:
find . -name "*ipynb" -mtime -1
./.ipynb_checkpoints/The_Unix_Shell_01___File_and_Directory_Management-checkpoint.ipynb
./.ipynb_checkpoints/The_Unix_Shell_03___Working_with_Text-checkpoint.ipynb
./.ipynb_checkpoints/The_Unix_Shell_04___Regular_Expresssions-checkpoint.ipynb
./.ipynb_checkpoints/The_Unix_Shell_05___Finding_Stuff-checkpoint.ipynb
./.ipynb_checkpoints/The_Unix_Shell_06___Shell_Scripts-checkpoint.ipynb
./The_Unix_Shell_01___File_and_Directory_Management.ipynb
./The_Unix_Shell_03___Working_with_Text.ipynb
./The_Unix_Shell_04___Regular_Expresssions.ipynb
./The_Unix_Shell_05___Finding_Stuff.ipynb
./The_Unix_Shell_06___Shell_Scripts.ipynb

Notebooks modified in the past 15 minutes

In [18]:
find . -name "*ipynb" -mmin -15
./The_Unix_Shell_06___Shell_Scripts.ipynb
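The time tests can be tried safely on scratch files. In the sketch below, touch -t sets an old modification time on one file so that both -mtime and -mmin have something to tell apart. The file names are invented, and -mmin is assumed to be available (it is in GNU and BSD find, though not in POSIX).

```shell
tmpdir=$(mktemp -d)

# A file modified just now.
touch "$tmpdir/fresh.ipynb"

# -t takes [[CC]YY]MMDDhhmm; this stamps the file with 1 January 2020.
touch -t 202001010000 "$tmpdir/stale.ipynb"

# +1 means "more than one day ago"; -15 means "less than 15 minutes ago".
older=$(cd "$tmpdir" && find . -name "*.ipynb" -mtime +1)
recent=$(cd "$tmpdir" && find . -name "*.ipynb" -mmin -15)

echo "older than a day: $older"
echo "touched in the last 15 minutes: $recent"
rm -r "$tmpdir"
```

A leading + selects values greater than the number, a leading - selects values smaller, and a bare number asks for an exact match after rounding.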