The Unix Shell: Finding Stuff¶
Flexible ways to find files of interest.
Using locate¶
Many *nix systems maintain a database that can be searched with locate.
In [1]:
man locate | head -n 20
LOCATE(1) BSD General Commands Manual LOCATE(1)
NAME
locate -- find filenames quickly
SYNOPSIS
locate [-0Scims] [-l limit] [-d database] pattern ...
DESCRIPTION
The locate program searches a database for all pathnames which match the
specified pattern. The database is recomputed periodically (usually
weekly or daily), and contains the pathnames of all files which are pub-
licly accessible.
Shell globbing and quoting characters (``*'', ``?'', ``\'', ``['' and
``]'') may be used in pattern, although they will have to be escaped from
the shell. Preceding any character with a backslash (``\'') eliminates
any special meaning which it may have. The matching differs in that no
characters must be matched explicitly, including slashes (``/'').
In [7]:
locate -i "unix shell"
/Users/cliburn/_teach/HTS_SummerCourse_2017/Materials/ComputationBootCampNotebooks/Wk1_Day3_PM/Unix Shell-Solutions.ipynb
/Users/cliburn/_teach/HTS_SummerCourse_2017/Materials/ComputationBootCampNotebooks/Wk1_Day3_PM/Unix Shell.ipynb
/Users/cliburn/_teach/data-science-foundations-2017/lessons/lesson01/.ipynb_checkpoints/01 The Unix Shell-Solutions-checkpoint.ipynb
/Users/cliburn/_teach/data-science-foundations-2017/lessons/lesson01/.ipynb_checkpoints/01 The Unix Shell-checkpoint.ipynb
/Users/cliburn/tmp/HTS_Summer_Course_2016/Materials/ComputationBootCampNotebooks/Wk1_Day3_PM/.ipynb_checkpoints/Unix Shell-Copy1-checkpoint.ipynb
/Users/cliburn/tmp/HTS_Summer_Course_2016/Materials/ComputationBootCampNotebooks/Wk1_Day3_PM/.ipynb_checkpoints/Unix Shell-Solutions-checkpoint.ipynb
/Users/cliburn/tmp/HTS_Summer_Course_2016/Materials/ComputationBootCampNotebooks/Wk1_Day3_PM/.ipynb_checkpoints/Unix Shell-checkpoint.ipynb
/Users/cliburn/tmp/HTS_Summer_Course_2016/Materials/ComputationBootCampNotebooks/Wk1_Day3_PM/Unix Shell-Solutions.ipynb
/Users/cliburn/tmp/HTS_Summer_Course_2016/Materials/ComputationBootCampNotebooks/Wk1_Day3_PM/Unix Shell.ipynb
Using grep¶
grep is used to find regular expression patterns within files. We
will cover regular expressions in a subsequent lecture, but here are the
basics.
. matches any single character
+ matches one or more of the preceding pattern (requires grep -E)
* matches zero or more of the preceding pattern
^ matches at start of line
$ matches at end of line
[abc] matches a or b or c
(cat|dog) matches cat or dog (requires grep -E)
[A-Z] matches any one upper case character
[0-9] matches any one digit
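The basics above can be tried out with a short sketch. It assumes a scratch copy of hello.txt, created in a temporary directory, with the same four lines as the file used in this notebook; the variable names are illustrative only.

```shell
# Work in a throwaway directory so nothing in the current directory is touched.
demo=$(mktemp -d)
printf '1 Hello, bash\n2 Hello, again\n3 Hello\n4 again\n' > "$demo/hello.txt"

# ^ and $ anchor the match to the start and end of a line.
ends_with_again=$(grep -c 'again$' "$demo/hello.txt")
starts_with_digit=$(grep -c '^[0-9]' "$demo/hello.txt")

# -E enables extended regular expressions, where + and (a|b) work unescaped.
one_or_more_l=$(grep -E -c 'Hel+o' "$demo/hello.txt")
bash_or_again=$(grep -E -c '(bash|again)' "$demo/hello.txt")

echo "lines ending in again:       $ends_with_again"
echo "lines starting with a digit: $starts_with_digit"
echo "lines matching Hel+o:        $one_or_more_l"
echo "lines with bash or again:    $bash_or_again"

rm -r "$demo"
```

The -c option prints the number of matching lines instead of the lines themselves, which keeps the output compact.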
In [106]:
cat hello.txt
1 Hello, bash
2 Hello, again
3 Hello
4 again
Recursive searching¶
In [111]:
grep -r "Hello" ./*txt
./hello.txt:1 Hello, bash
./hello.txt:2 Hello, again
./hello.txt:3 Hello
Searching for words¶
In [112]:
grep "ash" *.txt
hello.txt:1 Hello, bash
In [113]:
grep -w "ash" *.txt
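The -w search above prints nothing because "ash" only occurs inside the word "bash". A minimal sketch of the difference, assuming a scratch copy of hello.txt in a temporary directory:

```shell
demo=$(mktemp -d)
printf '1 Hello, bash\n2 Hello, again\n3 Hello\n4 again\n' > "$demo/hello.txt"

# Without -w, "ash" matches as a substring of "bash".
substring_hits=$(grep -c 'ash' "$demo/hello.txt")

# With -w, the match must be a whole word, so "bash" no longer counts.
# grep exits with status 1 when nothing matches, hence the "|| true".
word_hits=$(grep -c -w 'ash' "$demo/hello.txt" || true)

# "again" appears as a whole word on two lines.
again_hits=$(grep -c -w 'again' "$demo/hello.txt")

echo "$substring_hits $word_hits $again_hits"
rm -r "$demo"
```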
And with color!¶
In [115]:
grep --color "Hello" *.txt
hello.txt:1 Hello, bash
hello.txt:2 Hello, again
hello.txt:3 Hello
Get filenames only¶
We can use grep -l to list only the names of files whose contents match a regular expression.
In [116]:
grep -l "Hello" *.txt
hello.txt
Using find¶
While grep finds files whose contents match a regular expression, the
find command locates files based on properties such as name, path, and
modification time. We will show a few examples.
In [12]:
man find | head -n 20
FIND(1) BSD General Commands Manual FIND(1)
NAME
find -- walk a file hierarchy
SYNOPSIS
find [-H | -L | -P] [-EXdsx] [-f path] path ... [expression]
find [-H | -L | -P] [-EXdsx] -f path [path ...] [expression]
DESCRIPTION
The find utility recursively descends the directory tree for each path
listed, evaluating an expression (composed of the ``primaries'' and
``operands'' listed below) in terms of each file in the tree.
The options are as follows:
-E Interpret regular expressions followed by -regex and -iregex pri-
maries as extended (modern) regular expressions rather than basic
regular expressions (BRE's). The re_format(7) manual page fully
In [9]:
ls -R
01_The_Unix_Shell.ipynb
01_The_Unix_Shell_Solutions.ipynb
The Unix Shell - File and Directory Management.ipynb
The Unix Shell - Finding Stuff.ipynb
The Unix Shell - Getting Help.ipynb
The Unix Shell - Shell Scripts.ipynb
The Unix Shell - Working with Text.ipynb
data
hello.txt
one two three
scripts
stderr.txt
stdout.txt
test1.txt
./data:
iris.csv iris24.csv
./scripts:
avg.sh
Find is case sensitive by default¶
In [27]:
find . -name "*unix*ipynb"
Use -iname for a case-insensitive search
In [28]:
find . -iname "*unix*ipynb"
./.ipynb_checkpoints/01_The_Unix_Shell_Solutions-checkpoint.ipynb
./.ipynb_checkpoints/The Unix Shell - File and Directory Management-checkpoint.ipynb
./.ipynb_checkpoints/The Unix Shell - Finding Stuff-checkpoint.ipynb
./.ipynb_checkpoints/The Unix Shell - Getting Help-checkpoint.ipynb
./.ipynb_checkpoints/The Unix Shell - Shell Scripts-checkpoint.ipynb
./.ipynb_checkpoints/The Unix Shell - Working with Text-checkpoint.ipynb
./01_The_Unix_Shell.ipynb
./01_The_Unix_Shell_Solutions.ipynb
./The Unix Shell - File and Directory Management.ipynb
./The Unix Shell - Finding Stuff.ipynb
./The Unix Shell - Getting Help.ipynb
./The Unix Shell - Shell Scripts.ipynb
./The Unix Shell - Working with Text.ipynb
Exclude unwanted directories from search¶
In [31]:
find . -not -path "*ipynb_checkpoints/*" -iname "*unix*ipynb"
./01_The_Unix_Shell.ipynb
./01_The_Unix_Shell_Solutions.ipynb
./The Unix Shell - File and Directory Management.ipynb
./The Unix Shell - Finding Stuff.ipynb
./The Unix Shell - Getting Help.ipynb
./The Unix Shell - Shell Scripts.ipynb
./The Unix Shell - Working with Text.ipynb
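The -not -path test above still walks through the checkpoint directory and then discards what it finds. An alternative is -prune, which stops find from descending into the directory at all. The sketch below assumes a small scratch tree with hypothetical file names:

```shell
demo=$(mktemp -d)
mkdir "$demo/.ipynb_checkpoints"
touch "$demo/The Unix Shell.ipynb" \
      "$demo/.ipynb_checkpoints/The Unix Shell-checkpoint.ipynb"

# -prune skips the matched directory entirely; -o (or) applies the
# remaining tests to everything else. The explicit -print ensures that
# only paths reaching the right-hand side are printed.
found=$(cd "$demo" && find . -name ".ipynb_checkpoints" -prune \
        -o -iname "*unix*ipynb" -print)
echo "$found"

rm -r "$demo"
```

On large trees, pruning can be noticeably faster because the excluded directory is never read.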
Limiting recursion depth¶
In [103]:
find . \( -name "*.csv" -o -name "*.txt" \)
./data/iris.csv
./data/iris24.csv
./hello.txt
./stderr.txt
./stdout.txt
./test1.txt
In [104]:
find . -maxdepth 1 \( -name "*.csv" -o -name "*.txt" \)
./hello.txt
./stderr.txt
./stdout.txt
./test1.txt
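A bracket expression in a -name pattern matches one character from a set, so a pattern like *[csv|txt] matches any name ending in c, s, v, |, t, or x rather than the two extensions. The sketch below contrasts it with two -name tests joined by -o, using a scratch directory with hypothetical file names:

```shell
demo=$(mktemp -d)
mkdir "$demo/scripts"
touch "$demo/hello.txt" "$demo/iris.csv" "$demo/notes.md"

# [csv|txt] is a set of single characters, so the directory "scripts"
# matches too, simply because its name ends in "s".
bracket=$(cd "$demo" && find . -name "*[csv|txt]" | sort)

# Grouping two -name tests with -o matches the extensions themselves.
grouped=$(cd "$demo" && find . \( -name "*.csv" -o -name "*.txt" \) | sort)

echo "$bracket"
echo "---"
echo "$grouped"

rm -r "$demo"
```

The parentheses are escaped so that the shell passes them to find unchanged.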
Find by time¶
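Notebooks whose status changed more than 1 day ago¶
Note that -ctime tests the inode change time (updated when contents, permissions, or ownership change), not the creation time.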
In [19]:
find . -name "*ipynb" -ctime +1
./01_The_Unix_Shell.ipynb
Notebooks modified within the last day¶
In [20]:
find . -name "*ipynb" -mtime -1
./.ipynb_checkpoints/01_The_Unix_Shell_Solutions-checkpoint.ipynb
./.ipynb_checkpoints/The Unix Shell - File and Directory Management-checkpoint.ipynb
./.ipynb_checkpoints/The Unix Shell - Finding Stuff-checkpoint.ipynb
./.ipynb_checkpoints/The Unix Shell - Getting Help-checkpoint.ipynb
./.ipynb_checkpoints/The Unix Shell - Shell Scripts-checkpoint.ipynb
./.ipynb_checkpoints/The Unix Shell - Working with Text-checkpoint.ipynb
./01_The_Unix_Shell_Solutions.ipynb
./The Unix Shell - File and Directory Management.ipynb
./The Unix Shell - Finding Stuff.ipynb
./The Unix Shell - Getting Help.ipynb
./The Unix Shell - Shell Scripts.ipynb
./The Unix Shell - Working with Text.ipynb
Files modified in the past 15 minutes¶
In [24]:
find . -name "*ipynb" -mmin -15
./The Unix Shell - Finding Stuff.ipynb
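The time tests can be checked reproducibly by setting a file's modification time by hand. This is a minimal sketch with hypothetical file names; it assumes a find that supports -mmin, as GNU and BSD find both do.

```shell
demo=$(mktemp -d)

# touch -t sets an explicit modification time (here 1 January 2000, 00:00).
touch -t 200001010000 "$demo/old.ipynb"
touch "$demo/new.ipynb"

# -mtime +1 : modified more than one day ago (ages are counted in
#             whole 24-hour periods)
older=$(cd "$demo" && find . -name "*.ipynb" -mtime +1)

# -mmin -15 : modified less than 15 minutes ago
recent=$(cd "$demo" && find . -name "*.ipynb" -mmin -15)

echo "older:  $older"
echo "recent: $recent"

rm -r "$demo"
```

A leading + means "more than", a leading - means "less than", and a bare number means "exactly".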
Delete text files modified within the last minute¶
In [57]:
touch Delete_Me.txt
In [58]:
ls *txt
Delete_Me.txt hello.txt stdout.txt
Delete_Me2.txt stderr.txt test1.txt
In [59]:
find . -name "*txt" -mmin -1
./Delete_Me.txt
./Delete_Me2.txt
In [60]:
find . -name "*txt" -mmin -1 -delete
In [61]:
ls *txt
hello.txt stderr.txt stdout.txt test1.txt
A more common alternative is to pipe to rm¶
xargs reads the paths from standard input and passes them to rm as
arguments, in batches if the list is long.
In [73]:
touch Delete_Me2.txt
In [74]:
ls *txt
Delete_Me2.txt hello.txt stderr.txt stdout.txt test1.txt
In [76]:
find . -name "*txt" -mmin -1 | xargs rm
In [77]:
ls *txt
hello.txt stderr.txt stdout.txt test1.txt
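A third option, sketched here with hypothetical file names in a scratch directory, is to let find run rm itself with -exec. Each matched path is handed over as a single argument, so a filename containing spaces is not a problem.

```shell
demo=$(mktemp -d)
touch "$demo/Delete Me.txt" "$demo/keep.csv"

# -exec ... {} + passes the matched paths to rm directly, in batches,
# without going through a text stream that could be split on whitespace.
(cd "$demo" && find . -name "*.txt" -mmin -1 -exec rm {} +)

remaining=$(ls "$demo")
echo "$remaining"

rm -r "$demo"
```

Running the same find command without the -exec part first is a cheap way to preview what would be deleted.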
Find files containing a phrase¶
find can be combined with grep to first select a subset of files
of interest, and then look for lines matching a regular expression
within those files.
In [67]:
find . -name "*ipynb"
./.ipynb_checkpoints/01_The_Unix_Shell_Solutions-checkpoint.ipynb
./.ipynb_checkpoints/The Unix Shell - File and Directory Management-checkpoint.ipynb
./.ipynb_checkpoints/The Unix Shell - Finding Stuff-checkpoint.ipynb
./.ipynb_checkpoints/The Unix Shell - Getting Help-checkpoint.ipynb
./.ipynb_checkpoints/The Unix Shell - Shell Scripts-checkpoint.ipynb
./.ipynb_checkpoints/The Unix Shell - Working with Text-checkpoint.ipynb
./01_The_Unix_Shell.ipynb
./01_The_Unix_Shell_Solutions.ipynb
./The Unix Shell - File and Directory Management.ipynb
./The Unix Shell - Finding Stuff.ipynb
./The Unix Shell - Getting Help.ipynb
./The Unix Shell - Shell Scripts.ipynb
./The Unix Shell - Working with Text.ipynb
In [89]:
find . -name "*txt" | xargs grep "again"
./hello.txt:2 Hello, again
./hello.txt:4 again
If filenames can contain spaces, we need to do more work¶
By default, xargs splits its input on whitespace, so we will get error
messages because fragments of filenames are passed to grep.
In [98]:
find . -name "*Unix*" -not -path "*.ipynb_checkpoints/*" | xargs grep -i "Cipher"
grep: ./The: No such file or directory
grep: Unix: No such file or directory
grep: Shell: No such file or directory
grep: File: No such file or directory
grep: and: No such file or directory
grep: Directory: No such file or directory
grep: Management.ipynb: No such file or directory
grep: ./The: No such file or directory
grep: Unix: No such file or directory
grep: Shell: No such file or directory
grep: (standard input): Bad file descriptor
grep: Finding: No such file or directory
grep: Stuff.ipynb: No such file or directory
grep: ./The: No such file or directory
grep: Unix: No such file or directory
grep: Shell: No such file or directory
grep: (standard input): Bad file descriptor
grep: Getting: No such file or directory
grep: Help.ipynb: No such file or directory
grep: ./The: No such file or directory
grep: Unix: No such file or directory
grep: Shell: No such file or directory
grep: (standard input): Bad file descriptor
grep: Shell: No such file or directory
grep: Scripts.ipynb: No such file or directory
grep: ./The: No such file or directory
grep: Unix: No such file or directory
grep: Shell: No such file or directory
grep: (standard input): Bad file descriptor
grep: Working: No such file or directory
grep: with: No such file or directory
grep: Text.ipynb: No such file or directory
To get around this, use the -print0 argument to find paired with the
-0 argument to xargs, which changes the delimiter to the NUL
character.
In [97]:
find . -name "*Unix*" -not -path "*.ipynb_checkpoints/*" -print0 | xargs -0 grep -i "Cipher"
./The Unix Shell - Getting Help.ipynb: "blowfish(n) - Implementation of the Blowfish block cipher\n"
./The Unix Shell - Getting Help.ipynb: "blowfish(n) - Implementation of the Blowfish block cipher\n"
./The Unix Shell - Getting Help.ipynb: "blowfish(n) Blowfish Block Cipher blowfish(n)\n",
./The Unix Shell - Getting Help.ipynb: " blowfish - Implementation of the Blowfish block cipher\n",
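The same search can also be written without xargs by letting find invoke grep through -exec. The following is a sketch only: the two files and their contents are made up for illustration and live in a scratch directory.

```shell
demo=$(mktemp -d)
printf 'Blowfish block cipher\n' > "$demo/The Unix Shell - Getting Help.ipynb"
printf 'nothing relevant\n'      > "$demo/The Unix Shell - Shell Scripts.ipynb"

# Each path is passed to grep as one argument, so embedded spaces are harmless.
# -i ignores case; -l prints only the names of files that contain a match.
matches=$(cd "$demo" && find . -name "*Unix*" -exec grep -il "cipher" {} +)
echo "$matches"

rm -r "$demo"
```

Whether to use -print0 with xargs -0 or -exec is largely a matter of taste; both avoid splitting filenames on whitespace.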