{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# The Unix Shell: Finding Stuff\n", "\n", "Flexible ways to find files of interest." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using `locate`\n", "\n", "Many \\*nix systems maintain a database of file paths that can be searched quickly with `locate`. Because the database is only rebuilt periodically, recently created files may not be found." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "LOCATE(1) BSD General Commands Manual LOCATE(1)\n", "\n", "NAME\n", " locate -- find filenames quickly\n", "\n", "SYNOPSIS\n", " locate [-0Scims] [-l limit] [-d database] pattern ...\n", "\n", "DESCRIPTION\n", " The locate program searches a database for all pathnames which match the\n", " specified pattern. The database is recomputed periodically (usually\n", " weekly or daily), and contains the pathnames of all files which are pub-\n", " licly accessible.\n", "\n", " Shell globbing and quoting characters (``*'', ``?'', ``\\'', ``['' and\n", " ``]'') may be used in pattern, although they will have to be escaped from\n", " the shell. Preceding any character with a backslash (``\\'') eliminates\n", " any special meaning which it may have. 
The matching differs in that no\n", " characters must be matched explicitly, including slashes (``/'').\n" ] } ], "source": [ "man locate | head -n 20" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/Users/cliburn/_teach/HTS_SummerCourse_2017/Materials/ComputationBootCampNotebooks/Wk1_Day3_PM/Unix Shell-Solutions.ipynb\n", "/Users/cliburn/_teach/HTS_SummerCourse_2017/Materials/ComputationBootCampNotebooks/Wk1_Day3_PM/Unix Shell.ipynb\n", "/Users/cliburn/_teach/data-science-foundations-2017/lessons/lesson01/.ipynb_checkpoints/01 The Unix Shell-Solutions-checkpoint.ipynb\n", "/Users/cliburn/_teach/data-science-foundations-2017/lessons/lesson01/.ipynb_checkpoints/01 The Unix Shell-checkpoint.ipynb\n", "/Users/cliburn/tmp/HTS_Summer_Course_2016/Materials/ComputationBootCampNotebooks/Wk1_Day3_PM/.ipynb_checkpoints/Unix Shell-Copy1-checkpoint.ipynb\n", "/Users/cliburn/tmp/HTS_Summer_Course_2016/Materials/ComputationBootCampNotebooks/Wk1_Day3_PM/.ipynb_checkpoints/Unix Shell-Solutions-checkpoint.ipynb\n", "/Users/cliburn/tmp/HTS_Summer_Course_2016/Materials/ComputationBootCampNotebooks/Wk1_Day3_PM/.ipynb_checkpoints/Unix Shell-checkpoint.ipynb\n", "/Users/cliburn/tmp/HTS_Summer_Course_2016/Materials/ComputationBootCampNotebooks/Wk1_Day3_PM/Unix Shell-Solutions.ipynb\n", "/Users/cliburn/tmp/HTS_Summer_Course_2016/Materials/ComputationBootCampNotebooks/Wk1_Day3_PM/Unix Shell.ipynb\n" ] } ], "source": [ "locate -i \"unix shell\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using `grep`\n", "\n", "`grep` is used to find regular expression patterns within files. We will cover regular expressions in a subsequent lecture, but here are the basics.\n", "\n", "```\n", ". 
represents one of any character\n", "+ represents one or more of the preceding pattern\n", "* represents zero or more of the preceding pattern\n", "^ matches at start of line\n", "$ matches at end of line\n", "[abc] matches a or b or c\n", "(cat|dog) matches cat or dog\n", "[A-Z] matches any upper case letter\n", "[0-9] matches any digit\n", "```\n", "\n", "Square brackets match a *single* character from the set, so a `|` inside them is just another character to match. The `+`, `|` and `( )` operators belong to *extended* regular expressions: use `grep -E` for patterns that contain them, since plain `grep` treats them as ordinary characters." ] }, { "cell_type": "code", "execution_count": 106, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1 Hello, bash\n", "2 Hello, again\n", "3 Hello\n", "4 again\n" ] } ], "source": [ "cat hello.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Searching a file" ] }, { "cell_type": "code", "execution_count": 108, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1 Hello, bash\n", "2 Hello, again\n", "3 Hello\n" ] } ], "source": [ "grep \"Hello\" hello.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Recursive searching\n", "\n", "`-r` makes `grep` descend into any directories it is given. Here the glob `./*txt` names only the matching files in the current directory; to search every file below the current directory, pass a directory instead, e.g. `grep -r \"Hello\" .`" ] }, { "cell_type": "code", "execution_count": 111, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "./hello.txt:1 Hello, bash\n", "./hello.txt:2 Hello, again\n", "./hello.txt:3 Hello\n" ] } ], "source": [ "grep -r \"Hello\" ./*txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Searching for words\n", "\n", "By default the pattern can match anywhere in a line, so `ash` matches `bash`. With `-w` only whole-word matches count, so nothing is found and `grep` exits with status 1." ] }, { "cell_type": "code", "execution_count": 112, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "hello.txt:1 Hello, bash\n" ] } ], "source": [ "grep \"ash\" *.txt" ] }, { "cell_type": "code", "execution_count": 113, "metadata": {}, "outputs": [ { "ename": "", "evalue": "1", "output_type": "error", "traceback": [] } ], "source": [ "grep -w \"ash\" *.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Counting matching lines\n", "\n", "`-c` reports the number of matching lines in each file, not the number of words or individual matches." ] }, { "cell_type": "code", "execution_count": 114, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "hello.txt:3\n", "stderr.txt:0\n", 
"stdout.txt:0\n", "test1.txt:0\n" ] } ], "source": [ "grep -c \"Hello\" *.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### And with color!" ] }, { "cell_type": "code", "execution_count": 115, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "hello.txt:1 \u001b[01;31m\u001b[KHello\u001b[m\u001b[K, bash\n", "hello.txt:2 \u001b[01;31m\u001b[KHello\u001b[m\u001b[K, again\n", "hello.txt:3 \u001b[01;31m\u001b[KHello\u001b[m\u001b[K\n" ] } ], "source": [ "grep --color \"Hello\" *.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Get filenames only\n", "\n", "`-l` prints only the names of files that contain at least one match, so `grep` can be used to find files whose contents match a regular expression." ] }, { "cell_type": "code", "execution_count": 116, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "hello.txt\n" ] } ], "source": [ "grep -l \"Hello\" *.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using `find`\n", "\n", "While `grep` finds files whose contents match a regular expression, `find` locates files based on their properties, such as name, path and timestamps. We will show a few examples." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "FIND(1) BSD General Commands Manual FIND(1)\n", "\n", "NAME\n", " find -- walk a file hierarchy\n", "\n", "SYNOPSIS\n", " find [-H | -L | -P] [-EXdsx] [-f path] path ... [expression]\n", " find [-H | -L | -P] [-EXdsx] -f path [path ...] 
[expression]\n", "\n", "DESCRIPTION\n", " The find utility recursively descends the directory tree for each path\n", " listed, evaluating an expression (composed of the ``primaries'' and\n", " ``operands'' listed below) in terms of each file in the tree.\n", "\n", " The options are as follows:\n", "\n", " -E Interpret regular expressions followed by -regex and -iregex pri-\n", " maries as extended (modern) regular expressions rather than basic\n", " regular expressions (BRE's). The re_format(7) manual page fully\n" ] } ], "source": [ "man find | head -n 20" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "01_The_Unix_Shell.ipynb\n", "01_The_Unix_Shell_Solutions.ipynb\n", "The Unix Shell - File and Directory Management.ipynb\n", "The Unix Shell - Finding Stuff.ipynb\n", "The Unix Shell - Getting Help.ipynb\n", "The Unix Shell - Shell Scripts.ipynb\n", "The Unix Shell - Working with Text.ipynb\n", "data\n", "hello.txt\n", "one two three\n", "scripts\n", "stderr.txt\n", "stdout.txt\n", "test1.txt\n", "\n", "./data:\n", "iris.csv\tiris24.csv\n", "\n", "./scripts:\n", "avg.sh\n" ] } ], "source": [ "ls -R" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Find by filename\n", "\n", "Quote the pattern so that the shell passes it to `find` unchanged instead of trying to expand `iris*` itself." ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "./data/iris.csv\n", "./data/iris24.csv\n" ] } ], "source": [ "find . -name \"iris*\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Find is case-sensitive by default\n", "\n", "The notebook names contain `Unix`, not `unix`, so this search finds nothing." ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "collapsed": true }, "outputs": [], "source": [ "find . 
-name \"*unix*ipynb\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use `-iname` for a case-insensitive search." ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "./.ipynb_checkpoints/01_The_Unix_Shell_Solutions-checkpoint.ipynb\n", "./.ipynb_checkpoints/The Unix Shell - File and Directory Management-checkpoint.ipynb\n", "./.ipynb_checkpoints/The Unix Shell - Finding Stuff-checkpoint.ipynb\n", "./.ipynb_checkpoints/The Unix Shell - Getting Help-checkpoint.ipynb\n", "./.ipynb_checkpoints/The Unix Shell - Shell Scripts-checkpoint.ipynb\n", "./.ipynb_checkpoints/The Unix Shell - Working with Text-checkpoint.ipynb\n", "./01_The_Unix_Shell.ipynb\n", "./01_The_Unix_Shell_Solutions.ipynb\n", "./The Unix Shell - File and Directory Management.ipynb\n", "./The Unix Shell - Finding Stuff.ipynb\n", "./The Unix Shell - Getting Help.ipynb\n", "./The Unix Shell - Shell Scripts.ipynb\n", "./The Unix Shell - Working with Text.ipynb\n" ] } ], "source": [ "find . -iname \"*unix*ipynb\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Exclude unwanted directories from the search\n", "\n", "`-not -path` discards every path that matches the pattern, here anything inside `.ipynb_checkpoints`." ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "./01_The_Unix_Shell.ipynb\n", "./01_The_Unix_Shell_Solutions.ipynb\n", "./The Unix Shell - File and Directory Management.ipynb\n", "./The Unix Shell - Finding Stuff.ipynb\n", "./The Unix Shell - Getting Help.ipynb\n", "./The Unix Shell - Shell Scripts.ipynb\n", "./The Unix Shell - Working with Text.ipynb\n" ] } ], "source": [ "find . 
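-not -path \"*ipynb_checkpoints/*\" -iname \"*unix*ipynb\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Limiting recursion depth\n", "\n", "To match several extensions, combine `-name` tests with `-o` (or) inside parentheses, which must be escaped from the shell as `\\(` and `\\)`. A bracket pattern such as `*[csv|txt]` would not work here, because brackets match only a single character. `-maxdepth 1` then restricts the search to the starting directory." ] }, { "cell_type": "code", "execution_count": 103, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "./data/iris.csv\n", "./data/iris24.csv\n", "./hello.txt\n", "./stderr.txt\n", "./stdout.txt\n", "./test1.txt\n" ] } ], "source": [ "find . \\( -name \"*.csv\" -o -name \"*.txt\" \\)" ] }, { "cell_type": "code", "execution_count": 104, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "./hello.txt\n", "./stderr.txt\n", "./stdout.txt\n", "./test1.txt\n" ] } ], "source": [ "find . -maxdepth 1 \\( -name \"*.csv\" -o -name \"*.txt\" \\)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Find by time\n", "\n", "For the time tests, `+n` means more than `n` units ago and `-n` means less than `n` units ago. The unit is days for `-ctime` and `-mtime` (fractional days are rounded) and minutes for `-mmin`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Notebooks whose status changed more than 1 day ago\n", "\n", "`-ctime` tests the inode change time, i.e. the last time the file's contents or metadata (such as permissions) changed. It is not the creation time." ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "./01_The_Unix_Shell.ipynb\n" ] } ], "source": [ "find . 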
-name \"*ipynb\" -ctime +1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Notebooks modified within the last day" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "./.ipynb_checkpoints/01_The_Unix_Shell_Solutions-checkpoint.ipynb\n", "./.ipynb_checkpoints/The Unix Shell - File and Directory Management-checkpoint.ipynb\n", "./.ipynb_checkpoints/The Unix Shell - Finding Stuff-checkpoint.ipynb\n", "./.ipynb_checkpoints/The Unix Shell - Getting Help-checkpoint.ipynb\n", "./.ipynb_checkpoints/The Unix Shell - Shell Scripts-checkpoint.ipynb\n", "./.ipynb_checkpoints/The Unix Shell - Working with Text-checkpoint.ipynb\n", "./01_The_Unix_Shell_Solutions.ipynb\n", "./The Unix Shell - File and Directory Management.ipynb\n", "./The Unix Shell - Finding Stuff.ipynb\n", "./The Unix Shell - Getting Help.ipynb\n", "./The Unix Shell - Shell Scripts.ipynb\n", "./The Unix Shell - Working with Text.ipynb\n" ] } ], "source": [ "find . -name \"*ipynb\" -mtime -1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Notebooks modified in the past 15 minutes" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "./The Unix Shell - Finding Stuff.ipynb\n" ] } ], "source": [ "find . 
-name \"*ipynb\" -mmin -15" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Delete text files modified within the last minute\n", "\n", "It is safest to run the `find` command without `-delete` first, to check which files will be removed." ] }, { "cell_type": "code", "execution_count": 57, "metadata": { "collapsed": true }, "outputs": [], "source": [ "touch Delete_Me.txt" ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Delete_Me.txt\thello.txt\tstdout.txt\n", "Delete_Me2.txt\tstderr.txt\ttest1.txt\n" ] } ], "source": [ "ls *txt" ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "./Delete_Me.txt\n", "./Delete_Me2.txt\n" ] } ], "source": [ "find . -name \"*txt\" -mmin -1" ] }, { "cell_type": "code", "execution_count": 60, "metadata": { "collapsed": true }, "outputs": [], "source": [ "find . -name \"*txt\" -mmin -1 -delete" ] }, { "cell_type": "code", "execution_count": 61, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "hello.txt\tstderr.txt\tstdout.txt\ttest1.txt\n" ] } ], "source": [ "ls *txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### A more common alternative: pipe to `rm`\n", "\n", "`xargs` reads file names from standard input and passes them as arguments to `rm`, splitting them into several `rm` calls if the list is too long for a single command line." ] }, { "cell_type": "code", "execution_count": 73, "metadata": { "collapsed": true }, "outputs": [], "source": [ "touch Delete_Me2.txt" ] }, { "cell_type": "code", "execution_count": 74, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Delete_Me2.txt\thello.txt\tstderr.txt\tstdout.txt\ttest1.txt\n" ] } ], "source": [ "ls *txt" ] }, { "cell_type": "code", "execution_count": 76, "metadata": {}, "outputs": [], "source": [ "find . 
-name \"*txt\" -mmin -1 | xargs rm" ] }, { "cell_type": "code", "execution_count": 77, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "hello.txt\tstderr.txt\tstdout.txt\ttest1.txt\n" ] } ], "source": [ "ls *txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Find files containing a phrase\n", "\n", "`find` can be combined with `grep`: `find` first selects a subset of files of interest, and `grep` then looks for lines matching a regular expression within those files." ] }, { "cell_type": "code", "execution_count": 67, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "./.ipynb_checkpoints/01_The_Unix_Shell_Solutions-checkpoint.ipynb\n", "./.ipynb_checkpoints/The Unix Shell - File and Directory Management-checkpoint.ipynb\n", "./.ipynb_checkpoints/The Unix Shell - Finding Stuff-checkpoint.ipynb\n", "./.ipynb_checkpoints/The Unix Shell - Getting Help-checkpoint.ipynb\n", "./.ipynb_checkpoints/The Unix Shell - Shell Scripts-checkpoint.ipynb\n", "./.ipynb_checkpoints/The Unix Shell - Working with Text-checkpoint.ipynb\n", "./01_The_Unix_Shell.ipynb\n", "./01_The_Unix_Shell_Solutions.ipynb\n", "./The Unix Shell - File and Directory Management.ipynb\n", "./The Unix Shell - Finding Stuff.ipynb\n", "./The Unix Shell - Getting Help.ipynb\n", "./The Unix Shell - Shell Scripts.ipynb\n", "./The Unix Shell - Working with Text.ipynb\n" ] } ], "source": [ "find . -name \"*ipynb\"" ] }, { "cell_type": "code", "execution_count": 89, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "./hello.txt:2 Hello, again\n", "./hello.txt:4 again\n" ] } ], "source": [ "find . 
-name \"*txt\" | xargs grep \"again\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### If filenames can contain spaces, we need to do more work\n", "\n", "By default, `xargs` splits its input on whitespace, so a filename containing spaces is broken into fragments, which `grep` then tries (and fails) to open as separate files." ] }, { "cell_type": "code", "execution_count": 98, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "grep: ./The: No such file or directory\n", "grep: Unix: No such file or directory\n", "grep: Shell: No such file or directory\n", "grep: File: No such file or directory\n", "grep: and: No such file or directory\n", "grep: Directory: No such file or directory\n", "grep: Management.ipynb: No such file or directory\n", "grep: ./The: No such file or directory\n", "grep: Unix: No such file or directory\n", "grep: Shell: No such file or directory\n", "grep: (standard input): Bad file descriptor\n", "grep: Finding: No such file or directory\n", "grep: Stuff.ipynb: No such file or directory\n", "grep: ./The: No such file or directory\n", "grep: Unix: No such file or directory\n", "grep: Shell: No such file or directory\n", "grep: (standard input): Bad file descriptor\n", "grep: Getting: No such file or directory\n", "grep: Help.ipynb: No such file or directory\n", "grep: ./The: No such file or directory\n", "grep: Unix: No such file or directory\n", "grep: Shell: No such file or directory\n", "grep: (standard input): Bad file descriptor\n", "grep: Shell: No such file or directory\n", "grep: Scripts.ipynb: No such file or directory\n", "grep: ./The: No such file or directory\n", "grep: Unix: No such file or directory\n", "grep: Shell: No such file or directory\n", "grep: (standard input): Bad file descriptor\n", "grep: Working: No such file or directory\n", "grep: with: No such file or directory\n", "grep: Text.ipynb: No such file or directory\n" ] }, { "ename": "", "evalue": "1", "output_type": "error", "traceback": [] } ], "source": [ "find . 
-name \"*Unix*\" -not -path \"*.ipynb_checkpoints/*\" | xargs grep -i \"Cipher\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To get around this, use the `-print0` option of `find` together with the `-0` option of `xargs`. File names are then separated by the NUL character, which cannot appear in a file name, so names containing spaces are passed through intact." ] }, { "cell_type": "code", "execution_count": 97, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "./The Unix Shell - Getting Help.ipynb: \"blowfish(n) - Implementation of the Blowfish block cipher\\n\"\n", "./The Unix Shell - Getting Help.ipynb: \"blowfish(n) - Implementation of the Blowfish block cipher\\n\"\n", "./The Unix Shell - Getting Help.ipynb: \"blowfish(n) Blowfish Block Cipher blowfish(n)\\n\",\n", "./The Unix Shell - Getting Help.ipynb: \" blowfish - Implementation of the Blowfish block cipher\\n\",\n" ] } ], "source": [ "find . -name \"*Unix*\" -not -path \"*.ipynb_checkpoints/*\" -print0 | xargs -0 grep -i \"Cipher\"" ] } ], "metadata": { "kernelspec": { "display_name": "Bash", "language": "bash", "name": "bash" }, "language_info": { "codemirror_mode": "shell", "file_extension": ".sh", "mimetype": "text/x-sh", "name": "bash" } }, "nbformat": 4, "nbformat_minor": 2 }