The Unix Shell: File and Directory Management
Listing files
In [93]:
ls
Bash_Exercise_1.ipynb
Bash_Exercise_1_Solutions.ipynb
Bash_Exercise_1_Solutions.sh
Bash_Exercise_2.ipynb
Bash_Exercise_2_Solutoins.ipynb
Bash_Exercise_2_Solutoins.sh
Bash_Exercise_Solutions.ipynb
Bash in Jupyter.ipynb
Bash_in_Jupyter.ipynb
Bash_tutorial-Copy1.ipynb
Bash_tutorial.ipynb
Bash_tutorial_prep.ipynb
data
data2
figs
hello.txt
nursery.txt
Process-RNA-seq-counts.ipynb
ref
R_Graphic_ ggplot2.ipynb
R Graphics Base.ipynb
R_Graphics_Base.ipynb
R_Graphics_Exercise.ipynb
R_Graphics_Exercise_Solutions.ipynb
R_Graphics_Exercise_Solutions.r
R Graphics ggplot2.ipynb
R Graphics Overview.ipynb
R_Graphics_Overview.ipynb
R_tidyverse_1.ipynb
R_tidyverse_2.ipynb
R_tidyverse_3.ipynb
R_tidyyverse_Exercise.ipynb
R_tidyyverse_Exercise_Solutions.ipynb
R_tidyyverse_Exercise_Solutions.r
stderr.txt
stdout.txt
The_Unix_Shell_01___File_and_Directory_Management.ipynb
The_Unix_Shell_02___Working_with_Text.ipynb
The_Unix_Shell_03___Finding_Stuff.ipynb
The_Unix_Shell_04___Regular_Expresssions.ipynb
The_Unix_Shell_05___Shell_Scripts.ipynb
The_Unix_Shell___Exercises.ipynb
The_Unix_Shell___Exercises_Solutions.ipynb
In [94]:
ls ref
cn.txt foo_Copy.ipynb
Cryptococcus_neoformans_var_grubii_h99.CNA3.39.gtf header.txt
Show hidden files, excluding current and parent directory
In [96]:
ls -A
Bash_Exercise_1.ipynb
Bash_Exercise_1_Solutions.ipynb
Bash_Exercise_1_Solutions.sh
Bash_Exercise_2.ipynb
Bash_Exercise_2_Solutoins.ipynb
Bash_Exercise_2_Solutoins.sh
Bash_Exercise_Solutions.ipynb
Bash in Jupyter.ipynb
Bash_in_Jupyter.ipynb
Bash_tutorial-Copy1.ipynb
Bash_tutorial.ipynb
Bash_tutorial_prep.ipynb
data
data2
figs
hello.txt
.ipynb_checkpoints
nursery.txt
Process-RNA-seq-counts.ipynb
ref
R_Graphic_ ggplot2.ipynb
R Graphics Base.ipynb
R_Graphics_Base.ipynb
R_Graphics_Exercise.ipynb
R_Graphics_Exercise_Solutions.ipynb
R_Graphics_Exercise_Solutions.r
R Graphics ggplot2.ipynb
R Graphics Overview.ipynb
R_Graphics_Overview.ipynb
R_tidyverse_1.ipynb
R_tidyverse_2.ipynb
R_tidyverse_3.ipynb
R_tidyyverse_Exercise.ipynb
R_tidyyverse_Exercise_Solutions.ipynb
R_tidyyverse_Exercise_Solutions.r
stderr.txt
stdout.txt
The_Unix_Shell_01___File_and_Directory_Management.ipynb
The_Unix_Shell_02___Working_with_Text.ipynb
The_Unix_Shell_03___Finding_Stuff.ipynb
The_Unix_Shell_04___Regular_Expresssions.ipynb
The_Unix_Shell_05___Shell_Scripts.ipynb
The_Unix_Shell___Exercises.ipynb
The_Unix_Shell___Exercises_Solutions.ipynb
Show details
In [97]:
ls -l
total 4380
-rw-r--r-- 1 jovyan 1000 9683 Jul 9 12:39 Bash_Exercise_1.ipynb
-rw-r--r-- 1 jovyan 1000 15460 Jul 9 12:39 Bash_Exercise_1_Solutions.ipynb
-rw-r--r-- 1 jovyan 1000 1323 Jul 9 12:39 Bash_Exercise_1_Solutions.sh
-rw-r--r-- 1 jovyan 1000 6217 Jul 9 12:39 Bash_Exercise_2.ipynb
-rw-r--r-- 1 jovyan 1000 12757 Jul 9 12:39 Bash_Exercise_2_Solutoins.ipynb
-rw-r--r-- 1 jovyan 1000 1126 Jul 9 12:39 Bash_Exercise_2_Solutoins.sh
-rw-r--r-- 1 jovyan 1000 15448 Jul 1 21:42 Bash_Exercise_Solutions.ipynb
-rw-r--r-- 1 jovyan 1000 65260 Jun 27 21:28 Bash in Jupyter.ipynb
-rw-r--r-- 1 jovyan 1000 65260 Jul 9 12:39 Bash_in_Jupyter.ipynb
-rw-r--r-- 1 jovyan users 36489 Jul 9 14:29 Bash_tutorial-Copy1.ipynb
-rw-r--r-- 1 jovyan 1000 5967 Jul 9 12:39 Bash_tutorial.ipynb
-rw-r--r-- 1 jovyan 1000 29522 Jul 9 12:39 Bash_tutorial_prep.ipynb
drwxr-xr-x 2 jovyan users 4096 Jul 5 15:01 data
drwxr-xr-x 2 jovyan users 4096 Jul 9 14:59 data2
drwxr-xr-x 2 jovyan 1000 4096 Jul 5 15:01 figs
-rw-r--r-- 1 jovyan users 45 Jul 9 15:15 hello.txt
-rw-r--r-- 1 jovyan users 24 Jul 9 15:17 nursery.txt
-rw-r--r-- 1 jovyan 1000 12634 Jul 9 12:39 Process-RNA-seq-counts.ipynb
drwxr-xr-x 2 jovyan users 4096 Jul 9 15:00 ref
-rw-r--r-- 1 jovyan 1000 1398902 Jul 9 12:39 R_Graphic_ ggplot2.ipynb
-rw-r--r-- 1 jovyan 1000 155396 Jun 27 21:28 R Graphics Base.ipynb
-rw-r--r-- 1 jovyan 1000 155396 Jul 9 12:39 R_Graphics_Base.ipynb
-rw-r--r-- 1 jovyan 1000 10670 Jul 9 12:39 R_Graphics_Exercise.ipynb
-rw-r--r-- 1 jovyan 1000 195947 Jul 9 12:39 R_Graphics_Exercise_Solutions.ipynb
-rw-r--r-- 1 jovyan 1000 938 Jul 9 12:39 R_Graphics_Exercise_Solutions.r
-rw-r--r-- 1 jovyan 1000 1398902 Jun 27 21:28 R Graphics ggplot2.ipynb
-rw-r--r-- 1 jovyan 1000 158687 Jun 27 21:28 R Graphics Overview.ipynb
-rw-r--r-- 1 jovyan 1000 158687 Jul 9 12:39 R_Graphics_Overview.ipynb
-rw-r--r-- 1 jovyan 1000 106176 Jul 9 12:39 R_tidyverse_1.ipynb
-rw-r--r-- 1 jovyan 1000 82867 Jul 9 12:39 R_tidyverse_2.ipynb
-rw-r--r-- 1 jovyan 1000 126650 Jul 9 12:39 R_tidyverse_3.ipynb
-rw-r--r-- 1 jovyan 1000 29862 Jul 9 12:39 R_tidyyverse_Exercise.ipynb
-rw-r--r-- 1 jovyan 1000 7260 Jul 9 12:39 R_tidyyverse_Exercise_Solutions.ipynb
-rw-r--r-- 1 jovyan 1000 70 Jul 9 12:39 R_tidyyverse_Exercise_Solutions.r
-rw-r--r-- 1 jovyan users 76 Jul 9 15:22 stderr.txt
-rw-r--r-- 1 jovyan users 0 Jul 9 15:22 stdout.txt
-rw-r--r-- 1 jovyan 1000 53864 Jul 9 15:12 The_Unix_Shell_01___File_and_Directory_Management.ipynb
-rw-r--r-- 1 jovyan 1000 7306 Jul 9 15:25 The_Unix_Shell_02___Working_with_Text.ipynb
-rw-r--r-- 1 jovyan 1000 19041 Jul 9 15:35 The_Unix_Shell_03___Finding_Stuff.ipynb
-rw-r--r-- 1 jovyan 1000 16736 Jul 9 15:51 The_Unix_Shell_04___Regular_Expresssions.ipynb
-rw-r--r-- 1 jovyan 1000 13236 Jul 9 12:39 The_Unix_Shell_05___Shell_Scripts.ipynb
-rw-r--r-- 1 jovyan 1000 4928 Jul 9 12:39 The_Unix_Shell___Exercises.ipynb
-rw-r--r-- 1 jovyan 1000 9293 Jul 9 12:39 The_Unix_Shell___Exercises_Solutions.ipynb
Show only directories
In [98]:
ls -d */
data/ data2/ figs/ ref/
Alternative using grep
In [99]:
ls -l | grep -E '^d'
drwxr-xr-x 2 jovyan users 4096 Jul 5 15:01 data
drwxr-xr-x 2 jovyan users 4096 Jul 9 14:59 data2
drwxr-xr-x 2 jovyan 1000 4096 Jul 5 15:01 figs
drwxr-xr-x 2 jovyan users 4096 Jul 9 15:00 ref
Show only files
In [100]:
ls -l | grep -Ev '^d'
total 4380
-rw-r--r-- 1 jovyan 1000 9683 Jul 9 12:39 Bash_Exercise_1.ipynb
-rw-r--r-- 1 jovyan 1000 15460 Jul 9 12:39 Bash_Exercise_1_Solutions.ipynb
-rw-r--r-- 1 jovyan 1000 1323 Jul 9 12:39 Bash_Exercise_1_Solutions.sh
-rw-r--r-- 1 jovyan 1000 6217 Jul 9 12:39 Bash_Exercise_2.ipynb
-rw-r--r-- 1 jovyan 1000 12757 Jul 9 12:39 Bash_Exercise_2_Solutoins.ipynb
-rw-r--r-- 1 jovyan 1000 1126 Jul 9 12:39 Bash_Exercise_2_Solutoins.sh
-rw-r--r-- 1 jovyan 1000 15448 Jul 1 21:42 Bash_Exercise_Solutions.ipynb
-rw-r--r-- 1 jovyan 1000 65260 Jun 27 21:28 Bash in Jupyter.ipynb
-rw-r--r-- 1 jovyan 1000 65260 Jul 9 12:39 Bash_in_Jupyter.ipynb
-rw-r--r-- 1 jovyan users 36489 Jul 9 14:29 Bash_tutorial-Copy1.ipynb
-rw-r--r-- 1 jovyan 1000 5967 Jul 9 12:39 Bash_tutorial.ipynb
-rw-r--r-- 1 jovyan 1000 29522 Jul 9 12:39 Bash_tutorial_prep.ipynb
-rw-r--r-- 1 jovyan users 45 Jul 9 15:15 hello.txt
-rw-r--r-- 1 jovyan users 24 Jul 9 15:17 nursery.txt
-rw-r--r-- 1 jovyan 1000 12634 Jul 9 12:39 Process-RNA-seq-counts.ipynb
-rw-r--r-- 1 jovyan 1000 1398902 Jul 9 12:39 R_Graphic_ ggplot2.ipynb
-rw-r--r-- 1 jovyan 1000 155396 Jun 27 21:28 R Graphics Base.ipynb
-rw-r--r-- 1 jovyan 1000 155396 Jul 9 12:39 R_Graphics_Base.ipynb
-rw-r--r-- 1 jovyan 1000 10670 Jul 9 12:39 R_Graphics_Exercise.ipynb
-rw-r--r-- 1 jovyan 1000 195947 Jul 9 12:39 R_Graphics_Exercise_Solutions.ipynb
-rw-r--r-- 1 jovyan 1000 938 Jul 9 12:39 R_Graphics_Exercise_Solutions.r
-rw-r--r-- 1 jovyan 1000 1398902 Jun 27 21:28 R Graphics ggplot2.ipynb
-rw-r--r-- 1 jovyan 1000 158687 Jun 27 21:28 R Graphics Overview.ipynb
-rw-r--r-- 1 jovyan 1000 158687 Jul 9 12:39 R_Graphics_Overview.ipynb
-rw-r--r-- 1 jovyan 1000 106176 Jul 9 12:39 R_tidyverse_1.ipynb
-rw-r--r-- 1 jovyan 1000 82867 Jul 9 12:39 R_tidyverse_2.ipynb
-rw-r--r-- 1 jovyan 1000 126650 Jul 9 12:39 R_tidyverse_3.ipynb
-rw-r--r-- 1 jovyan 1000 29862 Jul 9 12:39 R_tidyyverse_Exercise.ipynb
-rw-r--r-- 1 jovyan 1000 7260 Jul 9 12:39 R_tidyyverse_Exercise_Solutions.ipynb
-rw-r--r-- 1 jovyan 1000 70 Jul 9 12:39 R_tidyyverse_Exercise_Solutions.r
-rw-r--r-- 1 jovyan users 76 Jul 9 15:22 stderr.txt
-rw-r--r-- 1 jovyan users 0 Jul 9 15:22 stdout.txt
-rw-r--r-- 1 jovyan 1000 53864 Jul 9 15:12 The_Unix_Shell_01___File_and_Directory_Management.ipynb
-rw-r--r-- 1 jovyan 1000 7306 Jul 9 15:25 The_Unix_Shell_02___Working_with_Text.ipynb
-rw-r--r-- 1 jovyan 1000 19041 Jul 9 15:35 The_Unix_Shell_03___Finding_Stuff.ipynb
-rw-r--r-- 1 jovyan 1000 16736 Jul 9 15:51 The_Unix_Shell_04___Regular_Expresssions.ipynb
-rw-r--r-- 1 jovyan 1000 13236 Jul 9 12:39 The_Unix_Shell_05___Shell_Scripts.ipynb
-rw-r--r-- 1 jovyan 1000 4928 Jul 9 12:39 The_Unix_Shell___Exercises.ipynb
-rw-r--r-- 1 jovyan 1000 9293 Jul 9 12:39 The_Unix_Shell___Exercises_Solutions.ipynb
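The grep approach above relies on the first character of each ls -l line. As a hedged alternative, the sketch below uses find to test the file type directly. The temporary directory and file names are made up for illustration, and -mindepth/-maxdepth are extensions available in GNU and BSD find rather than part of POSIX.

```shell
# Work in a throwaway directory so the example is self-contained.
demo=$(mktemp -d)
mkdir "$demo/data" "$demo/figs"
touch "$demo/hello.txt" "$demo/nursery.txt"

# -mindepth 1 skips the starting directory itself;
# -maxdepth 1 stops find from descending into subdirectories.
dirs=$(find "$demo" -mindepth 1 -maxdepth 1 -type d | sort)
files=$(find "$demo" -mindepth 1 -maxdepth 1 -type f | sort)

echo "Directories:"
echo "$dirs"
echo "Files:"
echo "$files"
```

Unlike ls -l | grep -Ev '^d', this does not print the "total" line, and -type f excludes symbolic links and other special files as well as directories.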
Sort by last modified time
In [101]:
ls -lt | head -3
total 4380
-rw-r--r-- 1 jovyan 1000 16736 Jul 9 15:51 The_Unix_Shell_04___Regular_Expresssions.ipynb
-rw-r--r-- 1 jovyan 1000 19041 Jul 9 15:35 The_Unix_Shell_03___Finding_Stuff.ipynb
Human readable output
In [102]:
ls -lth | head -3
total 4.3M
-rw-r--r-- 1 jovyan 1000 17K Jul 9 15:51 The_Unix_Shell_04___Regular_Expresssions.ipynb
-rw-r--r-- 1 jovyan 1000 19K Jul 9 15:35 The_Unix_Shell_03___Finding_Stuff.ipynb
Recursive listing
In [103]:
ls -R
.:
Bash_Exercise_1.ipynb
Bash_Exercise_1_Solutions.ipynb
Bash_Exercise_1_Solutions.sh
Bash_Exercise_2.ipynb
Bash_Exercise_2_Solutoins.ipynb
Bash_Exercise_2_Solutoins.sh
Bash_Exercise_Solutions.ipynb
Bash in Jupyter.ipynb
Bash_in_Jupyter.ipynb
Bash_tutorial-Copy1.ipynb
Bash_tutorial.ipynb
Bash_tutorial_prep.ipynb
data
data2
figs
hello.txt
nursery.txt
Process-RNA-seq-counts.ipynb
ref
R_Graphic_ ggplot2.ipynb
R Graphics Base.ipynb
R_Graphics_Base.ipynb
R_Graphics_Exercise.ipynb
R_Graphics_Exercise_Solutions.ipynb
R_Graphics_Exercise_Solutions.r
R Graphics ggplot2.ipynb
R Graphics Overview.ipynb
R_Graphics_Overview.ipynb
R_tidyverse_1.ipynb
R_tidyverse_2.ipynb
R_tidyverse_3.ipynb
R_tidyyverse_Exercise.ipynb
R_tidyyverse_Exercise_Solutions.ipynb
R_tidyyverse_Exercise_Solutions.r
stderr.txt
stdout.txt
The_Unix_Shell_01___File_and_Directory_Management.ipynb
The_Unix_Shell_02___Working_with_Text.ipynb
The_Unix_Shell_03___Finding_Stuff.ipynb
The_Unix_Shell_04___Regular_Expresssions.ipynb
The_Unix_Shell_05___Shell_Scripts.ipynb
The_Unix_Shell___Exercises.ipynb
The_Unix_Shell___Exercises_Solutions.ipynb
./data:
duke_demographics.tsv duke_proteins_v2.tsv unc_genes_v2.tsv
duke_genes_v1.tsv gene_counts.txt unc_proteins_v1.tsv
duke_genes_v2.tsv unc_demographics.tsv unc_proteins_v2.tsv
duke_proteins_v1.tsv unc_genes_v1.tsv
./data2:
duke_demographics.tsv duke_proteins_v2.tsv unc_genes_v2.tsv
duke_genes_v1.tsv gene_counts.txt unc_proteins_v1.tsv
duke_genes_v2.tsv unc_demographics.tsv unc_proteins_v2.tsv
duke_proteins_v1.tsv unc_genes_v1.tsv
./figs:
fig1.png fig2.png fig3.png fig4.png fig5.png
./ref:
cn.txt foo_Copy.ipynb
Cryptococcus_neoformans_var_grubii_h99.CNA3.39.gtf header.txt
Globbing
The use of wild cards to specify Unix paths is known as globbing.
* represents any number of characters (including none)
In [104]:
ls *ipynb
Bash_Exercise_1.ipynb
Bash_Exercise_1_Solutions.ipynb
Bash_Exercise_2.ipynb
Bash_Exercise_2_Solutoins.ipynb
Bash_Exercise_Solutions.ipynb
Bash in Jupyter.ipynb
Bash_in_Jupyter.ipynb
Bash_tutorial-Copy1.ipynb
Bash_tutorial.ipynb
Bash_tutorial_prep.ipynb
Process-RNA-seq-counts.ipynb
R_Graphic_ ggplot2.ipynb
R Graphics Base.ipynb
R_Graphics_Base.ipynb
R_Graphics_Exercise.ipynb
R_Graphics_Exercise_Solutions.ipynb
R Graphics ggplot2.ipynb
R Graphics Overview.ipynb
R_Graphics_Overview.ipynb
R_tidyverse_1.ipynb
R_tidyverse_2.ipynb
R_tidyverse_3.ipynb
R_tidyyverse_Exercise.ipynb
R_tidyyverse_Exercise_Solutions.ipynb
The_Unix_Shell_01___File_and_Directory_Management.ipynb
The_Unix_Shell_02___Working_with_Text.ipynb
The_Unix_Shell_03___Finding_Stuff.ipynb
The_Unix_Shell_04___Regular_Expresssions.ipynb
The_Unix_Shell_05___Shell_Scripts.ipynb
The_Unix_Shell___Exercises.ipynb
The_Unix_Shell___Exercises_Solutions.ipynb
In [105]:
ls *Text*
The_Unix_Shell_02___Working_with_Text.ipynb
? represents exactly one character
In [107]:
ls data
duke_demographics.tsv duke_proteins_v2.tsv unc_genes_v2.tsv
duke_genes_v1.tsv gene_counts.txt unc_proteins_v1.tsv
duke_genes_v2.tsv unc_demographics.tsv unc_proteins_v2.tsv
duke_proteins_v1.tsv unc_genes_v1.tsv
In [109]:
ls data/unc_genes_v?.tsv
data/unc_genes_v1.tsv data/unc_genes_v2.tsv
Character sets
- [abc] represents a or b or c
- [a-z] represents any lower case character
- ! as the first character inside the brackets negates the set, e.g. [!a-d]
In [110]:
ls data
duke_demographics.tsv duke_proteins_v2.tsv unc_genes_v2.tsv
duke_genes_v1.tsv gene_counts.txt unc_proteins_v1.tsv
duke_genes_v2.tsv unc_demographics.tsv unc_proteins_v2.tsv
duke_proteins_v1.tsv unc_genes_v1.tsv
In [111]:
ls data/[a-d]*
data/duke_demographics.tsv data/duke_proteins_v1.tsv
data/duke_genes_v1.tsv data/duke_proteins_v2.tsv
data/duke_genes_v2.tsv
In [112]:
ls data/[!a-d]*
data/gene_counts.txt data/unc_genes_v1.tsv data/unc_proteins_v1.tsv
data/unc_demographics.tsv data/unc_genes_v2.tsv data/unc_proteins_v2.tsv
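The patterns above are expanded by the shell before ls ever runs, so any command can be used to inspect them. A minimal sketch with echo follows; the file names in the temporary directory are invented to mirror the data directory and are not the real files.

```shell
# Create a few empty files in a throwaway directory.
demo=$(mktemp -d)
cd "$demo" || exit 1
touch duke_genes_v1.tsv duke_genes_v2.tsv unc_genes_v1.tsv gene_counts.txt

# echo simply prints whatever the shell expanded each pattern to.
star=$(echo *.tsv)                 # any name ending in .tsv
question=$(echo unc_genes_v?.tsv)  # exactly one character after "v"
in_set=$(echo [a-d]*)              # names starting with a, b, c or d
negated=$(echo [!a-d]*)            # names NOT starting with a, b, c or d

echo "*.tsv            -> $star"
echo "unc_genes_v?.tsv -> $question"
echo "[a-d]*           -> $in_set"
echo "[!a-d]*          -> $negated"
```

If a pattern matches nothing, bash by default passes the pattern through unchanged, which is why ls then reports "No such file or directory" for the literal pattern.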
Making and removing new directories
In [140]:
mkdir foo
In [141]:
ls -d */
data/ data2/ figs/ foo/ ref/
Making intermediate directories automatically
In [142]:
mkdir a/b/c/d
mkdir: cannot create directory ‘a/b/c/d’: No such file or directory
In [143]:
mkdir -p a/b/c/d
In [144]:
ls -R a
a:
b
a/b:
c
a/b/c:
d
a/b/c/d:
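Besides creating missing parents, mkdir -p also succeeds when the directory already exists, which makes it safe to repeat in scripts. A small sketch, using a temporary directory so nothing in the working directory is touched:

```shell
demo=$(mktemp -d)

# First call creates a, a/b, a/b/c and a/b/c/d in one step.
mkdir -p "$demo/a/b/c/d"

# Second call finds everything in place and still exits with status 0.
mkdir -p "$demo/a/b/c/d"
second_status=$?

echo "exit status of repeated mkdir -p: $second_status"
```

Plain mkdir, by contrast, reports an error if the target already exists.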
Deleting directories
In [145]:
rmdir foo
Only works if directory is empty
The | cat part is not necessary on the command line. It is used here only so that Run All Cells keeps going, since Jupyter stops on non-zero exit codes. The | cat syntax “pipes” the output of rmdir data to the cat program, so the exit code of the cell is that of cat rather than rmdir.
In [146]:
rmdir data | cat
rmdir: failed to remove ‘data’: Directory not empty
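To see why the | cat trick hides the failure, the sketch below records the exit status with and without the pipe. The directory is a made-up stand-in for data, and the behaviour shown assumes the shell's default settings (no pipefail option).

```shell
# A non-empty directory that rmdir will refuse to delete.
demo=$(mktemp -d)
mkdir "$demo/data"
touch "$demo/data/file.txt"

# On its own, rmdir fails and returns a non-zero exit status.
plain_status=0
rmdir "$demo/data" 2>/dev/null || plain_status=$?

# In a pipeline the shell reports the status of the last command (cat),
# which succeeds even though rmdir failed.
rmdir "$demo/data" 2>/dev/null | cat
piped_status=$?

echo "rmdir alone: $plain_status"
echo "rmdir | cat: $piped_status"
```

Appending || true to the command is another common way to force a zero exit status.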
Remove intermediate directories as well
In [147]:
rmdir -p a/b/c/d
In [148]:
ls -d */
data/ data2/ figs/ ref/
Working with files
Making an empty file
In [149]:
touch foo.txt
In [150]:
ls *txt
foo.txt hello.txt nursery.txt stderr.txt stdout.txt
Viewing a file
In [153]:
ls data
duke_demographics.tsv duke_proteins_v2.tsv unc_genes_v2.tsv
duke_genes_v1.tsv gene_counts.txt unc_proteins_v1.tsv
duke_genes_v2.tsv unc_demographics.tsv unc_proteins_v2.tsv
duke_proteins_v1.tsv unc_genes_v1.tsv
In [154]:
cat date/duke_genes | head
cat: date/duke_genes: No such file or directory
In [155]:
head -n 3 ref/header.txt
#!genome-build CNA3
#!genome-version CNA3
#!genome-date 2015-11
In [156]:
tail -n 3 ref/header.txt
Mt ena CDS 24096 24848 . + 0 gene_id "CNAG_09012"; transcript_id "AFR99114"; exon_number "1"; gene_source "ena"; gene_biotype "protein_coding"; transcript_source "ena"; transcript_biotype "protein_coding"; protein_id "AFR99114"; protein_version "1";
Mt ena start_codon 24096 24098 . + 0 gene_id "CNAG_09012"; transcript_id "AFR99114"; exon_number "1"; gene_source "ena"; gene_biotype "protein_coding"; transcript_source "ena"; transcript_biotype "protein_coding";
Mt ena stop_codon 24849 24851 . + 0 gene_id "CNAG_09012"; transcript_id "AFR99114"; exon_number "1"; gene_source "ena"; gene_biotype "protein_coding"; transcript_source "ena"; transcript_biotype "protein_coding";
tail can start from a specified line number with +N
In [157]:
tail -n +4 ref/header.txt | head
#!genome-build-accession GCA_000149245.3
#!genebuild-last-updated 2015-11
1 ena gene 100 5645 . - . gene_id "CNAG_04548"; gene_source "ena"; gene_biotype "protein_coding";
1 ena transcript 100 5645 . - . gene_id "CNAG_04548"; transcript_id "AFR92135"; gene_source "ena"; gene_biotype "protein_coding"; transcript_source "ena"; transcript_biotype "protein_coding";
1 ena exon 5494 5645 . - . gene_id "CNAG_04548"; transcript_id "AFR92135"; exon_number "1"; gene_source "ena"; gene_biotype "protein_coding"; transcript_source "ena"; transcript_biotype "protein_coding"; exon_id "AFR92135-1";
1 ena CDS 5494 5645 . - 0 gene_id "CNAG_04548"; transcript_id "AFR92135"; exon_number "1"; gene_source "ena"; gene_biotype "protein_coding"; transcript_source "ena"; transcript_biotype "protein_coding"; protein_id "AFR92135"; protein_version "1";
1 ena start_codon 5643 5645 . - 0 gene_id "CNAG_04548"; transcript_id "AFR92135"; exon_number "1"; gene_source "ena"; gene_biotype "protein_coding"; transcript_source "ena"; transcript_biotype "protein_coding";
1 ena exon 5322 5422 . - . gene_id "CNAG_04548"; transcript_id "AFR92135"; exon_number "2"; gene_source "ena"; gene_biotype "protein_coding"; transcript_source "ena"; transcript_biotype "protein_coding"; exon_id "AFR92135-2";
1 ena CDS 5322 5422 . - 1 gene_id "CNAG_04548"; transcript_id "AFR92135"; exon_number "2"; gene_source "ena"; gene_biotype "protein_coding"; transcript_source "ena"; transcript_biotype "protein_coding"; protein_id "AFR92135"; protein_version "1";
1 ena exon 3958 5263 . - . gene_id "CNAG_04548"; transcript_id "AFR92135"; exon_number "3"; gene_source "ena"; gene_biotype "protein_coding"; transcript_source "ena"; transcript_biotype "protein_coding"; exon_id "AFR92135-3";
tail: error writing ‘standard output’: Broken pipe
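The Broken pipe message appears because head exits after ten lines while tail is still writing. The sketch below shows the same +N idea on a small input where the numbers are easy to follow; seq is used as a stand-in for a real file, and sed -n is shown as one possible alternative for selecting a line range.

```shell
# seq 1 10 prints the numbers 1 to 10, one per line.
# tail -n +4 starts output at line 4; head -n 3 keeps the next three lines.
with_tail=$(seq 1 10 | tail -n +4 | head -n 3)

# sed -n '4,6p' prints only lines 4 through 6.
with_sed=$(seq 1 10 | sed -n '4,6p')

echo "tail + head:"
echo "$with_tail"
echo "sed:"
echo "$with_sed"
```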
Copying and moving files
In [158]:
ls
Bash_Exercise_1.ipynb
Bash_Exercise_1_Solutions.ipynb
Bash_Exercise_1_Solutions.sh
Bash_Exercise_2.ipynb
Bash_Exercise_2_Solutoins.ipynb
Bash_Exercise_2_Solutoins.sh
Bash_Exercise_Solutions.ipynb
Bash in Jupyter.ipynb
Bash_in_Jupyter.ipynb
Bash_tutorial-Copy1.ipynb
Bash_tutorial.ipynb
Bash_tutorial_prep.ipynb
data
data2
figs
hello.txt
nursery.txt
Process-RNA-seq-counts.ipynb
ref
R_Graphic_ ggplot2.ipynb
R Graphics Base.ipynb
R_Graphics_Base.ipynb
R_Graphics_Exercise.ipynb
R_Graphics_Exercise_Solutions.ipynb
R_Graphics_Exercise_Solutions.r
R Graphics ggplot2.ipynb
R Graphics Overview.ipynb
R_Graphics_Overview.ipynb
R_tidyverse_1.ipynb
R_tidyverse_2.ipynb
R_tidyverse_3.ipynb
R_tidyyverse_Exercise.ipynb
R_tidyyverse_Exercise_Solutions.ipynb
R_tidyyverse_Exercise_Solutions.r
stderr.txt
stdout.txt
The_Unix_Shell_01___File_and_Directory_Management.ipynb
The_Unix_Shell_02___Working_with_Text.ipynb
The_Unix_Shell_03___Finding_Stuff.ipynb
The_Unix_Shell_04___Regular_Expresssions.ipynb
The_Unix_Shell_05___Shell_Scripts.ipynb
The_Unix_Shell___Exercises.ipynb
The_Unix_Shell___Exercises_Solutions.ipynb
Copying files
In [159]:
cp "The_Unix_Shell_01___File_and_Directory_Management.ipynb" foo.ipynb
In [160]:
ls f*ipynb
foo.ipynb
Copying directories (Recursive copy)
In [161]:
cp -R data data2
In [162]:
ls -R data2
data2:
data duke_proteins_v1.tsv unc_genes_v1.tsv
duke_demographics.tsv duke_proteins_v2.tsv unc_genes_v2.tsv
duke_genes_v1.tsv gene_counts.txt unc_proteins_v1.tsv
duke_genes_v2.tsv unc_demographics.tsv unc_proteins_v2.tsv
data2/data:
duke_demographics.tsv duke_proteins_v2.tsv unc_genes_v2.tsv
duke_genes_v1.tsv gene_counts.txt unc_proteins_v1.tsv
duke_genes_v2.tsv unc_demographics.tsv unc_proteins_v2.tsv
duke_proteins_v1.tsv unc_genes_v1.tsv
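Note that data ended up inside data2 (as data2/data) because data2 already existed. The following sketch, with made-up directory names in a temporary location, contrasts the two cases.

```shell
demo=$(mktemp -d)
mkdir "$demo/src"
touch "$demo/src/a.tsv"

# Destination does not exist yet: new_copy becomes a copy of src.
cp -R "$demo/src" "$demo/new_copy"

# Destination already exists: src is copied INTO it, as existing/src.
mkdir "$demo/existing"
cp -R "$demo/src" "$demo/existing"

ls -R "$demo"
```

Checking whether the destination exists before a recursive copy avoids surprising nested directories.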
Move a file to a new location
In [165]:
mv foo_Copy.ipynb ref/
In [166]:
ls ref
cn.txt foo_Copy.ipynb
Cryptococcus_neoformans_var_grubii_h99.CNA3.39.gtf header.txt
File compression and archival
Combine multiple files into single file
In [167]:
ls data
duke_demographics.tsv duke_proteins_v2.tsv unc_genes_v2.tsv
duke_genes_v1.tsv gene_counts.txt unc_proteins_v1.tsv
duke_genes_v2.tsv unc_demographics.tsv unc_proteins_v2.tsv
duke_proteins_v1.tsv unc_genes_v1.tsv
In [168]:
man tar | head -n 20
bash: man: command not found
In [169]:
tar -cvf data.tar data
data/
data/unc_proteins_v2.tsv
data/duke_genes_v1.tsv
data/duke_demographics.tsv
data/unc_genes_v2.tsv
data/gene_counts.txt
data/unc_proteins_v1.tsv
data/duke_proteins_v2.tsv
data/unc_demographics.tsv
data/unc_genes_v1.tsv
data/duke_proteins_v1.tsv
data/duke_genes_v2.tsv
In [170]:
ls *tar
data.tar
In [171]:
ls -d */
data/ data2/ figs/ ref/
In [172]:
rm -rf data/
In [173]:
ls -d */
data2/ figs/ ref/
Recover original files
In [178]:
tar -xvf data.tar
data/
data/unc_proteins_v2.tsv
data/duke_genes_v1.tsv
data/duke_demographics.tsv
data/unc_genes_v2.tsv
data/gene_counts.txt
data/unc_proteins_v1.tsv
data/duke_proteins_v2.tsv
data/unc_demographics.tsv
data/unc_genes_v1.tsv
data/duke_proteins_v1.tsv
data/duke_genes_v2.tsv
In [179]:
ls data/
duke_demographics.tsv duke_proteins_v2.tsv unc_genes_v2.tsv
duke_genes_v1.tsv gene_counts.txt unc_proteins_v1.tsv
duke_genes_v2.tsv unc_demographics.tsv unc_proteins_v2.tsv
duke_proteins_v1.tsv unc_genes_v1.tsv
In [180]:
rm data.tar
Concatenate and compress
In [181]:
tar -cvzf data.tar.gz data
data/
data/unc_proteins_v2.tsv
data/duke_genes_v1.tsv
data/duke_demographics.tsv
data/unc_genes_v2.tsv
data/gene_counts.txt
data/unc_proteins_v1.tsv
data/duke_proteins_v2.tsv
data/unc_demographics.tsv
data/unc_genes_v1.tsv
data/duke_proteins_v1.tsv
data/duke_genes_v2.tsv
In [182]:
rm -rf data/
In [183]:
ls data*
data.tar.gz
data2:
data duke_proteins_v1.tsv unc_genes_v1.tsv
duke_demographics.tsv duke_proteins_v2.tsv unc_genes_v2.tsv
duke_genes_v1.tsv gene_counts.txt unc_proteins_v1.tsv
duke_genes_v2.tsv unc_demographics.tsv unc_proteins_v2.tsv
Uncompress and recover
In [184]:
tar -xvzf data.tar.gz
data/
data/unc_proteins_v2.tsv
data/duke_genes_v1.tsv
data/duke_demographics.tsv
data/unc_genes_v2.tsv
data/gene_counts.txt
data/unc_proteins_v1.tsv
data/duke_proteins_v2.tsv
data/unc_demographics.tsv
data/unc_genes_v1.tsv
data/duke_proteins_v1.tsv
data/duke_genes_v2.tsv
In [185]:
ls data*
data.tar.gz
data:
duke_demographics.tsv duke_proteins_v2.tsv unc_genes_v2.tsv
duke_genes_v1.tsv gene_counts.txt unc_proteins_v1.tsv
duke_genes_v2.tsv unc_demographics.tsv unc_proteins_v2.tsv
duke_proteins_v1.tsv unc_genes_v1.tsv
data2:
data duke_proteins_v1.tsv unc_genes_v1.tsv
duke_demographics.tsv duke_proteins_v2.tsv unc_genes_v2.tsv
duke_genes_v1.tsv gene_counts.txt unc_proteins_v1.tsv
duke_genes_v2.tsv unc_demographics.tsv unc_proteins_v2.tsv
In [186]:
rm data.tar.gz
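Before deleting the original files or extracting over an existing directory, it can help to list what an archive contains. A hedged sketch using tar -t follows; the file and directory names are invented, and -C (change directory before acting) is assumed to be available, as it is in GNU and BSD tar.

```shell
demo=$(mktemp -d)
mkdir "$demo/data"
echo "gene1 10" > "$demo/data/gene_counts.txt"

# Create a compressed archive of data, with paths stored relative to $demo.
tar -czf "$demo/data.tar.gz" -C "$demo" data

# -t lists the archive contents without extracting anything.
listing=$(tar -tzf "$demo/data.tar.gz")
echo "$listing"

# Extract into a separate directory rather than over the original.
mkdir "$demo/restore"
tar -xzf "$demo/data.tar.gz" -C "$demo/restore"
```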
Checksums
When working with genomic data, we deal with very large files. There is a small risk that these files will be corrupted over time or during data transfer. To check that a file has not changed, we use a “checksum” function. This is a function that generates a long, seemingly random number, called a checksum, from the contents of the file. When the file contents change, so does the checksum. In theory, two different files can generate the same checksum, but in practice the probability of this happening by accident is too small to worry about.
There are several different algorithms for generating checksums, and at least three Unix commands to do so (for example cksum, md5sum and sha1sum), but they all work very similarly for our purposes.
The strategy is:
- Generate and store a checksum together with a data file whose integrity you care about
- When you use or download the data, re-generate the checksum (using the same algorithm, e.g. MD5) and compare it with the stored checksum
In [193]:
cat << EOF > hello.txt
1 Hello, bash
2 Hello, again
3 Hello
4 again
EOF
In [194]:
cat hello.txt
1 Hello, bash
2 Hello, again
3 Hello
4 again
In [195]:
cksum hello.txt
1567754519 45 hello.txt
In [196]:
md5sum hello.txt
a68554400613f5445c13c57907e976ed hello.txt
In [197]:
sha1sum hello.txt
57eae725420bf0075d17f849cc8e75379bea6eb6 hello.txt
If we alter hello.txt in any way, the checksum will be different
In [198]:
cat hello.txt
1 Hello, bash
2 Hello, again
3 Hello
4 again
In [199]:
md5sum hello.txt > hello.md5
In [200]:
cat hello.md5
a68554400613f5445c13c57907e976ed hello.txt
Now make a small change to hello.txt
In [201]:
cat > test1.txt << EOF
One, two buckle my shoe
Three, four lock the door
EOF
In [202]:
cat > hello.txt << EOF
1 Hello, bash
2 Hella, again
3 Hello
4 again
EOF
In [203]:
cat hello.txt
1 Hello, bash
2 Hella, again
3 Hello
4 again
In [204]:
md5sum hello.txt
0d8c8172f2a69f5845f21cb03a436be3 hello.txt
In [205]:
md5sum -c hello.md5
hello.txt: FAILED
md5sum: WARNING: 1 computed checksum did NOT match
Restore original text
In [206]:
cat > hello.txt << EOF
1 Hello, bash
2 Hello, again
3 Hello
4 again
EOF
In [207]:
md5sum hello.txt > test.md5
In [208]:
md5sum -c hello.md5
hello.txt: OK
Checksums for multiple files
In [209]:
echo "aaaaa" > a.txt
echo "bbbbb" > b.txt
echo "ccccc" > c.txt
Generate md5 checksum file
In [210]:
md5sum a.txt b.txt c.txt > MD5_CHECKSUM
In [211]:
cat MD5_CHECKSUM
4c850c5b3b2756e67a91bad8e046ddac a.txt
369d9bb6f2313be57f7a55502eb420ba b.txt
34d9ae3c9b1fa64d91bdb00f3c0d6cd5 c.txt
Modify one file
In [212]:
echo "bbcbb" > b.txt
Check file integrity for all files
In [213]:
md5sum -c MD5_CHECKSUM
a.txt: OK
b.txt: FAILED
c.txt: OK
md5sum: WARNING: 1 computed checksum did NOT match
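In a script it is often easier to act on the exit status of md5sum -c than to read its output. The sketch below assumes GNU coreutils md5sum, whose --status option suppresses all output. The function name check_files is a hypothetical helper and not part of any tool.

```shell
demo=$(mktemp -d)
cd "$demo" || exit 1
echo "aaaaa" > a.txt
echo "bbbbb" > b.txt
md5sum a.txt b.txt > MD5_CHECKSUM

# check_files (hypothetical helper): report whether every file listed
# in the checksum file given as $1 still matches its stored checksum.
check_files() {
    if md5sum --status -c "$1"; then
        echo "all files intact"
    else
        echo "at least one file changed"
    fi
}

before=$(check_files MD5_CHECKSUM)

# Simulate corruption of one file, then check again.
echo "bbcbb" > b.txt
after=$(check_files MD5_CHECKSUM)

echo "before: $before"
echo "after:  $after"
```

The checksum file stores relative paths, so the check has to be run from the directory that contains the data files.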