Introduction to FASTQ Files
Shell Variables
As before, we will use shell variables to make it easier to refer to the directories we are working with. Shell variables do not carry over between notebooks: they are specific to a shell session, and each notebook runs its own shell session.
So the first thing we will do is assign the variables in this notebook.
In [1]:
# We used these in the last notebook
CUROUT=$HOME/work/scratch/2015_output
DEMUX=$CUROUT/demux_fastqs
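As a rough illustration of why the variables have to be reassigned, the sketch below shows that a variable set in one shell is not visible to a separately started shell unless it is exported. The name DEMO_VAR is hypothetical and is used only for this example; a notebook's shell session is a separate process in a similar way.

```shell
# DEMO_VAR is a hypothetical name used only for this illustration
unset DEMO_VAR
DEMO_VAR="set in this session"

# The current shell session can see the variable
echo "this session: ${DEMO_VAR}"

# A separately started shell does not inherit it, because it was never exported.
# ${DEMO_VAR:-<unset>} expands to <unset> when the variable is missing or empty.
CHILD_VIEW=$(bash -c 'echo "${DEMO_VAR:-<unset>}"')
echo "new session: ${CHILD_VIEW}"
```

The second echo reports the variable as unset, which is the same situation a new notebook starts in.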
Looking at a FASTQ
Let’s take a quick look at our data. For our first pass at analysis, we are just going to be working with the first read data (R1) from one sample.
In [2]:
ls -lSrh $DEMUX
total 0
-rw-r--r-- 1 cliburn staff 0B Aug 8 09:09 dryrun_demux.stdout
Compression: gzip, zcat, etc
The “.gz” at the end of the FASTQ file name indicates that the FASTQ file was compressed using a program named gzip. This is pretty common because FASTQ files can be huge. cat is a program for viewing text files; zcat is a special version of this program that lets you view compressed text files without first decompressing them.
Here we will use:
- zcat: to show the compressed FASTQ
- head: to grab only the first 10 lines, since the whole file has over 5 x 10^6 lines (which would almost certainly hang our web browser)
- cut: to only show the first 60 characters of each line, to avoid confusion from line wrapping, since the reads are 101 bp
In [3]:
zcat $DEMUX/r1.8A_pilot.fq.gz | head | cut -c1-60
zcat: can't stat: /Users/cliburn/work/scratch/2015_output/demux_fastqs/r1.8A_pilot.fq.gz (/Users/cliburn/work/scratch/2015_output/demux_fastqs/r1.8A_pilot.fq.gz.Z): No such file or directory
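The directory listing above shows only dryrun_demux.stdout, so the FASTQ was not present when this cell was run. The following is a minimal sketch of the same zcat | head | cut idea that checks for the file first. It builds a tiny two-read FASTQ in a temporary directory so that it can run anywhere; the file name, read names, and sequences are made up for illustration. It uses gzip -dc, which decompresses to standard output just as zcat does for gzip files, and it assumes the usual four-line FASTQ records when counting reads.

```shell
# Build a tiny two-read FASTQ in a temporary directory (hypothetical data)
TMPDIR_DEMO=$(mktemp -d)
FQ="$TMPDIR_DEMO/example.fq.gz"
printf '@read1\nACGTACGT\n+\nIIIIHHHH\n@read2\nTTGGCCAA\n+\nIIII####\n' | gzip -c > "$FQ"

# Only try to read the file if it exists and is non-empty
if [ -s "$FQ" ]; then
    # Show the first record (4 lines), truncated to 60 characters per line
    PREVIEW=$(gzip -dc "$FQ" | head -n 4 | cut -c1-60)
    echo "$PREVIEW"

    # Assuming 4 lines per record, reads = lines / 4
    N_LINES=$(gzip -dc "$FQ" | wc -l)
    N_READS=$((N_LINES / 4))
    echo "reads: $N_READS"
else
    echo "missing or empty: $FQ" >&2
fi

# Remove the temporary directory
rm -rf "$TMPDIR_DEMO"
```

For the real data, FQ would point at $DEMUX/r1.8A_pilot.fq.gz once demultiplexing has produced it.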
less
less is a program for taking an interactive look at a text file, like a FASTQ. It lets you scroll, search, etc. less won’t work in the bash notebook, so if you want to try it out, you need to use a terminal.
To switch to a terminal, click on the Jupyter “File” menu and select “Open”. A new browser window/tab should open with your Jupyter “home base”. There, click on the “Files” tab if it is not already active, then click on “New” and select “Terminal”, which should open a new live terminal.
Since we want to look at a compressed (gzipped) FASTQ, we will use a version of less called zless, which decompresses on the fly.
At the terminal’s command prompt, type (or paste):
cd $HOME/work/scratch/2015_output/demux_fastqs
Then type:
zless r1.8A_pilot.fq.gz
You should see the first few lines of the file. Notice that it looks like the examples we saw in lecture.
zless (and its standard cousin less) can do a lot of things. Here are a few important keystrokes:
- q : quit
- space : scroll down a page
- up/down arrow : scroll up/down by a line
What do quality scores mean?
See the Quality Scores notebook for a “translation” of quality scores. The Wikipedia article on FASTQs is also a useful resource.
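As a quick preview, here is a minimal sketch of how quality characters can be decoded in bash. It assumes Phred+33 encoding, where the quality score Q is the character's ASCII code minus 33 and the estimated error probability is 10^(-Q/10); check which encoding your own data uses. The quality string below is hypothetical.

```shell
# Assumption: Phred+33 encoding (quality = ASCII code - 33)
QUAL_STRING='II5#'   # hypothetical quality string, for illustration only

DECODED=""
i=0
while [ "$i" -lt "${#QUAL_STRING}" ]; do
    ch="${QUAL_STRING:$i:1}"
    # A leading single quote makes printf print the character's numeric code
    ascii=$(printf '%d' "'$ch")
    DECODED="$DECODED$((ascii - 33)) "
    i=$((i + 1))
done

# Drop the trailing space and print the scores
DECODED="${DECODED% }"
echo "$DECODED"
```

Under this assumption, “I” decodes to 40 (about a 1 in 10,000 chance of a wrong base call) and “#” decodes to 2.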