STAR: Pipeline

Synopsis

This notebook outlines the steps for pre-processing RNA-Seq reads with the STAR pipeline: building a genome index, then aligning paired-end reads and counting them per gene.

Set program names

In [1]:
mystar=/opt/NGS/STAR/STAR-2.5.2b/bin/Linux_x86_64_static/STAR
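Because every later cell depends on this path, it can help to confirm that the binary is actually present before continuing. The following is a minimal sketch; `check_program` is a hypothetical helper name introduced here for illustration, not part of STAR.

```shell
# Hypothetical helper: succeed only if the given path is an executable file.
check_program() {
    if [ -x "$1" ] && [ ! -d "$1" ]; then
        echo "found: $1"
    else
        echo "missing or not executable: $1" >&2
        return 1
    fi
}

# Example use in the notebook (assumes mystar is set as in the cell above):
#   check_program "$mystar" || echo "fix the mystar path before continuing"
```

If the check fails, later cells will report "No such file or directory" and finish in a few milliseconds without doing any work.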

Set reference and annotation files

In [2]:
myrefdir="/data1/workspace/tmp/STAR/reference"
mystarindex="/data1/workspace/tmp/STAR/index"
myfasta="$myrefdir/genome.fasta"
mygtf="$myrefdir/genome.gtf"

Create STAR Index

The first step is to build the STAR index. This example runs with 16 threads (--runThreadN 16). Indexing a large genome requires at least 32 GB of RAM.

In [3]:
time $mystar \
    --runMode genomeGenerate \
    --genomeDir $mystarindex \
    --sjdbGTFfile $mygtf \
    --genomeFastaFiles $myfasta \
    --runThreadN 16
bash: /opt/NGS/STAR/STAR-2.5.2b/bin/Linux_x86_64_static/STAR: No such file or directory

real    0m0.002s
user    0m0.000s
sys     0m0.001s
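The STAR manual notes that the directory passed to --genomeDir has to exist and be writable before the index is generated. A minimal sketch of that preparation step follows; `prepare_index_dir` is a hypothetical helper name used only for illustration.

```shell
# Hypothetical helper: create the index directory if needed
# and confirm that it is writable.
prepare_index_dir() {
    mkdir -p "$1" && [ -w "$1" ]
}

# Example use before the genomeGenerate cell
# (assumes mystarindex is set as in the earlier cell):
#   prepare_index_dir "$mystarindex" || echo "cannot write to $mystarindex" >&2
```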

Process a paired-end read sample: From reads to counts

The following processes a single paired-end sample.

Set file names for sample

In [5]:
mysample="SAMPLE-12345"
R1=$mysample"-R1.fastq"
R2=$mysample"-R2.fastq"
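The mate file names can also be derived and checked in one place, which helps when the same steps are repeated for several samples. This is a sketch under the assumption that files follow the `<sample>-R1.fastq` / `<sample>-R2.fastq` naming shown in this cell and that sample names contain no spaces; `mate_files` and `check_mates` are hypothetical helper names.

```shell
# Hypothetical helper: print the two mate file names for a sample,
# assuming the <sample>-R1.fastq / <sample>-R2.fastq naming convention.
mate_files() {
    printf '%s-R1.fastq %s-R2.fastq\n' "$1" "$1"
}

# Hypothetical helper: fail if either mate file is missing or empty.
# Relies on word splitting, so sample names must not contain spaces.
check_mates() {
    for f in $(mate_files "$1"); do
        [ -s "$f" ] || { echo "missing or empty: $f" >&2; return 1; }
    done
}

# Example use: check_mates "SAMPLE-12345" || echo "reads not found" >&2
```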

Now align the read pair (R1, R2) against the STAR index and count reads per gene, using the gene models in the GTF file.

In [6]:
# --genomeFastaFiles is omitted here: the genome sequence is already part of the index built above.
time $mystar \
    --twopassMode Basic \
    --quantMode GeneCounts \
    --genomeDir $mystarindex \
    --sjdbGTFfile $mygtf \
    --runThreadN 16 \
    --readFilesIn $R1 $R2 \
    --outFileNamePrefix $mysample
bash: /opt/NGS/STAR/STAR-2.5.2b/bin/Linux_x86_64_static/STAR: No such file or directory

real    0m0.001s
user    0m0.000s
sys     0m0.001s
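With --quantMode GeneCounts, STAR writes the per-gene counts to a tab-separated file named ReadsPerGene.out.tab, prefixed by the value of --outFileNamePrefix. Its first four rows are summary counters (N_unmapped, N_multimapping, N_noFeature, N_ambiguous); the columns after the gene ID hold counts for unstranded data and for each of the two strand orientations. A minimal sketch for pulling out the unstranded counts follows; `gene_counts` is a hypothetical helper name, and column 2 is only the right choice if the library is unstranded.

```shell
# Hypothetical helper: print gene ID and unstranded count (column 2)
# from a STAR ReadsPerGene.out.tab file, skipping the N_* summary rows.
gene_counts() {
    awk -F '\t' '$1 !~ /^N_/ { print $1 "\t" $2 }' "$1"
}

# Example use once the alignment cell has run successfully
# (assumes mysample was also used as the output prefix):
#   gene_counts "${mysample}ReadsPerGene.out.tab" | head
```

For stranded libraries, column 3 or 4 would be used instead, depending on the protocol.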

For additional information, see the STAR manual:

http://labshare.cshl.edu/shares/gingeraslab/www-data/dobin/STAR/STAR.posix/doc/STARmanual.pdf