STAR: Pipeline
Synopsis
This notebook outlines the steps for pre-processing RNA-Seq reads with the STAR aligner: building a genome index, then aligning paired-end reads and counting reads per gene.
Set program names
In [1]:
# Path to the STAR executable; adjust to match your installation
mystar=/opt/NGS/STAR/STAR-2.5.2b/bin/Linux_x86_64_static/STAR
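If the path above is wrong, every later STAR command fails with bash's "No such file or directory" error. A minimal sketch of a pre-flight check, assuming a bash kernel; require_exe is a hypothetical helper, not part of STAR. It returns a non-zero status instead of calling exit, so it does not terminate the notebook kernel.

```shell
# Hypothetical helper (not part of STAR): succeed only if $1 is an executable regular file.
# It uses return rather than exit so a notebook kernel is not terminated.
require_exe() {
    if [ ! -f "$1" ] || [ ! -x "$1" ]; then
        echo "ERROR: '$1' is missing or not executable" >&2
        return 1
    fi
}

# Check the STAR path set above before starting any long job.
require_exe "${mystar:-}" || echo "Fix mystar before continuing." >&2
```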
Set reference and annotation files
In [2]:
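# Directory holding the genome FASTA and the GTF annotation
myrefdir="/data1/workspace/tmp/STAR/reference"
# Output directory for the STAR genome index
mystarindex="/data1/workspace/tmp/STAR/index"
# The "/" separator is required; without it the path becomes .../referencegenome.fasta
myfasta="$myrefdir/genome.fasta"
mygtf="$myrefdir/genome.gtf"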
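Before starting a long indexing job it can help to confirm that both reference files exist and are non-empty. A minimal sketch; check_inputs is a hypothetical helper, and it only tests for presence and non-zero size, not for a valid FASTA or GTF.

```shell
# Hypothetical helper: report each argument that is missing or empty.
# Returns 0 only if every file exists and has non-zero size.
check_inputs() {
    local f missing=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "Missing or empty: '$f'" >&2
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}

check_inputs "${myfasta:-}" "${mygtf:-}" || echo "Check myrefdir and the file names." >&2
```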
Create STAR Index
The first step is to build the STAR genome index. This example runs 16 threads (--runThreadN 16); adjust the value to the number of cores available. Indexing a large genome such as human needs at least 32 GB of RAM.
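The thread count and memory requirement can be checked before launching the index build. A minimal sketch for Linux only, assuming /proc/meminfo and the GNU coreutils nproc command are available; the 16-thread cap mirrors the --runThreadN 16 used in this notebook and the 32 GB figure comes from the note above. MemTotal is usually a little below the installed RAM, so treat the warning as a rough guide.

```shell
# Total RAM in kB as reported by the kernel (Linux only)
mem_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
mem_gb=$(( mem_kb / 1024 / 1024 ))    # whole GiB, rounded down

# Number of processing units available to this process (GNU coreutils)
ncores=$(nproc)
# Use at most 16 threads, matching --runThreadN 16 in the index command
nthreads=$(( ncores < 16 ? ncores : 16 ))

echo "RAM: ~${mem_gb} GB, cores: ${ncores}, threads to use: ${nthreads}"
if [ "$mem_gb" -lt 32 ]; then
    echo "WARNING: under ~32 GB RAM; indexing a large genome may fail." >&2
fi
```

One could then pass --runThreadN "$nthreads" instead of the hard-coded 16.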
In [3]:
# STAR 2.5 expects the genome directory to exist before indexing
mkdir -p "$mystarindex"
time $mystar \
--runMode genomeGenerate \
--genomeDir "$mystarindex" \
--sjdbGTFfile "$mygtf" \
--genomeFastaFiles "$myfasta" \
--runThreadN 16
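Because a failed or interrupted genomeGenerate run can leave a partial index behind, a quick look at the index directory may save a confusing mapping error later. A minimal sketch; index_looks_complete is a hypothetical helper, and the file list (Genome, SA, SAindex, chrName.txt, genomeParameters.txt) is an assumption covering only a subset of what STAR writes. It checks presence only, not integrity, so compare it against the output of your STAR version.

```shell
# Hypothetical helper: succeed only if the core STAR index files exist and are non-empty.
# The file list is an assumption and deliberately incomplete.
index_looks_complete() {
    local dir=$1 f ok=0
    for f in Genome SA SAindex chrName.txt genomeParameters.txt; do
        if [ ! -s "$dir/$f" ]; then
            echo "Missing or empty in index: $dir/$f" >&2
            ok=1
        fi
    done
    return "$ok"
}

index_looks_complete "${mystarindex:-}" || echo "Index looks incomplete; re-run genomeGenerate." >&2
```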
Process a paired-end sample: from reads to counts
The following steps process a single paired-end sample.
Set the FASTQ file names for the sample (one file per mate)
In [5]:
mysample="SAMPLE-12345"
R1="${mysample}-R1.fastq"   # first mate
R2="${mysample}-R2.fastq"   # second mate (was mistakenly assigned to R1)
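STAR expects the two mate files to list the same reads in the same order, so a mismatch in record counts usually points to a truncated or mismatched file. A minimal sketch, assuming uncompressed FASTQ with exactly four lines per record; fastq_reads and check_pair are hypothetical helpers. Matching counts do not prove the files are in sync, only that they have the same length.

```shell
# Hypothetical helper: number of records in an uncompressed FASTQ file,
# assuming exactly 4 lines per record.
fastq_reads() {
    local n
    n=$(wc -l < "$1") || return 1
    echo $(( n / 4 ))
}

# Hypothetical helper: fail if the two mate files hold different numbers of reads.
check_pair() {
    local n1 n2
    n1=$(fastq_reads "$1") || return 1
    n2=$(fastq_reads "$2") || return 1
    if [ "$n1" -ne "$n2" ]; then
        echo "Read counts differ: $1 has $n1, $2 has $n2" >&2
        return 1
    fi
    echo "$n1 read pairs"
}

check_pair "${R1:-}" "${R2:-}" || echo "Check the FASTQ file names." >&2
```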
Now align the mates (R1, R2) to the reference index and count reads per gene, using the gene models in the GTF file.
In [6]:
# The genome FASTA is needed only to build the index, so it is not passed again here
time $mystar \
--twopassMode Basic \
--quantMode GeneCounts \
--genomeDir "$mystarindex" \
--sjdbGTFfile "$mygtf" \
--runThreadN 16 \
--readFilesIn "$R1" "$R2" \
--outFileNamePrefix "$mysample"
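With --quantMode GeneCounts, STAR writes a ReadsPerGene.out.tab file, named here with the $mysample prefix. Per the STAR manual it has four columns: gene ID, counts for unstranded data, counts for the first read strand aligned with RNA, and counts for the second read strand; the first four rows are the N_unmapped, N_multimapping, N_noFeature and N_ambiguous summaries. A minimal sketch for extracting one column and comparing column totals; extract_counts, column_totals and the output name ${mysample}.counts.tsv are hypothetical. Comparing the totals of columns 3 and 4 is a common rule of thumb for guessing library strandedness, not a definitive test.

```shell
# Hypothetical helper: print gene_id<TAB>count for one column of ReadsPerGene.out.tab.
# Column 2 = unstranded, 3 = first read strand, 4 = second read strand (default: 2).
# The first 4 rows (N_unmapped, N_multimapping, N_noFeature, N_ambiguous) are skipped.
extract_counts() {
    local tab=$1 col=${2:-2}
    awk -v c="$col" 'BEGIN { FS = OFS = "\t" } NR > 4 { print $1, $c }' "$tab"
}

# Hypothetical helper: sum gene counts in columns 2, 3 and 4 (summary rows skipped).
# If column 3 or 4 is much larger than the other, the library is likely stranded.
column_totals() {
    awk 'BEGIN { FS = OFS = "\t" }
         NR > 4 { u += $2; s1 += $3; s2 += $4 }
         END { print u + 0, s1 + 0, s2 + 0 }' "$1"
}

counts="${mysample:-}ReadsPerGene.out.tab"
if [ -f "$counts" ]; then
    column_totals "$counts"
    extract_counts "$counts" 2 > "${mysample:-}.counts.tsv"
fi
```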
For additional information, see the STAR manual:
http://labshare.cshl.edu/shares/gingeraslab/www-data/dobin/STAR/STAR.posix/doc/STARmanual.pdf