Initialize Variables and Make Directories
In [1]:
# Load the shared configuration, then create the working directories
source bioinf_intro_config.sh
mkdir -p "$TRIMMED" "$MYINFO" "$GENOME_DIR" "$STAR_OUT"
Make adapter file
In [2]:
echo ">Adapter
AGATCGGAAGAGCACACGTCTGAACTCCAGTCA
>AdapterRead2
AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT
>Adapter_rc
TGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
>AdapterRead2_rc
ACACTCTTTCCCTACACGACGCTCTTCCGATCT" > "$MYINFO/neb_e7600_adapters.fasta"
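The two `_rc` records are the reverse complements of the `Adapter` and `AdapterRead2` sequences. A minimal sketch for checking or regenerating them, assuming the standard `rev` and `tr` utilities are installed; `revcomp` is a hypothetical helper name and not part of the original notebook:

```shell
# Hypothetical helper: print the reverse complement of a DNA sequence.
# Handles upper-case A/C/G/T only; other characters pass through unchanged.
revcomp() {
    printf '%s\n' "$1" | rev | tr 'ACGT' 'TGCA'
}

revcomp AGATCGGAAGAGCACACGTCTGAACTCCAGTCA
revcomp AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT
```

The two printed lines should match the `Adapter_rc` and `AdapterRead2_rc` records written to the adapter file.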
Download and Index Genome
In [3]:
SHARED_URL="ftp://ftp.ensemblgenomes.org/pub/release-39/fungi"
FASTA=${GENOME_DIR}/Cryptococcus_neoformans_var_grubii_h99.CNA3.dna.toplevel.fa
GTF=${GENOME_DIR}/Cryptococcus_neoformans_var_grubii_h99.CNA3.39.gtf
In [4]:
# Download the gzipped annotation (GTF) and genome sequence (FASTA)
wget --no-verbose --directory-prefix "${GENOME_DIR}" "${SHARED_URL}/gtf/fungi_basidiomycota1_collection/cryptococcus_neoformans_var_grubii_h99/$(basename "$GTF").gz"
wget --no-verbose --directory-prefix "${GENOME_DIR}" "${SHARED_URL}/fasta/fungi_basidiomycota1_collection/cryptococcus_neoformans_var_grubii_h99/dna/$(basename "$FASTA").gz"
# Decompress in place, overwriting any earlier copies
gunzip -f "${FASTA}.gz"
gunzip -f "${GTF}.gz"
2018-07-24 13:27:47 URL: ftp://ftp.ensemblgenomes.org/pub/release-39/fungi/gtf/fungi_basidiomycota1_collection/cryptococcus_neoformans_var_grubii_h99/Cryptococcus_neoformans_var_grubii_h99.CNA3.39.gtf.gz [1796344] -> "/home/jovyan/work/scratch/bioinf_intro/genome/Cryptococcus_neoformans_var_grubii_h99.CNA3.39.gtf.gz.1" [1]
2018-07-24 13:27:49 URL: ftp://ftp.ensemblgenomes.org/pub/release-39/fungi/fasta/fungi_basidiomycota1_collection/cryptococcus_neoformans_var_grubii_h99/dna/Cryptococcus_neoformans_var_grubii_h99.CNA3.dna.toplevel.fa.gz [5922212] -> "/home/jovyan/work/scratch/bioinf_intro/genome/Cryptococcus_neoformans_var_grubii_h99.CNA3.dna.toplevel.fa.gz.1" [1]
gzip: /home/jovyan/work/scratch/bioinf_intro/genome/Cryptococcus_neoformans_var_grubii_h99.CNA3.dna.toplevel.fa already exists; do you wish to overwrite (y or n)?
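In the log above, wget saved both downloads with a `.1` suffix, which is what it does by default when a file with the target name already exists, so re-running the cell can leave stale `.gz` copies behind. A minimal sketch of a re-run-safe variant follows. `fetch_gz` is a hypothetical helper (not part of the original notebook) that skips the download when the decompressed file is already present and otherwise writes to a fixed name with `--output-document`:

```shell
# Hypothetical helper: download URL into DEST_DIR and decompress it,
# skipping the work if the decompressed file already exists.
fetch_gz() {
    url=$1
    dest_dir=$2
    gz_path="${dest_dir}/$(basename "$url")"
    out_path="${gz_path%.gz}"

    if [ -s "$out_path" ]; then
        echo "skipping, already present: $out_path"
        return 0
    fi
    # --output-document overwrites gz_path instead of creating gz_path.1
    wget --no-verbose --output-document "$gz_path" "$url" &&
        gunzip -f "$gz_path"
}
```

It could be called once per file, passing the full URL of the `.gz` file as the first argument and `"$GENOME_DIR"` as the second.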
In [5]:
# Build the STAR index; --sjdbGTFfile adds annotated splice junctions to it
STAR \
--runMode genomeGenerate \
--genomeDir "$GENOME_DIR" \
--genomeFastaFiles "${FASTA}" \
--sjdbGTFfile "${GTF}" \
--outFileNamePrefix "${STAR_OUT}/genome_" \
--genomeSAindexNbases 11
# --genomeSAindexNbases 6
Jul 24 13:32:28 ..... started STAR run
Jul 24 13:32:28 ... starting to generate Genome files
Jul 24 13:32:29 ... starting to sort Suffix Array. This may take a long time...
Jul 24 13:32:29 ... sorting Suffix Array chunks and saving them to disk...
Jul 24 13:32:52 ... loading chunks from disk, packing SA...
Jul 24 13:32:52 ... finished generating suffix array
Jul 24 13:32:52 ... generating Suffix Array index
Jul 24 13:32:54 ... completed Suffix Array index
Jul 24 13:32:54 ..... processing annotations GTF
Jul 24 13:32:55 ..... inserting junctions into the genome indices
Jul 24 13:33:38 ... writing Genome to disk ...
Jul 24 13:33:38 ... writing Suffix Array to disk ...
Jul 24 13:33:39 ... writing SAindex to disk
Jul 24 13:33:39 ..... finished successfully
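The `--genomeSAindexNbases 11` setting is consistent with the STAR manual's guidance for small genomes, which scales the value down to min(14, log2(GenomeLength)/2 - 1). A genome of roughly 19 Mb, for example, gives 11. A minimal sketch that derives the value from a FASTA file, assuming `awk` is available; `suggest_sa_index_nbases` is a hypothetical helper name and not part of the original notebook:

```shell
# Hypothetical helper: suggest --genomeSAindexNbases for a FASTA file
# using min(14, log2(GenomeLength)/2 - 1), rounded down.
suggest_sa_index_nbases() {
    awk '
        !/^>/ { len += length($0) }          # count bases, skip header lines
        END {
            if (len == 0) exit 1             # empty input: no suggestion
            n = int(log(len) / log(2) / 2 - 1)
            if (n > 14) n = 14
            print n
        }' "$1"
}
```

Running `suggest_sa_index_nbases "$FASTA"` before indexing would print the suggested value for the downloaded genome.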