Working with Paired-End Reads

If we had paired-end read data, we would need to do a few of the steps a little differently.

Running fastq-mcf on Paired Data

It only takes two minor changes to run fastq-mcf on paired data: we need to tell it to also load the read 2 file, and what to call the trimmed output from that file.

  1. neb_e7600_adapters.fasta
  2. 27_MA_P_S38_L002_R1_001.fastq.gz
  3. 27_MA_P_S38_L002_R2_001.fastq.gz : NEW for paired-data
  4. -q 20
  5. -x 0.5
  6. -o 27_MA_P_S38_L002_R1_001.trim.fastq.gz
  7. -o 27_MA_P_S38_L002_R2_001.trim.fastq.gz : NEW for paired-data

Like this:

fastq-mcf $MYINFO/neb_e7600_adapters.fasta \
    $RAW_FASTQS/27_MA_P_S38_L002_R1_001.fastq.gz \
    $RAW_FASTQS/27_MA_P_S38_L002_R2_001.fastq.gz \
    -q 20 -x 0.5 \
    -o $TRIMMED/27_MA_P_S38_L002_R1_001.trim.fastq.gz \
    -o $TRIMMED/27_MA_P_S38_L002_R2_001.trim.fastq.gz
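
To reuse this command for other samples, the R2 file name and both trimmed output names can be derived from the R1 file name instead of being typed out. The following is a minimal sketch, assuming the files follow the _R1_001.fastq.gz / _R2_001.fastq.gz naming pattern used above. The helper names mate_of and trimmed_name are hypothetical and are not part of fastq-mcf. The command is printed with echo rather than executed, so it can be inspected first.

```shell
# Hypothetical helpers; they assume file names that contain "_R1_"
# and end in ".fastq.gz", as in the example above.

# Derive the R2 file name from an R1 file name.
mate_of() {
    prefix=${1%%_R1_*}   # text before the first "_R1_"
    suffix=${1#*_R1_}    # text after the first "_R1_"
    printf '%s_R2_%s\n' "$prefix" "$suffix"
}

# Derive the trimmed output name: foo.fastq.gz -> foo.trim.fastq.gz
trimmed_name() {
    printf '%s\n' "${1%.fastq.gz}.trim.fastq.gz"
}

R1=27_MA_P_S38_L002_R1_001.fastq.gz
R2=$(mate_of "$R1")

# Print the paired fastq-mcf command instead of running it.
echo fastq-mcf "$MYINFO/neb_e7600_adapters.fasta" \
    "$RAW_FASTQS/$R1" \
    "$RAW_FASTQS/$R2" \
    -q 20 -x 0.5 \
    -o "$TRIMMED/$(trimmed_name "$R1")" \
    -o "$TRIMMED/$(trimmed_name "$R2")"
```

Removing the echo runs the command for real, provided $MYINFO, $RAW_FASTQS, and $TRIMMED are set as in the command above.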

Note: Since we are now including the reverse reads, we
expect to see contamination with both adapters.
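
STAR expects the R1 and R2 files to contain the same reads in the same order, so it can be worth confirming that the two trimmed files hold the same number of reads before aligning. The following is a rough sketch of such a check, assuming standard four-line FASTQ records. The helper names count_reads and check_pairs are hypothetical, and matching counts are only a quick sanity check, not proof that the reads are correctly paired.

```shell
# Count the reads in a gzipped FASTQ file, assuming four lines per record.
count_reads() {
    lines=$(gzip -dc "$1" | wc -l)
    echo $((lines / 4))
}

# Compare the read counts of two gzipped FASTQ files (hypothetical helper).
# Returns non-zero if the counts differ.
check_pairs() {
    n1=$(count_reads "$1")
    n2=$(count_reads "$2")
    if [ "$n1" -eq "$n2" ]; then
        echo "OK: $n1 read pairs"
    else
        echo "MISMATCH: R1 has $n1 reads, R2 has $n2 reads"
        return 1
    fi
}

# Usage, once trimming has finished:
# check_pairs "$TRIMMED/27_MA_P_S38_L002_R1_001.trim.fastq.gz" \
#             "$TRIMMED/27_MA_P_S38_L002_R2_001.trim.fastq.gz"
```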

Running STAR on Paired Data

As with fastq-mcf, running STAR on paired data only requires minor changes: add the R2 FASTQ file to the arguments for --readFilesIn, and remove the “R1” from the --outFileNamePrefix, since the output will combine R1 and R2. Like this:

STAR \
    --runMode alignReads \
    --twopassMode None \
    --genomeDir $GENOME_DIR \
    --readFilesIn $TRIMMED/27_MA_P_S38_L002_R1_001.trim.fastq.gz \
                  $TRIMMED/27_MA_P_S38_L002_R2_001.trim.fastq.gz \
    --readFilesCommand gunzip -c \
    --outFileNamePrefix ${STAR_OUT}/27_MA_P_S38_L002_ \
    --quantMode GeneCounts \
    --outSAMtype BAM Unsorted \
    --outSAMunmapped Within
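
The output prefix used above can also be derived from the trimmed R1 file name instead of being edited by hand. The following is a small sketch, assuming the sample name itself does not contain "R1_" before the read designation. The helper name star_prefix is hypothetical and is not part of STAR.

```shell
# Hypothetical helper: build the STAR output prefix from a trimmed R1
# file name by dropping the directory and everything from "R1_" onward.
star_prefix() {
    base=${1##*/}                  # strip any leading directory
    printf '%s\n' "${base%%R1_*}"  # keep the text before the first "R1_"
}

star_prefix "$TRIMMED/27_MA_P_S38_L002_R1_001.trim.fastq.gz"
# prints 27_MA_P_S38_L002_
```

The result can then be passed as --outFileNamePrefix "${STAR_OUT}/$(star_prefix "$R1_FILE")", where R1_FILE is a placeholder variable holding the trimmed R1 path. STAR appends its own suffixes to this prefix, such as Aligned.out.bam for the unsorted BAM and ReadsPerGene.out.tab for the gene counts.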