Page tree
Skip to end of metadata
Go to start of metadata

Trimmomatic is used to remove low quality portions of reads, and to remove adapter sequences

Requirements

  1. Fastq formatted read files
  2. Adapter sequence file.  Some are provided with trinity, or you can use my illumina truseq file /home/mgribsko/reference/adapter_t.fa

 Notes

  1.  typically, run fastqc both before and after trimmomatic, see the automated 3. script below
  2. The recommended thresholds for trimming, which are specified in bits, should be much lower than suggested in the manual
    1. the conversion from bits to exact matches is Identities * 0.58, it must be integer so round up
    2. a typical command might be ILLUMINACLIP:adapter.fa:2:10:5 or ILLUMINACLIP:adapter.fa:2:12:6
      use my testadapt.pl program to estimate the correct values

 Torque/PBS Examples

  1.  Simple script.  Very tedious since you must edit it for each input file.  You will make mistakes.  Consider using one or the more automated scripts below.

    Simple
    #/bin/bash
    #PBS -N trimmomatic
    #PBS -q standby
    #PBS -l nodes=1:ppn=20
    #PBS -l walltime=4:00:00
    #PBS -l epilogue=/home/mgribsko/jobs/epilogue.sh
    #PBS -l naccesspolicy=shared
    
    cd $PBS_O_WORKDIR
    module load bioinfo
    module load fastqc
    module load trimmomatic
    
    # this is the trimming command definition.  Each command executed
    # in the order given.  Adapter trimming should go first, if used
    trimmer="ILLUMINACLIP:adapter.fa:2:12:6 \
    LEADING:10 \
    TRAILING:10 \
    SLIDINGWINDOW:5:16 \
    MINLEN:30"
    
    ############################################################
    # pre-cleaning fastqc
    ############################################################
    mkdir fastqc_before		# fails if directory exists
    fastqc -q -t 20 -o fastqc_before *.fastq
    
    
    ############################################################
    # trimmomatic: repeat command for each file (tedious)
    ############################################################
    trimmomatic PE \
    clean/mroreri_mp_r1_paired.fq clean/mroreri_mp_r2_paired.fq \
    clean/mroreri_mp_r1_nx_paired clean/mroreri_mp_r1_nx_unpaired \
    clean/mroreri_mp_r2_nx_paired clean/mroreri_mp_r2_nx_unpaired \
    $trimmer
    
    ############################################################
    # post-cleaning - fastqc
    ############################################################
    mkdir fastqc_after		# fails if directory exists
    fastqc -q -t 20 -o fastqc_clean *.paired.fastq
    
    
  2. Run on a manually constructed list of files.  Only R1 files need to be listed, the R2 file names are generated automatically.

    Multiple files using unix shell loop
    #!/bin/bash
    #PBS -N trimmomatic
    #PBS -q standby
    #PBS -l nodes=1:ppn=20
    #PBS -l walltime=4:00:00
    #PBS -l epilogue=/home/mgribsko/jobs/epilogue.sh
    
    cd $PBS_O_WORKDIR
    module load bioinfo
    module load fastqc
    module load trimmomatic
     
    # this is the list of file names for the R1 samples. The R2 filenames are generated below, so all I need are the R1s
    # It is a space delimited list inside quotes, uses a continuation at the end of the first line, '\'
    
    sample="021938_JD-6_ATCACG_run492_L007_R1_001.fastq 021939_JD-8_CGATGT_run492_L007_R1_001.fastq 021940_JD-5_TTAGGC_run492_L007_R1_001.fastq \
    021941_MCA-2504_TGACCA_run492_L007_R1_001.fastq 021942_MCA-2952_ACAGTG_run492_L007_R1_001.fastq 021943_MCA-2974_GCCAAT_run492_L007_R1_001.fastq"
    
    # the trimmomatic trimming command. it's defined here only for convenience
    trimmer="ILLUMINACLIP:adapter.fa:2:12:6 LEADING:10 TRAILING:10 SLIDINGWINDOW:5:16 MINLEN:30 "
    
    # my datafiles are in a different directory so I define a path and store it in the variable $data
    data='../../clean/'
    
    
    ############################################################
    # pre-cleaning - fastqc
    ############################################################
    mkdir fastqc_before
    fastqc -q -t 20 -o fastqc_before *.fastq
    
    # loop over all the file names in $sample
    for r1 in $sample; do
         # copy the filename, r1, to a new file name, r2 and substitute R1 in the name with R2
         # this generates the name of the R2 file
         r2=$r1
         r2="${r1/R1/R2}"
         #echo file: $dir$r1 $dir$r2
    
         # generate the names for the four output files, R1.unpaired, R1.paired, R2.unpaired, and R2.paired 
         # from the names of the R1 and R2 input files
         # notice I skipped copying the name of r1 into r1p, and just substituted .fastq to .fastq.paired and put the result in a new variable
         r1p="${r1/.fastq/.paired.fastq}"
         r1u="${r1/.fastq/.unpaired.fastq}"
         r2p="${r2/.fastq/.paired.fastq}"
         r2u="${r2/.fastq/.unpaired.fastq}"
    
        # run the trimmomatic command, note the path to the datafiles, $data
        trimmomatic PE -threads 20 $data/$r1 $data/$r2 $r1p $r1u $r2p $r2u $trimmer
    done
    
    # this is the end of the loop
    
    ############################################################
    # post-cleaning - fastqc
    ############################################################
    mkdir fastqc_after
    fastqc -q -t 20 -o fastqc_after *.fastq
    
  3. Highly automated:  all you need to enter is the directory in which the data is found, and some information about the naming scheme.  For instance, if my files are name SRR12345_R1.fastq and SRR12345_R2.fastq then r1tag="_R1", r2tag="_R2", and fastqsuffix=".fastq". If my files are name wyz.1.fq and wyz.2.fq then r1tag=".1", r2tag=".2", and fastqsuffix=".fq".

    Automated PE reads - FastQC/Trimmomatic/FastQC, SLURM
    #!/bin/bash
    #SBATCH --job-name trimmomatic
    #SBATCH --output=%x_%j.out
    #SBATCH --error=%x_%j.err
    #SBATCH --account scholar
    #SBATCH --nodes=1
    #SBATCH --ntasks=20
    #SBATCH --time=4:00:00
    
    ################################################################################
    # Trimmomatic
    # Quality trim Illumina paired end reads, and remove adapter sequences.
    # This script generates the names of the R2 files from the names of the R1 files
    # so only the R1 files need to be provided
    #
    # usage
    #    sbatch trimmomatic.job           # run on cluster
    #    trimmomatic.job                  # write out commands for examination
    ################################################################################
    data="../dna_raw"                     # directory where raw data files are found
    r1tag="_1"                            # string that identifies read 1 files
    r2tag="_2"                            # string that identifies the read 2 files
    fastqsuffix=".fastq"                  # terminal part of fastq filenames
    target="$data/*$r1tag$fastqsuffix"    # search term for fastq files
    trimmed="trimmed3"                    # directory for the trimmed files
    
    runfqc_before=false                   # change to false to skip before trimming QC
    runfqc_after=true                     # change to false to skip after trimming QC
    fqc_before_dir="fastqc_before"        # directory for fastqc before
    fqc_after_dir="fastqc_after3"         # directory for fastqc after
    
    # trimming command definition.  Each command is executed in the order given.
    # Adapter trimming should go first, if used
    
    adapter=/group/bioinfo/apps/apps/trimmomatic-0.39/adapters/TruSeq2-PE.fa        # adapter file
    
    trimmer="ILLUMINACLIP:$adapter:2:14:9 \
    LEADING:10 \
    TRAILING:10 \
    SLIDINGWINDOW:5:16 \
    MINLEN:30"
    
    #-------------------------------------------------------------------------------
    # parameter report
    #-------------------------------------------------------------------------------
    echo "Trimmomatic"
    echo -e "\tdata directory: $data/"
    echo -e "\tR1/R2 tags: $r1tag/$r2tag"
    echo -e "\tfastq file suffix: $fastqsuffix"
    echo -e "\ttarget: $target"
    echo -e "\toutput directory: $trimmed/"
    if $runfqc_before; then
        echo -e "\tFastQC before trimming: $fqc_before_dir"
    fi
    if $runfqc_after; then
        echo -e "\tFastQC after trimming: $fqc_after_dir"
    fi
    echo -e "\tadapter file: $adapter"
    echo -e "\ttrimming command: $trimmer"
    
    #-------------------------------------------------------------------------------
    # check for debug mode
    #-------------------------------------------------------------------------------
    debug=true
    if [ $SLURM_SUBMIT_HOST ]; then
        debug=false
        echo -e "\n#-------------------------------------------------------------------------------"
        echo -e "# Debug mode: commands will not be executed"
        echo -e "#-------------------------------------------------------------------------------"
    else
        echo -e "\n#-------------------------------------------------------------------------------"
        echo -e "# Executing script"
        echo -e "#-------------------------------------------------------------------------------"
    fi
    
    ################################################################################
    # begin script; hopefully nothing below here needs to be changed
    ################################################################################
    if [ $SLURM_SUBMIT_HOST ]; then
        # execute only if qsub
        echo "working directory: $SLURM_SUBMIT_DIR"
        cd $SLURM_SUBMIT_DIR
    
        echo "loading bioinformatics modules"
        module load bioinfo
        module load fastqc
        module load trimmomatic
    fi
    
    #-------------------------------------------------------------------------------
    # pre-cleaning - fastqc
    #-------------------------------------------------------------------------------
    if $runfqc_before; then
        echo -e "\nFastQC before trimming"
        echo -e "\t`date`"
        com0="fastqc -q -t 20 -o $fqc_before_dir $data/*$fastqsuffix"
        if [ -e "$fqc_before_dir" ]; then
            echo -e "\t$fqc_before_dir/ already exists"
            echo -e "\t$com0"
        else
            echo -e "\tCreating directory $fqc_before_dir/"
            echo -e "\t$com0"
            if ! $debug; then
                mkdir $fqc_before_dir
                $com0
            fi
        fi
    else
        echo -e "\nFastQC before trimming - skipped"
    fi
    
    #-------------------------------------------------------------------------------
    # Trimmomatic
    # input is all files that match the target: $data/*$r1tag$fastqsuffix
    #-------------------------------------------------------------------------------
    echo -e "\nTrimmomatic"
    echo -e "\t`date`"
    
    # create output directory, if needed
    if [ -e "$trimmed" ]; then
        echo -e "\tOutput directory $trimmed/ already exists"
    else
        echo -e "\tCreating output directory $trimmed/"
        if ! $debug; then
            mkdir $trimmed
        fi
    fi
    if ! $debug; then
        date >> $trimmed/trimming_parameters.txt
        echo $trimmer >> $trimmed/trimming_parameters.txt
    else
        date
        echo -e "trimming parameters will be written to $trimmed/trimming_parameters.txt\n"
    fi
    
    # run trimmomatic on each set of reads
    for r1 in $target; do
        # for each read 1, generate the read 2 name be replacing $r1tag with $r2tag
        # generate the paired and unpaired output file names by replacing
        # $fastqsuffix with paired$fastqsuffix or unpaired.$fastqsuffix in both
        # read 1 and read 2 names
    
        r2="${r1/$r1tag/$r2tag}"
        r1p="${r1/$fastqsuffix/.paired$fastqsuffix}"
        r1u="${r1/$fastqsuffix/.unpaired$fastqsuffix}"
        r2p="${r2/$fastqsuffix/.paired$fastqsuffix}"
        r2u="${r2/$fastqsuffix/.unpaired$fastqsuffix}"
    
        # remove data directory from output files
        r1p=$trimmed/"${r1p/$data\//}"
        r1u=$trimmed/"${r1u/$data\//}"
        r2p=$trimmed/"${r2p/$data\//}"
        r2u=$trimmed/"${r2u/$data\//}"
    
        com1="trimmomatic PE \
        -threads 20 \
        $r1 $r2 \
        $r1p $r1u \
        $r2p $r2u \
        $trimmer"
        #com1=${com1//     / }
        com1out=${com1//     /\\n\\t    }
        echo -e "\n\t$com1out"
    
        if ! $debug; then
            $com1
        fi
    
    done
    
    wait
    
    #-------------------------------------------------------------------------------
    # post-cleaning - fastqc
    #-------------------------------------------------------------------------------
    if $runfqc_after; then
        echo -e "\nFastQC after trimming"
        echo -e "\t`date`"
        com2="fastqc -q -t 20 -o $fqc_after_dir $trimmed/*.paired$fastqsuffix"
        if [ -e "$fqc_after_dir" ]; then
            echo -e "\t$fqc_after_dir/ already exists"
            echo -e "\t$com2"
        else
            echo -e "\tCreating directory $fqc_after_dir/"
            echo -e "\t$com2"
            if ! $debug; then
                mkdir $fqc_after_dir
                $com2
            fi
        fi
    else
        echo -e "\nFastQC after trimming - skipped"
    fi
    
    echo -e "\n`date`"
    
    
    Automated SE reads short RNA, SLURM
    #!/bin/bash
    #SBATCH --job-name trimmomatic
    #SBATCH --output=%x_%j.out
    #SBATCH --error=%x_%j.err
    #SBATCH --account standby
    #SBATCH --nodes=1
    #SBATCH --ntasks=20
    #SBATCH --time=4:00:00
    
    ################################################################################
    # Trimmomatic_single
    # Quality trim Illumina single end reads, and remove adapter sequences.
    #
    # usage
    #    sbatch trimmomatic_single.slurm      # run on cluster
    #    trimmomatic.slurm                    # write out commands for examination
    ################################################################################
    data=.                         # directory where raw data files are found
    trimmed=trimmed                # directory for the trimmed files
    adapter=adapter.fa             # adapter sequence file
    beforesuffix=_raw.fq           # terminal part of input fastq filenames
    aftersuffix=_trimmed.fq        # terminal part of output fastq filenames (trimmed files)
    target=$data/*$beforesuffix    # search term for input fastq files
    
    
    runfqc_before=false                   # change to false to skip before trimming QC
    runfqc_after=true                     # change to false to skip after trimming QC
    fqc_before_dir="fastqc_before"        # directory for fastqc before trimming QC
    fqc_after_dir="fastqc_after"          # directory for fastqc after trimming QC
    
    # trimming command definition.  Each command is executed in the order given.
    # Adapter trimming should go first, if used
    # note this is an extremely basic version, normally you would include:
    #	LEADING:10 \
    #	TRAILING:10 \
    #	SLIDINGWINDOW:5:16 \
    #	MINLEN:30"
    trimmer="ILLUMINACLIP:$adapter:2:14:7 \
    MINLEN:12"
    #-------------------------------------------------------------------------------
    # parameter report
    #-------------------------------------------------------------------------------
    echo "Trimmomatic"
    echo -e "\tdata directory: $data/"
    echo -e "\ttarget: $target"
    echo -e "\toutput directory: $trimmed/"
    if $runfqc_before; then
        echo -e "\tFastQC before trimming: $fqc_before_dir"
    fi
    if $runfqc_after; then
        echo -e "\tFastQC after trimming: $fqc_after_dir"
    fi
    echo -e "\tadapter file: $adapter"
    echo -e "\ttrimming command: $trimmer"
    
    #-------------------------------------------------------------------------------
    # check for debug mode (interactive submission)
    #-------------------------------------------------------------------------------
    debug=true
    if [ $SLURM_SUBMIT_HOST ]; then
        debug=false
        echo -e "\n#-------------------------------------------------------------------------------"
        echo -e "# Executing script"
        echo -e "#-------------------------------------------------------------------------------"
    else
        echo -e "\n#-------------------------------------------------------------------------------"
        echo -e "# Debug mode: commands will not be executed"
        echo -e "#-------------------------------------------------------------------------------"
    fi
    
    ################################################################################
    # begin script; hopefully nothing below here needs to be changed
    ################################################################################
    if [ $SLURM_SUBMIT_HOST ]; then
        # execute only if qsub
        echo "working directory: $SLURM_SUBMIT_DIR"
        cd $SLURM_SUBMIT_DIR
    
        echo "loading bioinformatics modules"
        module load bioinfo
        module load fastqc
        module load trimmomatic
    fi
    
    #-------------------------------------------------------------------------------
    # pre-cleaning - fastqc
    #-------------------------------------------------------------------------------
    if $runfqc_before; then
        echo -e "\nFastQC before trimming"
        echo -e "\t`date`"
        com0="fastqc -q -t 20 -o $fqc_before_dir $data/*$beforesuffix"
        if [ -e "$fqc_before_dir" ]; then
            echo -e "\t$fqc_before_dir/ already exists"
            echo -e "\t$com0"
        else
            echo -e "\tCreating directory $fqc_before_dir/"
            echo -e "\t$com0"
            if ! $debug; then
                mkdir $fqc_before_dir
                $com0
            fi
        fi
    else
        echo -e "\nFastQC before trimming - skipped"
    fi
    
    #-------------------------------------------------------------------------------
    # setup output directory $trimmed
    #-------------------------------------------------------------------------------
    echo -e "\nTrimmomatic"
    echo -e "\t`date`"
    
    # create output directory, if needed
    if [ -e "$trimmed" ]; then
        echo -e "\tOutput directory $trimmed/ already exists"
    else
        echo -e "\tCreating output directory $trimmed/"
        if ! $debug; then
            mkdir $trimmed
        fi
    fi
    
    if ! $debug; then
        date >> $trimmed/trimming_parameters.txt
        echo $trimmer >> $trimmed/trimming_parameters.txt
    else
        date
        echo -e "trimming parameters will be written to $trimmed/trimming_parameters.txt\n"
    fi
    #-------------------------------------------------------------------------------
    # Trimmomatic
    # input is all files that match the target: $target
    #-------------------------------------------------------------------------------
    for r1 in $target; do
        # for each read 1, generate the read 2 name be replacing $r1tag with $r2tag
        # generate the paired and unpaired output file names by replacing
        # $fastqsuffix with paired$fastqsuffix or unpaired.$fastqsuffix in both
        # read 1 and read 2 names
    
        r2="${r1/$beforesuffix/$aftersuffix}"
        r2=$trimmed/"${r2/$data\//}"
    
        com1="trimmomatic SE \
        -threads 20 \
        $r1 $r2 \
        $trimmer"
        #com1=${com1//     / }
        com1out=${com1//     /\\n\\t    }
        echo -e "\n\t$com1out"
    
        if ! $debug; then
            $com1
        fi
    
    done
    
    wait
    
    #-------------------------------------------------------------------------------
    # post-cleaning - fastqc
    #-------------------------------------------------------------------------------
    if $runfqc_after; then
        echo -e "\nFastQC after trimming"
        echo -e "\t`date`"
        com2="fastqc -q -t 20 -o $fqc_after_dir --limits fastqc.limits $trimmed/*aftersuffix"
        if [ -e "$fqc_after_dir" ]; then
            echo -e "\t$fqc_after_dir/ already exists"
            echo -e "\t$com2"
        else
            echo -e "\tCreating directory $fqc_after_dir/"
            echo -e "\t$com2"
        fi
        if ! $debug; then
            mkdir $fqc_after_dir
            $com2
        fi
    else
        echo -e "\nFastQC after trimming - skipped"
    fi
    
    echo -e "\n`date`"
    
    
    Automated (PBS) - FastQC/Trimmomatic/FastQC
    #/bin/bash
    #PBS -N trimmomatic
    #PBS -q standby
    #PBS -l nodes=1:ppn=20
    #PBS -l walltime=4:00:00
    #PBS -l epilogue=/home/mgribsko/jobs/epilogue.sh
    #PBS -l naccesspolicy=shared
    
    ################################################################################
    # Trimmomatic
    # Quality trim Illumina paired end reads, and remove adapter sequences.
    # This script generates the names of the R2 files from the names of the R1 files
    # so only the R1 files need to be provided
    #
    # usage
    #    qsub trimmomatic.job             # run on cluster
    #    trimmomatic.job debug            # write out commands for examination
    ################################################################################
    data="."                              # directory where raw data files are found
    r1tag="_R1"                           # string that identifies read 1 files
    r2tag="_R2"                           # string that identifies the read 2 files
    fastqsuffix=".fastq"                  # terminal part of fastq filenames
    target="$data/*$r1tag$fastqsuffix"    # search term for fastq files
    trimmed="trimmed"                     # directory for the trimmed files
    
    adapter="/home/mgribsko/reference/adapter_t.fa"    # adapter file
    
    runfqc_before=true                    # change to false to skip before trimming QC
    runfqc_after=true                     # change to false to skip after trimming QC
    fqc_before_dir="fastqc_before"        # directory for fastqc before
    fqc_after_dir="fastqc_after"          # directory for fastqc after
    
    #-------------------------------------------------------------------------------
    # parameter report
    #-------------------------------------------------------------------------------
    echo "Trimmomatic"
    echo -e "\tdata directory: $data/"
    echo -e "\tR1/R2 tags: $r1tag/$r2tag"
    echo -e "\tfastq file suffix: $fastqsuffix"
    echo -e "\ttarget: $target"
    echo -e "\toutput directory: $trimmed/"
    if $runfqc_before; then
        echo -e "\tFastQC before trimming: $fqc_before_dir"
    fi
    if $runfqc_after; then
        echo -e "\tFastQC after trimming: $fqc_after_dir"
    fi
    echo -e "\tadapter file: $adapter"
    
    # trimming command definition.  Each command is executed in the order given.
    # Adapter trimming should go first, if used
    
    trimmer="ILLUMINACLIP:$adapter:2:12:6 \
    LEADING:10 \
    TRAILING:10 \
    SLIDINGWINDOW:5:16 \
    MINLEN:30"
    
    echo -e "\ttrimming command: $trimmer"
    
    #-------------------------------------------------------------------------------
    # check for debug mode
    #-------------------------------------------------------------------------------
    debug=false
    if [[ $1 == "debug" ]]; then
        debug=true
    fi
    if $debug; then
        echo -e "\n#-------------------------------------------------------------------------------"
        echo -e "# Debug mode: commands will not be executed"
        echo -e "#-------------------------------------------------------------------------------"
    else
        echo -e "\n#-------------------------------------------------------------------------------"
        echo -e "# Executing script"
        echo -e "#-------------------------------------------------------------------------------"
    fi
    
    ################################################################################
    # begin script; hopefully nothing below here needs to be changed
    ################################################################################
    if [ $PBS_O_WORKDIR ]; then
        # execute only if qsub
        cd $PBS_O_WORKDIR
        module load bioinfo
        module load fastqc
        module load trimmomatic
    fi
    
    #-------------------------------------------------------------------------------
    # pre-cleaning - fastqc
    #-------------------------------------------------------------------------------
    if $runfqc_before; then
        echo -e "\nFastQC before trimming"
        echo -e "\t`date`"
        com0="fastqc -q -t 20 -o $fqc_before_dir $target"
        if [ -e "$fqc_before_dir" ]; then
            echo -e "\t$fqc_before_dir/ already exists"
            echo -e "\t$com0"
        else
            echo -e "\tCreating directory $fqc_before_dir/"
            echo -e "\t$com0"
            if ! $debug; then
                mkdir $fqc_before_dir
                $com0
            fi
        fi
    else
        echo -e "\nFastQC before trimming - skipped"
    fi
    
    #-------------------------------------------------------------------------------
    # Trimmomatic
    # input is all files that match the target: $data/*$r1tag$fastqsuffix
    #-------------------------------------------------------------------------------
    echo -e "\nTrimmomatic"
    echo -e "\t`date`"
    
    # create output directory, if needed
    if [ -e "$trimmed" ]; then
        echo -e "\tOutput directory $trimmed/ already exists"
    else
        echo -e "\tCreating output directory $trimmed/"
        if ! $debug; then
            mkdir $trimmed
        fi
    fi
    
    # run trimmomatic on each set of reads
    for r1 in $target; do
        # for each read 1, generate the read 2 name be replacing $r1tag with $r2tag
        # generate the paired and unpaired output file names by replacing
        # $fastqsuffix with paired$fastqsuffix or unpaired.$fastqsuffix in both
        # read 1 and read 2 names
    
        r2="${r1/$r1tag/$r2tag}"
        r1p="${r1/$fastqsuffix/.paired$fastqsuffix}"
        r1u="${r1/$fastqsuffix/.unpaired$fastqsuffix}"
        r2p="${r2/$fastqsuffix/.paired$fastqsuffix}"
        r2u="${r2/$fastqsuffix/.unpaired$fastqsuffix}"
    
        # remove data directory from output files
        r1p=$trimmed/"${r1p/$data\//}"
        r1u=$trimmed/"${r1u/$data\//}"
        r2p=$trimmed/"${r2p/$data\//}"
        r2u=$trimmed/"${r2u/$data\//}"
    
        com1="trimmomatic PE \
        -threads 20 \
        $r1 $r2 \
        $r1p $r1u \
        $r2p $r2u \
        $trimmer"
        #com1=${com1//     / }
        com1out=${com1//     /\\n\\t    }
        echo -e "\n\t$com1out"
    
        if ! $debug; then
            $com1
        fi
    
    done
    
    wait
    
    #-------------------------------------------------------------------------------
    # post-cleaning - fastqc
    #-------------------------------------------------------------------------------
    if $runfqc_after; then
        echo -e "\nFastQC after trimming"
        echo -e "\t`date`"
        com2="fastqc -q -t 20 -o $fqc_after_dir $trimmed/*.paired$fastqsuffix"
        if [ -e "$fqc_after_dir" ]; then
            echo -e "\t$fqc_after_dir/ already exists"
            echo -e "\t$com2"
        else
            echo -e "\tCreating directory $fqc_after_dir/"
            echo -e "\t$com2"
            if ! $debug; then
                mkdir $fqc_after_dir
                $com2
            fi
        fi
    else
        echo -e "\nFastQC after trimming - skipped"
    fi
    
    echo -e "\n`date`"
    
    
    
    Automatic with separate SLURM jobs
    #!/bin/bash
    
    ################################################################################
    # Trimmomatic
    # Quality trim Illumina paired end reads, and remove adapter sequences.
    # This script generates the names of the R2 files from the names of the R1 files
    # so only the R1 files need to be provided
    #
    # individual slurm jobs files are written for each run so that very large sets
    # can be run on the standby queue.
    # script requires one command line argument: the directory where the data files
    # are found.
    #
    # usage
    #   trimmomatic.sh PRJNA551035 > trimmomatic.log
    ################################################################################
    debug=false                                       # set to true to produce job file without submitting
    
    data="../../data/$1"                  # directory where raw data files are found
    r1tag="_1"                            # string that identifies read 1 files
    r2tag="_2"                            # string that identifies the read 2 files
    fastqsuffix=".fastq"                  # terminal part of fastq filenames
    target="$data/SRR*$r1tag$fastqsuffix" # search term for fastq files
    trimmed="trimmed"                     # directory for the trimmed files
    
    runfqc_before=false                   # change to false to skip before trimming QC
    runfqc_after=true                     # change to false to skip after trimming QC
    fqc_before_dir="before"               # directory for fastqc before
    fqc_after_dir="after"                 # directory for fastqc after
    
    # trimming command definition.  Each command is executed in the order given.
    # Adapter trimming should go first, if used
    
    #adapter=/group/bioinfo/apps/apps/trimmomatic-0.39/adapters/TruSeq2-PE.fa        # adapter file
    adapter=../adapter_short.fa           # adapter file
    
    trimmer="ILLUMINACLIP:$adapter:2:12:6 \
    LEADING:13 \
    TRAILING:13 \
    SLIDINGWINDOW:5:16 \
    MINLEN:30"
    
    header="#SBATCH --output=%x_%j.out
    #SBATCH --error=%x_%j.err
    #SBATCH --account standby
    #SBATCH --nodes=1
    #SBATCH --ntasks=20
    #SBATCH --time=4:00:00"
    
    #-------------------------------------------------------------------------------
    # parameter report
    #-------------------------------------------------------------------------------
    echo "Trimmomatic"
    echo -e "\tdata directory: $data/"
    echo -e "\tR1/R2 tags: $r1tag/$r2tag"
    echo -e "\tfastq file suffix: $fastqsuffix"
    echo -e "\ttarget: $target"
    echo -e "\toutput directory: $trimmed/"
    if $runfqc_before; then
        echo -e "\tFastQC before trimming: $fqc_before_dir"
    fi
    if $runfqc_after; then
        echo -e "\tFastQC after trimming: $fqc_after_dir"
    fi
    echo -e "\tadapter file: $adapter"
    echo -e "\ttrimming command: $trimmer"
    
    #-------------------------------------------------------------------------------
    # create output, fastqc befor, and fastqv after directories, if needed
    #-------------------------------------------------------------------------------
    if [ -e "$trimmed" ]; then
        echo -e "\tOutput directory $trimmed/ already exists"
    else
        echo -e "\tCreating output directory $trimmed/"
        if ! $debug; then
            mkdir $trimmed
        fi
    fi
    
    if [ -e "$fqc_before_dir" ]; then
        echo -e "\t$fqc_before_dir/ already exists"
    else
        echo -e "\nCreating directory $fqc_before_dir/"
        mkdir $fqc_before_dir
    fi
    
    if [ -e "$fqc_after_dir" ]; then
        echo -e "\t$fqc_after_dir/ already exists"
    else
        echo -e "\nCreating directory $fqc_after_dir/"
        mkdir $fqc_after_dir
    fi
    
    ################################################################################
    # begin script; hopefully nothing below here needs to be changed
    #
    # run trimmomatic on each set of reads by writing and submitting a slurm job
    ################################################################################
    
    echo -e "\nWriting and submitting slurm jobs"
    for r1 in $target; do
        # name for slurm submission file
        base=${r1%%_*}
        base=${base##*/}
        slurmjob=$base".trimmomatic.slurm"
    
        # slurm header
        echo "#!/bin/bash" > $slurmjob
        echo "#SBATCH --job-name trim_$base" >> $slurmjob
        echo -e "$header\n" >> $slurmjob
        echo -e "\t$base => slurm job: $slurmjob jobid: trim_$base"
    
    
        # load bioinformatics modules
        modules="bioinfo trimmomatic fastqc"
        for m in $modules; do
            echo "module load $m" >> $slurmjob
        done
    
        #-------------------------------------------------------------------------------
        # pre-cleaning - fastqc
        #-------------------------------------------------------------------------------
        if $runfqc_before; then
            echo -e "\n# FastQC before trimming", `date` >> $slurmjob
            com2="fastqc -q -t 20 --limits ../limits.txt -o $fqc_before_dir $trimmed/*.paired$fastqsuffix"
            echo -e "$com2\n\n" >>$slurmjob
        fi
    
        #-------------------------------------------------------------------------------
        # Trimmomatic
        # input is all files that match the target: $data/*$r1tag$fastqsuffix
        #-------------------------------------------------------------------------------
    
        # for each read 1, generate the read 2 name be replacing $r1tag with $r2tag
        # generate the paired and unpaired output file names by replacing
        # $fastqsuffix with paired$fastqsuffix or unpaired.$fastqsuffix in both
        # read 1 and read 2 names
    
        r2="${r1/$r1tag/$r2tag}"
        r1p="${r1/$fastqsuffix/.paired$fastqsuffix}"
        r1u="${r1/$fastqsuffix/.unpaired$fastqsuffix}"
        r2p="${r2/$fastqsuffix/.paired$fastqsuffix}"
        r2u="${r2/$fastqsuffix/.unpaired$fastqsuffix}"
    
        # remove data directory from output files
        r1p=$trimmed/"${r1p/$data\//}"
        r1u=$trimmed/"${r1u/$data\//}"
        r2p=$trimmed/"${r2p/$data\//}"
        r2u=$trimmed/"${r2u/$data\//}"
    
        com1="trimmomatic PE \\
        -threads 20 \\
        $r1 $r2 \\
        $r1p $r1u \\
        $r2p $r2u \\
        $trimmer"
        #com1=${com1//     / }
        com1out=${com1//    /\ }
        echo -e "\n# Run trimmomatic on $base", `date`  >> $slurmjob
        echo "$com1out" >> $slurmjob
    
        #-------------------------------------------------------------------------------
        # post-cleaning - fastqc
        #-------------------------------------------------------------------------------
        if $runfqc_after; then
            echo -e "\n# FastQC after trimming", `date` >> $slurmjob
            com2="fastqc -q -t 20 --limits ../limits.txt -o $fqc_after_dir $trimmed/*.paired$fastqsuffix"
            echo -e "$com2\n\n" >>$slurmjob
        fi
    
        #-------------------------------------------------------------------------------
        # submit jobfile with sbatch
        #-------------------------------------------------------------------------------
    
        if ! $debug; then
            sbatch $slurmjob
        fi
    
    done
    
    


Related articles