IMP3 steps: Analysis

In the Analysis step, IMP3 calls open reading frames, rRNA and tRNA genes, and annotates CRISPR repeats. The open reading frames are then functionally annotated, and the number of reads mapping to each gene and to each functional group of genes is calculated. The steps are described in more detail below.

Customized prokka

IMP3 uses prokka to call open reading frames (ORFs) and rRNA and tRNA genes, and to annotate CRISPR repeats. Internally, prokka calls prodigal for ORF calling, barrnap for rRNA regions, ARAGORN for tRNA loci, and MinCED to detect CRISPR arrays.

Prokka forces prodigal to call only complete genes. Because metagenomic contigs are fragmented, it is preferable to also allow partial genes. IMP3’s customized prokka therefore allows prodigal to call incomplete ORFs and records whether prodigal detected start and stop codons. One side effect is that the amino acid sequences prokka returns (prokka.faa) are badly formatted. In addition, the prokka amino acid sequences don’t start with M if prodigal called a gene with an alternative start codon. Both issues are corrected in another IMP3 step as part of the metaproteomics preparations (proteomics.proteins.faa).
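To illustrate what recording start and stop codons and restoring the leading M can look like, here is a minimal sketch in Python. It assumes prodigal-style FASTA headers carrying a partial=XY field, where 0 marks a true gene boundary and 1 marks an edge that runs off the contig. The function names gene_completeness and fix_start_residue are hypothetical, and the way IMP3 actually stores this information and performs the correction may differ.

```python
import re

# Prodigal-style headers carry "partial=XY": X describes the left edge of the
# gene, Y the right edge. "0" = true boundary found, "1" = gene runs off the
# contig at that edge.
PARTIAL_RE = re.compile(r"partial=([01])([01])")


def gene_completeness(header, strand="+"):
    """Return (has_start, has_stop) for a prodigal-style header (hypothetical helper)."""
    match = PARTIAL_RE.search(header)
    if match is None:
        raise ValueError(f"no partial= field in header: {header!r}")
    left_complete = match.group(1) == "0"
    right_complete = match.group(2) == "0"
    # On the forward strand the start codon is at the left edge;
    # on the reverse strand it is at the right edge.
    if strand == "+":
        return left_complete, right_complete
    return right_complete, left_complete


def fix_start_residue(protein, has_start):
    """Replace the first residue with M when a start codon was detected.

    Alternative start codons (for example GTG or TTG) are written as V or L
    by a plain translation, although translation initiates with methionine.
    Sequences without a detected start codon are left untouched.
    """
    if has_start and protein and protein[0] != "M":
        return "M" + protein[1:]
    return protein


has_start, has_stop = gene_completeness("ID=1_1;partial=01;start_type=GTG")
fixed = fix_start_residue("VKLTA", has_start)
```

In this sketch, a gene without a detected start codon keeps its first residue, because that residue is an internal amino acid rather than the initiator.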

Prokka would normally provide some functional annotation by aligning the called ORFs against a series of databases. However, this analysis is optimized for speed: genes that have been annotated with one database are not searched against the next. Genes can therefore end up with inconsistent annotations, and the user cannot find out what the best hit in another database would have been. Since IMP3 functionally annotates all genes with every database the user chooses, which yields consistent annotations, we’ve disabled the prokka-based annotation.
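The difference between the two strategies can be sketched with toy data. This is an illustration only, not prokka's or IMP3's actual code: the database names db_A and db_B, the gene IDs, and both function names are made up, and real hits would come from alignments or HMM searches rather than from dictionaries.

```python
def first_hit_annotation(genes, databases):
    """Speed-optimized strategy: once a gene has a hit, skip it in later databases."""
    annotations = {}
    for db_name, hits in databases.items():
        for gene in genes:
            if gene in annotations:
                continue  # already annotated by an earlier database
            if gene in hits:
                annotations[gene] = {db_name: hits[gene]}
    return annotations


def independent_annotation(genes, databases):
    """Consistent strategy: search every gene against every chosen database."""
    annotations = {gene: {} for gene in genes}
    for db_name, hits in databases.items():
        for gene in genes:
            if gene in hits:
                annotations[gene][db_name] = hits[gene]
    return annotations


# Hypothetical toy input: per-database hits keyed by gene ID.
genes = ["gene_1", "gene_2"]
databases = {
    "db_A": {"gene_1": "kinase"},
    "db_B": {"gene_1": "protein kinase C", "gene_2": "transporter"},
}

first_hit = first_hit_annotation(genes, databases)
independent = independent_annotation(genes, databases)
```

With the first strategy, gene_1 is described only by db_A and gene_2 only by db_B, so the two genes cannot be compared within a single database. With the second, every gene carries a result (or a visible absence of one) for each database.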

Prokka also spends considerable time converting its output into GenBank format. As IMP3 has no need for GenBank-formatted data, we’ve disabled this conversion.

Annotation with HMMs

Variants for metaP searches