IMP3 output: Binning

The IMP3 Binning step writes its results to the Binning directory within the outputdir (see configuration). These are the results of all binners the user selected, the results from DASTool, and the results of running GRiD on the bins that DASTool selected.

Binning tools

Each binning tool has a separate output directory within the Binning directory. Currently, IMP3 implements MetaBAT2, MaxBin2 and Binny. Each binning tool generates a final file linking contigs to bins that is always named scaffold2bin.tsv.
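Because every binner ends in a file of the same name, the per-binner results can be read with one helper. The following is a minimal sketch, assuming scaffold2bin.tsv is a header-less, two-column, tab-separated file with the contig ID first and the bin name second; the function name read_scaffold2bin is hypothetical and not part of IMP3.

```python
import csv
from collections import defaultdict


def read_scaffold2bin(path):
    """Return {bin name: [contig IDs]} from a contig-to-bin table.

    Assumption: two tab-separated columns (contig ID, bin name), no header.
    """
    bins = defaultdict(list)
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) < 2:
                continue  # skip blank or malformed lines
            contig, bin_name = row[0], row[1]
            bins[bin_name].append(contig)
    return dict(bins)
```

Calling read_scaffold2bin on the scaffold2bin.tsv of each binner directory would give one dictionary per binner, which makes it easy to compare how many contigs each tool assigned.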

MetaBAT is the output directory of MetaBAT2. MetaBAT2 outputs a tab-separated file with contig IDs and bin numbers, metabat_res, which scaffold2bin.tsv links to. In addition, the contigs of each bin, except bin 0, are written to one Fasta file per bin, metabat_res.<bin>.fa.

MaxBin is the output directory of MaxBin2. MaxBin2 (https://kbase.us/applist/apps/kb_maxbin/run_maxbin2/release) outputs a tab-separated file with contig IDs and bin names, maxbin_contig2bin.txt, which scaffold2bin.tsv links to. MaxBin2 also outputs a Fasta file for each bin, maxbin_res.<bin>.fasta. Contigs that cannot be assigned to a bin are in maxbin_res.noclass, and contigs that are too short for analysis are in maxbin_res.tooshort. Summaries of the essential markers MaxBin2 detected in each bin and of the bin sizes are in maxbin_res.marker and maxbin_res.summary. In addition, intermediary results are in maxbin_res.marker_of_each_bin.tar.gz and a log is recorded in maxbin_res.log.

binny is the output directory of Binny. Binny writes its final result to contigs2clusters.10.4.tsv and contigs2clusters.10.4.RDS. The bins with at least some completeness (names starting with P (“perfect”), G (“good”), O (“okay”), or L (“low completeness”)) are extracted and copied to scaffold2bin.tsv. Binny is based on t-SNE embeddings, computed by VizBin, of k-mer frequencies in contigs with masked (temporarily removed) rRNA genes (<mg|mt|mgmt>.assembly.merged.cut.fa). The coordinates from VizBin are stored in <mg|mt|mgmt>.vizbin.with-contig-names.points. For comprehensibility and transparency, intermediate results are kept in clusterFiles.tar.gz, clusterFirstScan.<pk>.<nn>.tsv, bimodalClusterCutoffs.<pk>.<nn>.tsv, reachabilityDistanceEstimates.<pk>.<nn>.tsv, clusteringWS.<pk>.<nn>.Rdata, and binny_WS.Rdata. Binny also visualizes its intermediary results in scatterPlot<1-4>.<pk>.<nn>.pdf and shows the final bins in a t-SNE embedding in finalClusterMap.10.4.png.
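The selection of Binny bins by name prefix can be illustrated as follows. This is only a sketch of the rule described above, not IMP3's own code: the function name keep_binny_bins and the cluster names in the usage example are hypothetical, and the input is assumed to be a contig-to-cluster mapping already read from contigs2clusters.10.4.tsv.

```python
# Prefixes named in the text: perfect, good, okay, low completeness.
COMPLETENESS_PREFIXES = ("P", "G", "O", "L")


def keep_binny_bins(contig_to_cluster):
    """Keep only contigs whose cluster name starts with P, G, O or L."""
    return {
        contig: cluster
        for contig, cluster in contig_to_cluster.items()
        if cluster.startswith(COMPLETENESS_PREFIXES)
    }


# Usage with made-up cluster names (hypothetical, for illustration only).
example = {"contig_1": "P1", "contig_2": "X7", "contig_3": "L2"}
kept = keep_binny_bins(example)
```

Here kept contains contig_1 and contig_3 only, because the name of the cluster holding contig_2 does not start with one of the four prefixes.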

DASTool

The summarized output of DASTool is in selected_DASTool_summary.txt and the selected_DASTool_bins directory. The contigs of each bin are in selected_DASTool_bins/<bin>.contigs.fa. DASTool uses the presence of single-copy marker genes to assess bins. As IMP3 has already generated gene predictions in the Analysis step, they are used as input for DASTool (with a changed header; prokka.renamed.faa) and DASTool keeps the annotations per gene in prokka.renamed.faa.<archaea|bacteria>.scg. The lengths of all contigs are in selected.seqlength. The results of DASTool assessment are in selected_<binner>.eval. DASTool also gives visual output in selected_DASTool_hqBins.pdf and selected_DASTool_scores.pdf and a log file in selected_DASTool.log.
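A quick way to check the selected bins is to count the contigs in each Fasta file of the selected_DASTool_bins directory. The sketch below relies only on the <bin>.contigs.fa naming described above and on the Fasta convention that every record starts with ">"; the function name contigs_per_bin is hypothetical.

```python
from pathlib import Path

SUFFIX = ".contigs.fa"


def contigs_per_bin(bins_dir):
    """Return {bin name: number of Fasta records} for <bin>.contigs.fa files."""
    counts = {}
    for fasta in sorted(Path(bins_dir).glob("*" + SUFFIX)):
        bin_name = fasta.name[: -len(SUFFIX)]
        with open(fasta) as handle:
            # Each Fasta record begins with a header line starting with ">".
            counts[bin_name] = sum(1 for line in handle if line.startswith(">"))
    return counts
```

Run against selected_DASTool_bins, this gives one count per bin, which can be cross-checked against selected_DASTool_summary.txt.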

GRiD

The results of GRiD are given for every bin selected by DASTool in selected_DASTool_bins/<bin>/grid. IMP3 archives and compresses these directories after adding the information to Stats/<mg|mt|mgmt>/<mg|mt|mgmt>.bins.tsv. All the archived data is in per_bin_results.tar.gz.

GTDBtk

If the Taxonomy step is selected in addition to the Binning step, the selected bins from DASTool are analysed with GTDBtk. The results are in selected_DASTool_bins/<bin>/GTDB. IMP3 archives this data in per_bin_results.tar.gz, after the results are added to Stats/<mg|mt|mgmt>/<mg|mt|mgmt>.bins.tsv.
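Since the per-bin directories end up in per_bin_results.tar.gz, their contents can be inspected without unpacking the archive. The following is a minimal sketch using Python's standard tarfile module; the function name list_archive_members is hypothetical, and the internal layout of the archive is an assumption, so the optional keyword filter should be adapted to the member names actually found.

```python
import tarfile


def list_archive_members(archive_path, keyword=None):
    """List member names of a gzip-compressed tar archive.

    If keyword is given, only names containing it are returned.
    """
    with tarfile.open(archive_path, "r:gz") as archive:
        names = archive.getnames()
    if keyword is None:
        return names
    return [name for name in names if keyword in name]
```

Listing all members first, and only then choosing a keyword, avoids relying on a directory layout that the archive may not have.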

Stats from the Binning step

The Binning step writes to the Stats directory. If no Taxonomy step is run, the GRiD results are summarized together with the rest of the binning results in Stats/<mg|mt|mgmt>/<mg|mt|mgmt>.bins.tsv. If the Taxonomy step is run, the GRiD results are additionally combined with the GTDBtk results in the same file. Further results are summarized by the Summary step, if it is defined.
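The <mg|mt|mgmt> placeholder in the path above stands for exactly one of the three values, used twice. A minimal sketch of building that path and reading the table is given here; it assumes that the Stats directory sits inside the outputdir and that the first row of the file is a header, neither of which is stated above. The function names are hypothetical, and no column names are assumed.

```python
import csv
from pathlib import Path

ASSEMBLY_TYPES = ("mg", "mt", "mgmt")


def bins_stats_path(outputdir, assembly_type):
    """Build <outputdir>/Stats/<type>/<type>.bins.tsv (location assumed)."""
    if assembly_type not in ASSEMBLY_TYPES:
        raise ValueError(f"expected one of {ASSEMBLY_TYPES}, got {assembly_type!r}")
    return Path(outputdir) / "Stats" / assembly_type / f"{assembly_type}.bins.tsv"


def read_bins_stats(outputdir, assembly_type):
    """Read the per-bin table into a list of dicts keyed by the header row."""
    with open(bins_stats_path(outputdir, assembly_type), newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))
```

Each returned dictionary corresponds to one row of the table, so the columns present (GRiD only, or GRiD plus GTDBtk) can be inspected via the keys of the first entry.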