IMP3 output: Binning
The IMP3 Binning step writes its output to the Binning directory within the outputdir
(see configuration). This output comprises the results of all binners the user selected, the results
from DASTool, and the results of running GRiD on the bins that DASTool selected.
Binning tools
Each binning tool has a separate output directory within the Binning directory. Currently, IMP3
implements MetaBAT2, MaxBin2 and Binny. Each binning tool generates a final file linking contigs
to bins that is always named scaffold2bin.tsv.
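Because every binner ends with a file of the same name, downstream scripts can treat the binners uniformly. The following is a minimal sketch, not part of IMP3. It assumes that scaffold2bin.tsv has two tab-separated columns (contig ID, then bin) and no header line, and the function name read_scaffold2bin is hypothetical:

```python
import csv
from collections import defaultdict


def read_scaffold2bin(path):
    """Group contig IDs by bin from a two-column, tab-separated file.

    Assumption: column 1 is the contig ID, column 2 is the bin, no header.
    """
    bins = defaultdict(list)
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) < 2:  # skip blank or malformed lines
                continue
            contig, bin_id = row[0], row[1]
            bins[bin_id].append(contig)
    return dict(bins)
```

Called on the scaffold2bin.tsv of any binner directory, this would return a dictionary mapping each bin to the list of its contigs, provided the column assumption holds.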
MetaBAT is the output directory of MetaBAT2. MetaBAT2 outputs a tab-separated file with contig IDs
and bin numbers, metabat_res, which is linked to in scaffold2bin.tsv. In addition, the contigs of
each bin, except bin 0, are given in one FASTA file per bin, metabat_res.<bin>.fa.
MaxBin is the output directory of MaxBin2. MaxBin2
(https://kbase.us/applist/apps/kb_maxbin/run_maxbin2/release) outputs a tab-separated file with
contig IDs and bin names, maxbin_contig2bin.txt, which is linked to in scaffold2bin.tsv. MaxBin2 also
outputs a FASTA file for each bin, maxbin_res.<bin>.fasta. Contigs that cannot be assigned to a bin
are in maxbin_res.noclass, and contigs that are too short for analysis are in maxbin_res.tooshort.
Summaries of the essential markers MaxBin2 detected in each bin and of the bin sizes are in
maxbin_res.marker and maxbin_res.summary. In addition, intermediary results are in
maxbin_res.marker_of_each_bin.tar.gz and a log is recorded in maxbin_res.log.
binny is the output directory of Binny. Binny outputs its final results in
contigs2clusters.10.4.tsv and contigs2clusters.10.4.RDS. The bins with at least some completeness
(names starting with P (“perfect”), G (“good”), O (“okay”), or L (“low completeness”)) are extracted
and copied to scaffold2bin.tsv.
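The selection by name prefix can be sketched as follows. This is an illustration only, not IMP3's or Binny's actual code. It assumes that cluster names are plain strings whose first letter carries the completeness class, and the helper names keep_cluster and filter_assignments are hypothetical:

```python
# Completeness classes named in the text, keyed by the first letter of the bin name.
COMPLETENESS_PREFIXES = {
    "P": "perfect",
    "G": "good",
    "O": "okay",
    "L": "low completeness",
}


def keep_cluster(name):
    """Return True if the cluster name starts with one of the kept prefixes."""
    return name[:1] in COMPLETENESS_PREFIXES


def filter_assignments(pairs):
    """Keep (contig, cluster) pairs whose cluster passes keep_cluster."""
    return [(contig, cluster) for contig, cluster in pairs if keep_cluster(cluster)]
```

Any cluster whose name starts with a different letter is dropped, which mirrors the rule that only bins with at least some completeness reach scaffold2bin.tsv.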
Binny is based on tSNE embeddings, computed by VizBin, of the k-mer frequencies in contigs with
masked (temporarily removed) rRNA genes (<mg|mt|mgmt>.assembly.merged.cut.fa). The coordinates from
VizBin are stored in <mg|mt|mgmt>.vizbin.with-contig-names.points.
For comprehensibility and transparency, intermediate results are kept in clusterFiles.tar.gz,
clusterFirstScan.<pk>.<nn>.tsv, bimodalClusterCutoffs.<pk>.<nn>.tsv,
reachabilityDistanceEstimates.<pk>.<nn>.tsv, clusteringWS.<pk>.<nn>.Rdata, and binny_WS.Rdata.
Binny also visualizes its intermediary results in scatterPlot<1-4>.<pk>.<nn>.pdf and shows the final
bins in a tSNE embedding in finalClusterMap.10.4.png.
DASTool
The summarized output of DASTool is in selected_DASTool_summary.txt and the selected_DASTool_bins
directory. The contigs of each bin are in selected_DASTool_bins/<bin>.contigs.fa. DASTool uses the
presence of single-copy marker genes to assess bins. As IMP3 has already generated gene predictions
in the Analysis step, these are used as input for DASTool (with a changed header;
prokka.renamed.faa), and DASTool keeps the annotations per gene in
prokka.renamed.faa.<archaea|bacteria>.scg. The lengths of all contigs are in selected.seqlength.
The results of the DASTool assessment are in selected_<binner>.eval. DASTool also gives visual
output in selected_DASTool_hqBins.pdf and selected_DASTool_scores.pdf and writes a log file to
selected_DASTool.log.
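As a quick check of the selected bins, the per-bin FASTA files can be scanned for header lines. This is a minimal sketch under the naming described above (<bin>.contigs.fa inside selected_DASTool_bins). The function name count_contigs_per_bin is hypothetical, and the sketch assumes uncompressed FASTA files:

```python
from pathlib import Path


def count_contigs_per_bin(bins_dir):
    """Count FASTA header lines in every <bin>.contigs.fa under bins_dir."""
    suffix = ".contigs.fa"
    counts = {}
    for fasta in sorted(Path(bins_dir).glob("*" + suffix)):
        bin_name = fasta.name[: -len(suffix)]
        with fasta.open() as handle:
            # Each contig starts with one ">" header line.
            counts[bin_name] = sum(1 for line in handle if line.startswith(">"))
    return counts
```

The result is a dictionary from bin name to number of contigs. Files that do not match the <bin>.contigs.fa pattern are ignored.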
GRiD
The results of GRiD are given for every bin selected by DASTool in
selected_DASTool_bins/<bin>/grid. IMP3 archives and compresses these directories after adding the
information to Stats/<mg|mt|mgmt>/<mg|mt|mgmt>.bins.tsv. All the archived data is in
per_bin_results.tar.gz.
GTDBtk
If the Taxonomy step is selected in addition to the Binning step, the bins selected by DASTool are
analysed with GTDBtk. The results are in selected_DASTool_bins/<bin>/GTDB. IMP3 archives this data
in per_bin_results.tar.gz after the results have been added to
Stats/<mg|mt|mgmt>/<mg|mt|mgmt>.bins.tsv.
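Since the per-bin GRiD and GTDBtk directories end up inside per_bin_results.tar.gz, it can be handy to look into the archive without unpacking it. The sketch below uses only Python's standard tarfile module. The function name list_archive_members is hypothetical, and the internal layout of the archive is not described here, so the optional substring filter is just a convenience and not a statement about how IMP3 names the members:

```python
import tarfile


def list_archive_members(archive_path, contains=None):
    """Return names of regular files in a tar archive, optionally filtered.

    "r:*" lets tarfile detect the compression (e.g. gzip) on its own.
    """
    with tarfile.open(archive_path, "r:*") as archive:
        names = [member.name for member in archive.getmembers() if member.isfile()]
    if contains is not None:
        names = [name for name in names if contains in name]
    return names
```

For instance, passing contains="grid" would narrow the listing to members with "grid" in their path, if the archive keeps the directory names given above.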
Stats from the Binning step
The Binning step writes to the Stats directory. If no Taxonomy step is run, the GRiD results are
summarized together with the rest of the binning results in
Stats/<mg|mt|mgmt>/<mg|mt|mgmt>.bins.tsv. If taxonomy is performed, the GRiD results are also
combined with the GTDBtk results in Stats/<mg|mt|mgmt>/<mg|mt|mgmt>.bins.tsv. Some more results are
summarized by the Summary step, if that step is defined.
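The per-bin summary table can be loaded for further inspection along these lines. This is a sketch only: it assumes that <mg|mt|mgmt>.bins.tsv is tab-separated with a header row, and because the column names are not listed here, it does not rely on any of them. The function name read_bins_table is hypothetical:

```python
import csv


def read_bins_table(path):
    """Read a tab-separated table with a header row into a list of dicts.

    All values are returned as strings; convert them as needed.
    """
    with open(path, newline="") as handle:
        return [dict(row) for row in csv.DictReader(handle, delimiter="\t")]
```

Each returned dictionary corresponds to one row of the table, keyed by whatever column names the header provides.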