Abstract
Metagenomic Hi-C (metaHi-C) identifies contigs that are physically close within the same cell, facilitating the reconstruction of high-quality metagenome-assembled genomes (MAGs) from complex microbial communities. However, current Hi-C-based contig binning methods rely solely on Hi-C interactions between contigs to group them, ignoring valuable biological information such as the presence of single-copy marker genes. Here, we introduce ImputeCC, an integrative contig binning tool tailored for metaHi-C datasets. ImputeCC combines Hi-C interactions with the discriminative power of single-copy marker genes to first cluster marker-gene-containing contigs into preliminary bins, and introduces a new constrained random walk with restart (CRWR) algorithm to improve Hi-C connectivity among contigs. Extensive evaluations on mock and real metaHi-C datasets from diverse environments, including the human gut, wastewater, cow rumen, and sheep gut, demonstrate that ImputeCC consistently outperforms other Hi-C-based contig binning tools. ImputeCC's genus-level analysis of the sheep gut microbiota further demonstrates its ability to recover essential species from dominant genera such as Bacteroides, detect previously unrecognized genera, and shed light on the characteristics and functional roles of genera such as Alistipes within the sheep gut ecosystem.
Availability: ImputeCC is implemented in Python and available at https://github.com/dyxstat/ImputeCC. The Supplementary Information is available at https://doi.org/10.5281/zenodo.10776604.
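To give a feel for how a random walk with restart can impute connectivity from a sparse contact matrix, here is a minimal sketch of the generic technique. It is not the CRWR algorithm from the paper and not code from the ImputeCC repository. The function name `random_walk_with_restart`, the `allowed` constraint (restricting the walk to a chosen subset of contigs), the restart probability of 0.5, and the toy matrix are all assumptions made for illustration.

```python
import numpy as np


def random_walk_with_restart(contacts, restart_prob=0.5, allowed=None,
                             tol=1e-8, max_iter=1000):
    """Generic random walk with restart on a symmetric contact matrix.

    Column j of the result holds proximity scores from seed contig j.
    `allowed` is a hypothetical constraint: if given, only edges whose
    two endpoints are both in `allowed` are kept.
    """
    A = np.asarray(contacts, dtype=float)
    n = A.shape[0]
    if allowed is not None:
        mask = np.zeros(n, dtype=bool)
        mask[list(allowed)] = True
        A = A * np.outer(mask, mask)  # drop edges leaving the allowed set

    # Column-normalise so each non-empty column is a probability distribution.
    col_sums = A.sum(axis=0)
    col_sums[col_sums == 0] = 1.0  # leave isolated contigs as zero columns
    W = A / col_sums

    identity = np.eye(n)
    P = identity.copy()
    for _ in range(max_iter):
        # One step: follow an edge with prob. 1 - r, jump back to the seed with prob. r.
        P_next = (1.0 - restart_prob) * (W @ P) + restart_prob * identity
        converged = np.abs(P_next - P).max() < tol
        P = P_next
        if converged:
            break
    return P


# Toy example: four contigs on a chain 0-1-2-3, with a weak 1-2 link.
toy_contacts = np.array([
    [0, 5, 0, 0],
    [5, 0, 1, 0],
    [0, 1, 0, 5],
    [0, 0, 5, 0],
])
imputed = random_walk_with_restart(toy_contacts)
constrained = random_walk_with_restart(toy_contacts, allowed={0, 1})
```

In the toy example, contigs 0 and 2 share no direct contacts, yet `imputed[2, 0]` is positive because the walk reaches contig 2 through contig 1. With the walk restricted to contigs 0 and 1, that indirect score stays at zero.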
Acknowledgments
Y.D. and F.S. conceived the ideas and designed the study. Y.D. implemented the methods, carried out the computational analyses, and drafted the manuscript. Y.D. and W.Z. developed the software. All authors modified and finalized the paper. The research is partially funded by NSF grant EF-2125142. The authors declare no competing interests.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Du, Y., Zuo, W., Sun, F. (2024). ImputeCC Enhances Integrative Hi-C-Based Metagenomic Binning Through Constrained Random-Walk-Based Imputation. In: Ma, J. (eds) Research in Computational Molecular Biology. RECOMB 2024. Lecture Notes in Computer Science, vol 14758. Springer, Cham. https://doi.org/10.1007/978-1-0716-3989-4_7
DOI: https://doi.org/10.1007/978-1-0716-3989-4_7
Publisher Name: Springer, Cham
Print ISBN: 978-1-0716-3988-7
Online ISBN: 978-1-0716-3989-4
eBook Packages: Computer Science, Computer Science (R0)