Comparing genomes recovered from time-series metagenomes using long- and short-read sequencing technologies

Orellana, Luis H.; Krüger, Karen; Sidhu, Chandni; Amann, Rudolf

doi:10.1186/s40168-023-01557-3


Brief Report
Open access
Published: 13 May 2023

Microbiome, volume 11, article number 105 (2023)


Abstract

Background

In recent years, sequencing technologies have expanded our ability to examine novel microbial metabolisms and diversity previously obscured by isolation approaches. Long-read sequencing promises to revolutionize the metagenomic field and recover less fragmented genomes from environmental samples. Nonetheless, it remains unclear how best to benefit from long-read sequencing and whether it can recover genomes with characteristics similar to those obtained with short-read approaches.

Results

We recovered metagenome-assembled genomes (MAGs) from the free-living fraction at four time points during a spring bloom in the North Sea. The taxonomic composition of all MAGs recovered was comparable between technologies. However, short-read metagenomes showed higher sequencing depth for contigs and higher genome population diversity than long-read metagenomes. When pairing population genomes recovered from both sequencing approaches that shared ≥ 99% average nucleotide identity, long-read MAGs had fewer contigs, a higher N50, and a higher number of predicted genes than short-read MAGs. Moreover, 88% of the total long-read MAGs carried a 16S rRNA gene compared to only 23% of MAGs recovered from short-read metagenomes. Relative abundances for population genomes recovered using both technologies were similar, although disagreements were observed for high and low GC content MAGs.

Conclusions

Our results highlight that short-read technologies recovered more MAGs and a higher number of species than long-read technologies, owing to an overall higher sequencing depth. Long-read samples produced higher-quality MAGs and a similar species composition compared to short-read sequencing. Differences in the GC content recovered by each sequencing technology resulted in divergences in the diversity recovered and in the relative abundance of MAGs within the GC content boundaries.


Background

In shotgun metagenomic approaches, limitations in the read length (i.e., ~ 100–250 bp sequences) often translate into fragmented reconstructed metagenome-assembled genomes (MAGs) with uncertain levels of genomic completion during de novo assembly. These complications are primarily due to highly repetitive regions, high levels of sequence microdiversity, multiple copies of genes, and AT-rich/GC-rich regions [1]. Overcoming these limitations is paramount to understanding the role of microorganisms in natural processes and analyzing their diversity in environmental and gut microbiomes.

The emergence of long-read (LR) sequencing technologies has renewed hopes of overcoming these limitations in genomic sequence recovery. Sequencing platforms from Oxford Nanopore and Pacific Biosciences (PacBio) can produce longer reads, although at the expense of a higher sequencing error rate and lower sequencing depth than Illumina short reads (SR) [2]. For instance, the median read length ranges from 5 to 20 kbp and the throughput from 15 to 50 Gbp in LR technologies. Current sequencing chemistries yield observed modal read accuracies of 99.99%, 99.14%, and 99.9% for Illumina, Oxford Nanopore, and PacBio, respectively [3, 4]. Moreover, advances in technological and bioinformatic approaches are closing the gaps between short- and long-read sequencing technology applications, especially for recovering high-quality MAGs from the environment. Thus, LR shotgun metagenomics is poised to set new standards for MAG quality. For instance, current PacBio Sequel II technology offers circular consensus sequencing (CCS), providing a low error rate in high-fidelity reads, although at a shorter read length than the traditional long-read technology [3]. Additionally, better genome statistics (low number of contigs and high N50 values) [https://github.com/PacificBiosciences/pbmm2/; --preset HIFI -× 97 -N 1), which is a wrapper for minimap2 [28]. All MAGs were filtered using a quality score derived from the completion and contamination values obtained from checkM [29] v1.1.3 ([completion%] - 5*[contamination%] ≥ 50). MAGs were de-replicated using dRep [30] v3.0.0 to assess the number of MAGs sharing > 99% ANI obtained from each sequencing platform. For statistical tests between pairs of MAGs, the Shapiro-Wilk normality test and Wilcoxon rank tests were performed in the R statistical software v4.1.1.
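The quality filter described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' script: the `MagQuality` record, the bin names, and the example values are hypothetical, and the completion and contamination percentages are assumed to have been read from a checkM summary beforehand.

```python
from typing import NamedTuple


class MagQuality(NamedTuple):
    """Hypothetical record holding checkM estimates for one MAG (both in percent)."""
    name: str
    completion: float
    contamination: float


def quality_score(mag: MagQuality) -> float:
    """Quality score as stated in the text: completion - 5 * contamination."""
    return mag.completion - 5.0 * mag.contamination


def passes_filter(mag: MagQuality, threshold: float = 50.0) -> bool:
    """Keep a MAG when its quality score reaches the threshold (>= 50 in the text)."""
    return quality_score(mag) >= threshold


# Made-up example values, for illustration only.
example_mags = [
    MagQuality("bin_A", completion=95.0, contamination=2.0),  # score 85
    MagQuality("bin_B", completion=70.0, contamination=5.0),  # score 45
    MagQuality("bin_C", completion=60.0, contamination=2.0),  # score 50
]

kept = [mag.name for mag in example_mags if passes_filter(mag)]
print(kept)
```

The factor of five penalizes contamination much more heavily than incompleteness, so a fairly complete but contaminated bin such as the hypothetical `bin_B` is dropped while a less complete, cleaner bin is kept.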

For Illumina metagenomes, MAG abundances were determined in two ways: as relative abundance (mapped reads/total reads) and as the quotient between the truncated average sequencing depth (TAD) [31] and the total sequencing depth of microbial genomes (“genome equivalents”), as determined in MicrobeCensus [32] v1.1.1. The TAD was determined using BedGraph files that include zero-coverage positions (bedtools genomecov -bga) [33] and the “BedGraph.tad.rb” script (-r 0.8) from the enveomics collection [18]. Abundances for MAGs derived from LR were determined using the average sequencing depth (i.e., non-truncated) as specified above and normalized using the median sequencing depth of 16 single-copy marker genes predicted in unassembled long reads (rpl2, rpl3, rpl4, rpl5, rpl6, rpl14, rpl15, rpl16, rpl18, rpl22, rpl24, rps3, rps8, rps10, rps17, and rps19; see gene prediction and annotation below).
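To make the two normalizations concrete, the sketch below computes a truncated average depth from per-position depths and then applies the SR and LR normalizations described above. It is a simplified stand-in for the enveomics “BedGraph.tad.rb” script, not a port of it: the function names are hypothetical, the rounding of the trimmed fraction is an assumption, and the input is taken to be a plain list of per-position depths (zeros included) rather than a BedGraph file.

```python
from statistics import median
from typing import Sequence


def truncated_average_depth(depths: Sequence[float], keep_fraction: float = 0.8) -> float:
    """Average depth over the central `keep_fraction` of positions.

    Positions are sorted by depth and an equal number is trimmed from each
    tail, so 0.8 discards roughly the lowest 10% and the highest 10%.
    """
    if not depths:
        raise ValueError("depths must not be empty")
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    ordered = sorted(depths)
    n = len(ordered)
    n_keep = max(1, round(n * keep_fraction))
    trim = (n - n_keep) // 2  # positions removed from each tail
    core = ordered[trim:n - trim]
    return sum(core) / len(core)


def sr_abundance(tad: float, genome_equivalents: float) -> float:
    """SR normalization: TAD divided by the genome equivalents of the sample."""
    return tad / genome_equivalents


def lr_abundance(mean_depth: float, marker_depths: Sequence[float]) -> float:
    """LR normalization: mean depth divided by the median single-copy marker depth."""
    return mean_depth / median(marker_depths)


# Made-up per-position depths with one uncovered position and one spike.
depths = [0, 9, 10, 10, 10, 10, 10, 10, 11, 200]
tad80 = truncated_average_depth(depths, keep_fraction=0.8)
plain_mean = sum(depths) / len(depths)
print(tad80, plain_mean)
```

In this toy example the single position with depth 200 pulls the plain mean far above the typical depth, whereas the truncated average is unaffected, which is the motivation for trimming both tails before averaging.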

MAGs were defined as “shared” (i.e., detectable using both technologies) when MAGs obtained from each technology at one specific sampling date shared ≥ 99% ANI, as determined in fastANI [34] v1.32. Taxonomic classification of MAGs was performed using GTDB-tk [35] v1.7 and the GTDB [36] release r202. In GTDB-tk, MAGs are classified into species using a 95% ANI threshold.
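The pairing rule above can be illustrated with a short parser for fastANI's tab-separated output (query, reference, ANI, mapped fragments, total query fragments). This is a sketch under stated assumptions: the file names and values are invented, the input is assumed to contain only comparisons between SR and LR MAGs from a single sampling date, and keeping only the best-scoring partner per query is a simplification introduced here rather than a step described in the text.

```python
import csv
import io
from typing import Dict, Tuple


def shared_pairs(fastani_tsv: str, min_ani: float = 99.0) -> Dict[str, Tuple[str, float]]:
    """Return, for each query MAG, its best reference MAG with ANI >= min_ani."""
    best: Dict[str, Tuple[str, float]] = {}
    for row in csv.reader(io.StringIO(fastani_tsv), delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        query, reference, ani = row[0], row[1], float(row[2])
        if ani < min_ani:
            continue
        if query not in best or ani > best[query][1]:
            best[query] = (reference, ani)
    return best


# Invented fastANI-style output: SR MAGs as queries, LR MAGs as references.
example_output = (
    "sr_bin1.fa\tlr_bin7.fa\t99.62\t850\t900\n"
    "sr_bin1.fa\tlr_bin3.fa\t99.10\t700\t900\n"
    "sr_bin2.fa\tlr_bin3.fa\t97.50\t600\t800\n"
)

pairs = shared_pairs(example_output)
print(pairs)
```

Here only `sr_bin1.fa` has an LR counterpart at or above 99% ANI. The hypothetical `sr_bin2.fa` would still fall within the same species as `lr_bin3.fa` under the 95% threshold used by GTDB-tk, but it would not count as a shared MAG.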

Comparison of gene predictions in unassembled and assembled long-read metagenomes

Gene predictions in unassembled LR were performed using FragGeneScan [37] v1.31. However, we first compared different tools to ensure better gene predictions. For the March 10, 2020 LR sample, gene predictions were performed using Prodigal [38] v2.6.3 (meta option), MetaGeneMark [39] v3.38, and FragGeneScan [37] v1.31. For the last algorithm, we compared predictions using complete/short sequences (-w 0 or 1) and different sequencing error models (sanger_5 and sanger_10). All predicted sequences were compared against the TrEMBL protein sequence database (downloaded April 27, 2021) using DIAMOND [7). For the most part, the cross-mapping of SR and LR on unique MAG species resulted in low sequencing depth (median = 6.4 vs. 2.8) and breadth of coverage (median = 96 vs. 88.3%) for SR and LR technologies, respectively. Thus, uniquely detected species in each dataset are likely due to a combined effect of differences in GC content [46] and sequencing depth between technologies.
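The relationship between sequencing depth and breadth of coverage can be made explicit with the Lander and Waterman expectation that the supplementary figures use as a reference line. The sketch below assumes the common simplification that reads land uniformly at random on the genome, in which case the expected covered fraction is 1 - exp(-depth). The exact formula used by the authors is not given in this text, so this is an illustration rather than a reproduction, and the function name is hypothetical.

```python
import math


def expected_breadth(depth: float) -> float:
    """Expected fraction of a genome covered at a given average depth,
    assuming reads are placed uniformly at random (Lander-Waterman model)."""
    if depth < 0:
        raise ValueError("depth must be non-negative")
    return 1.0 - math.exp(-depth)


# Median cross-mapping depths reported in the text.
for depth in (6.4, 2.8):
    print(f"depth {depth}: expected breadth {expected_breadth(depth):.3f}")
```

Comparing an observed breadth with this expectation helps separate two explanations for a poorly covered genome: too little sequencing (low depth, breadth close to the expectation) versus uneven read recruitment, for example because of GC content (breadth clearly below the expectation at the same depth).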

Other considerations when choosing LR technologies

Currently, PacBio LR shotgun metagenomics costs more per Gbp than SR (~ 2.4 times more for our project; a further breakdown of costs is available in Table S1). The cost per Gbp of Nanopore currently lies between those of Illumina and PacBio. Nanopore technologies offer affordability and the benefits of recovering longer reads, or the possibility of including short-read technologies for the better recovery of high-quality MAGs [4, 10, 50]. Nonetheless, sequencing error and read lengths should be considered when selecting between LR technologies [4]. Despite the cost differences, the results presented here can guide researchers in deciding if LR metagenomics would be beneficial over SR approaches.

The current set of algorithms and approaches for LR metagenomics is still limited compared to the large toolbox available for SR technologies. While the methodology used here reflects the most appropriate tools and algorithms available at the time, we recommend that future studies pursue a critical assessment of newer approaches [12] when using LR techniques. The dataset presented here can also serve as a reference for testing and comparing algorithms and approaches for shotgun LR metagenomic sequencing.

Conclusions

Our results highlight that switching from SR to LR metagenomic sequencing for microbial community analyses would still capture a similar taxonomic composition from population genomes while recovering higher-quality MAGs. Nonetheless, SR technologies yielded more sequenced bases (about three times more base pairs on average) than LR sequencing on single runs. This higher sequencing effort also translated into a higher number of dereplicated MAGs compared to LR metagenomic samples (i.e., a higher diversity of population genomes). This observation is relevant when the goal is to recover low-abundance organisms. Our work indicates strongly decreased genome fragmentation and increased recovery of 16S rRNA genes in LR MAGs. These two features translate into better preservation of the order of genes in unassembled LR or LR-derived contigs and support applications such as the generation of 16S rRNA probes for fluorescence in situ hybridization for single-cell identification and quantification. Even though a high fraction of overlapping reads was detected between technologies, differences in GC content likely resulted in slight differences in the recovery and abundance of some population genomes.

Availability of data and materials

All sequence data were deposited in ENA under project PRJEB52999 and at https://gitlab.mpi-bremen.de/lorellan/ilmn-vs-pacb-helgoland.

References

1. Goodwin S, McPherson JD, McCombie WR. Coming of age: ten years of next-generation sequencing technologies. Nat Rev Genet. 2016;17:333–51.
2. Pollard MO, Gurdasani D, Mentzer AJ, Porter T, Sandhu MS. Long reads: their purpose and place. Hum Mol Genet. 2018;27:R234–41.
3. Wenger AM, Peluso P, Rowell WJ, Chang P-C, Hall RJ, Concepcion GT, et al. Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome. Nat Biotechnol. 2019;37:1155–62.
4. Sereika M, Kirkegaard RH, Karst SM, Michaelsen TY, Sørensen EA, Wollenberg RD, et al. Oxford Nanopore R10.4 long-read sequencing enables the generation of near-finished bacterial genomes from pure cultures and metagenomes without short-read or reference polishing. Nat Methods. 2022;19:823–6.
5. Xie H, Yang C, Sun Y, Igarashi Y, Jin T, Luo F. PacBio long reads improve metagenomic assemblies, gene catalogs, and genome binning. Front Genet. 2020;11:516269.
6. Haro-Moreno JM, López-Pérez M, Rodriguez-Valera F. Enhanced recovery of microbial genes and genomes from a marine water column using long-read metagenomics. Front Microbiol. 2021;12:708782.
7. Priest T, Orellana LH, Huettel B, Fuchs BM, Amann R. Microbial metagenome-assembled genomes of the Fram Strait from short and long read sequencing platforms. PeerJ. 2021;9:e11721.
8. Meslier V, Quinquis B, Da Silva K, Plaza Oñate F, Pons N, Roume H, et al. Benchmarking second and third-generation sequencing platforms for microbial metagenomics. Sci Data. 2022;9:694.
9. Patin NV, Goodwin KD. Long-Read Sequencing Improves Recovery of Picoeukaryotic Genomes and Zooplankton Marker Genes from Marine Metagenomes. mSystems. 2022;7:e00595-22.
10. Overholt WA, Hölzer M, Geesink P, Diezel C, Marz M, Küsel K. Inclusion of Oxford Nanopore long reads improves all microbial and viral metagenome-assembled genomes from a complex aquifer system. Environ Microbiol. 2020;22:4000–13.
11. Singleton CM, Petriglieri F, Kristensen JM, Kirkegaard RH, Michaelsen TY, Andersen MH, et al. Connecting structure to function with the recovery of over 1000 high-quality metagenome-assembled genomes from activated sludge using long-read sequencing. Nat Commun. 2021;12:2009.
12. Gehrig JL, Portik DM, Driscoll MD, Jackson E, Chakraborty S, Gratalo D, et al. Finding the right fit: evaluation of short-read and long-read sequencing approaches to maximize the utility of clinical microbiome data. Microb Genomics. 2022;8:000794.
13. Teeling H, Fuchs BM, Bennke CM, Krüger K, Chafee M, Kappelmann L, et al. Recurring patterns in bacterioplankton dynamics during coastal spring algae blooms. eLife. 2016;5:e11888.
14. Sidhu C, Kirstein IV, Meunier CL, Rick J, Fofonova V, Wiltshire KH, et al. Dissolved storage glycans shaped the community composition of abundant bacterioplankton clades during a North Sea spring phytoplankton bloom. Microbiome. 2023;11:77.
15. De Coster W, D’Hert S, Schultz DT, Cruts M, Van Broeckhoven C. NanoPack: visualizing and processing long-read sequencing data. Berger B, editor. Bioinformatics. 2018;34:2666–9.
16. Rodriguez-R LM, Gunturu S, Tiedje JM, Cole JR, Konstantinidis KT. Nonpareil 3: fast estimation of metagenomic coverage and sequence diversity. mSystems. 2018;3(3):e00039-18.
17. Francis TB, Bartosik D, Sura T, Sichert A, Hehemann J-H, Markert S, et al. Changing expression patterns of TonB-dependent transporters suggest shifts in polysaccharide consumption over the course of a spring phytoplankton bloom. ISME J. 2021;15:2336–50.
18. Rodriguez-R LM, Konstantinidis KT. The enveomics collection: a toolbox for specialized analyses of microbial genomes and metagenomes. PeerJ. 2016;4:e1900v1.
19. Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol. 2012;19:455–77.
20. Alneberg J, Bjarnason BS, de Bruijn I, Schirmer M, Quick J, Ijaz UZ, et al. Binning metagenomic contigs by coverage and composition. Nat Methods. 2014;11:1144–6.
21. Wu Y-W, Simmons BA, Singer SW. MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets. Bioinformatics. 2016;32:605–7.
22. Kang DD, Li F, Kirton E, Thomas A, Egan R, An H, et al. MetaBAT 2: an adaptive binning algorithm for robust and efficient genome reconstruction from metagenome assemblies. PeerJ. 2019;7:e7359.
23. Graham ED, Heidelberg JF, Tully BJ. BinSanity: unsupervised clustering of environmental microbial assemblies using coverage and affinity propagation. PeerJ. 2017;5:e3035.
24. Sieber CMK, Probst AJ, Sharrar A, Thomas BC, Hess M, Tringe SG, et al. Recovery of genomes from metagenomes via a dereplication, aggregation and scoring strategy. Nat Microbiol. 2018;3:836–43.
25. Kolmogorov M, Yuan J, Lin Y, Pevzner PA. Assembly of long, error-prone reads using repeat graphs. Nat Biotechnol. 2019;37:540–6.
26. Eren AM, Esen ÖC, Quince C, Vineis JH, Morrison HG, Sogin ML, et al. Anvi’o: an advanced analysis and visualization platform for ‘omics data. PeerJ. 2015;3:e1319.
27. Antipov D, Korobeynikov A, McLean JS, Pevzner PA. hybridSPAdes: an algorithm for hybrid assembly of short and long reads. Bioinformatics. 2016;32:1009–15.
28. Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018;34:3094–100.
29. Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 2015;25:1043–55.
30. Olm MR, Brown CT, Brooks B, Banfield JF. dRep: a tool for fast and accurate genomic comparisons that enables improved genome recovery from metagenomes through de-replication. ISME J. 2017;11:2864–8.
31. Orellana LH, Francis TB, Ferraro M, Hehemann J-H, Fuchs BM, Amann RI. Verrucomicrobiota are specialist consumers of sulfated methyl pentoses during diatom blooms. ISME J. 2022;16(3):630–41.
32. Nayfach S, Pollard KS. Average genome size estimation improves comparative metagenomics and sheds light on the functional ecology of the human microbiome. Genome Biol. 2015;16:51.
33. Quinlan AR. BEDTools: the Swiss-Army Tool for genome feature analysis. Curr Protoc Bioinformatics. 2014;47:11.12.1-34.
34. Jain C, Rodriguez-R LM, Phillippy AM, Konstantinidis KT, Aluru S. High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. Nat Commun. 2018;9:5114.
35. Chaumeil P-A, Mussig AJ, Hugenholtz P, Parks DH. GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database. Bioinformatics. 2019;6:1925–7.
36. Parks DH, Chuvochina M, Waite DW, Rinke C, Skarshewski A, Chaumeil P-A, et al. A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. Nat Biotechnol. 2018;36:996–1004.
37. Rho M, Tang H, Ye Y. FragGeneScan: predicting genes in short and error-prone reads. Nucleic Acids Res. 2010;38:e191.
38. Hyatt D, Chen G-L, LoCascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 2010;11:119.
39. Zhu W, Lomsadze A, Borodovsky M. Ab initio gene identification in metagenomic sequences. Nucleic Acids Res. 2010;38:e132.
40. Buchfink B, Xie C, Huson DH. Fast and sensitive protein alignment using DIAMOND. Nat Methods. 2015;12:59–60.
41. Brown CL, Keenum IM, Dai D, Zhang L, Vikesland PJ, Pruden A. Critical evaluation of short, long, and hybrid assembly for contextual analysis of antibiotic resistance genes in complex environmental metagenomes. Sci Rep. 2021;11:3753.
42. Xu G, Zhang L, Liu X, Guan F, Xu Y, Yue H, et al. Combined assembly of long and short sequencing reads improve the efficiency of exploring the soil metagenome. BMC Genomics. 2022;23:37.
43. Stewart RD, Auffret MD, Warr A, Walker AW, Roehe R, Watson M. Compendium of 4,941 rumen metagenome-assembled genomes for rumen microbiome biology and enzyme discovery. Nat Biotechnol. 2019;37:953–61.
44. The Genome Standards Consortium, Bowers RM, Kyrpides NC, Stepanauskas R, Harmon-Smith M, Doud D, et al. Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea. Nat Biotechnol. 2017;35:725–31.
45. Konstantinidis KT, Viver T, Conrad RE, Venter SN, Rossello-Mora R. Solar salterns as model systems to study the units of bacterial diversity that matter for ecosystem functioning. Curr Opin Biotechnol. 2022;73:151–7.
46. Browne PD, Nielsen TK, Kot W, Aggerholm A, Gilbert MTP, Puetz L, et al. GC bias affects genomic and metagenomic reconstructions, underrepresenting GC-poor organisms. GigaScience. 2020;9:giaa008.
47. Goldstein S, Beka L, Graf J, Klassen JL. Evaluation of strategies for the assembly of diverse bacterial genomes using MinION long-read sequencing. BMC Genomics. 2019;20:23.
48. Frank JA, Pan Y, Tooming-Klunderud A, Eijsink VGH, McHardy AC, Nederbragt AJ, et al. Improved metagenome assemblies and taxonomic binning using long-read circular consensus sequence data. Sci Rep. 2016;6:25373.
49. Tao Y, Xun F, Zhao C, Mao Z, Li B, Xing P, et al. Improved assembly of metagenome-assembled genomes and viruses in Tibetan saline lake sediment by HiFi metagenomic sequencing. Liu J, editor. Microbiol Spectr. 2023;11:e03328-22.
50. Wick RR, Judd LM, Holt KE. Assembling the perfect bacterial genome using Oxford Nanopore and Illumina sequencing. Ouellette F, editor. PLoS Comput Biol. 2023;19:e1010905.


Acknowledgements

We thank Fengqing Wang and the Helgoland crew for their help collecting samples and Bruno Huettel for technical support for sequencing. We thank Luis Miguel Rodriguez for his help in interpreting Nonpareil results. We also thank Isabella Wilkie and Alissa Hooker for their feedback on the manuscript.

Funding

Open Access funding enabled and organized by Projekt DEAL. The Max Planck Society funded this study.

Author information

Authors and Affiliations

Department of Molecular Ecology, Max Planck Institute for Marine Microbiology, Celsiusstraße 1, Bremen, 28359, Germany
Luis H. Orellana, Karen Krüger, Chandni Sidhu & Rudolf Amann


Contributions

L.H.O. and R.A. directed this study. K.K. recovered MAGs from short-read and long-read metagenomes. C.S. produced the assemblies and uploaded all sequence data. L.H.O. designed and performed all metagenomic and MAG comparisons. L.H.O. prepared the manuscript and received feedback from all authors. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Luis H. Orellana.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1:

Figure S1. Average coverage and sequence diversity for SR and LR metagenomic samples. a. The estimated abundance-weighted average community coverage as determined in Nonpareil for each SR and LR metagenomic sample. For LR metagenomes, we first selected ~250 bp fragments from each LR and then used them to generate a model. Coverage was predicted from generated models and the original sequencing effort. b. Sequence diversity (total diversity; N_d) as defined in Nonpareil.

Figure S2. Read overlapping between short- and long-read metagenomic samples. The bars show the mapping of SR on LR for each time point. The dark shades indicate the mapped fractions, and the light shades show unmapped SR and LR.

Figure S3. Assembly statistics for contigs generated using short- and long-read metagenomic samples. a. N statistics for contigs generated using SR and LR metagenomic samples. b. Distribution of sequencing depth (x-axis) for contigs generated using SR and LR metagenomic samples.

Figure S4. Distribution of predicted protein lengths using different gene prediction tools vs. best hit match in UniProt TrEMBL. a. Only predicted proteins >= 100 amino acids were used for all comparisons. The boxplots show the quotients between the length of predicted proteins using FragGeneScan (FGS), MetaGeneMark, and Prodigal and the best match in UniProt TrEMBL for the unassembled long-reads of the 2020-03-10 sample. b. Distribution of predicted protein lengths from contigs vs. best match in UniProt TrEMBL for the 2020-03-10 LR sample.

Figure S5. Statistics for pairs of MAGs recovered from SR and LR metagenomes. a. Difference between the number of predicted genes in MAG pairs recovered in SR and LR metagenomic samples. The boxplot in the lower right corner summarizes the comparison of predicted genes for MAG pairs. b. GC content for pairs of SR and LR MAGs colored according to their class taxonomic affiliation. The right side of the plot shows histograms for the distribution of GC content values of MAGs belonging to Bacteroidia and Gammaproteobacteria class levels. The thick black line depicts the median value of the distribution. c, d. Relative abundance for all MAGs recovered (c) and pairs of 99% ANI MAGs (d). Colors represent the taxonomic affiliation of MAGs according to GTDB-tk.

Figure S6. Relationship between relative abundances of MAG pairs in SR and LR metagenomes for each time point.

Figure S7. Comparison of the breadth of the coverage vs. sequencing depth for cross-mapping of reads. The figures summarize the mapping of (a) short-reads on long-read derived MAGs and (b) long-reads on short-read MAGs. Uniquely detected species are colored according to the inferred taxonomy. The rest of the points represent MAG species detected in both technologies. The dotted purple line represents the expected breadth of coverage for a given level of sequencing depth, according to Lander and Waterman (1998). The letters indicate the novelty taxonomic level for each of the MAGs (p=phylum, c=class, o=order, f=family, g=genus, and the species level is omitted for clarity).

Additional file 2:

Table S1. General sequence statistics for unassembled short- and long-read metagenomic samples.

Table S2. General statistics for assembled reads. Summary for assemblies using 500 bp (a) and 2,500 bp (b) contig length cutoffs. Assembly statistics for the hybrid assembly approach (c).

Table S3. List of generated MAGs. Names and general taxonomic classification for the MAGs used in this work. Accession numbers (ENA), completion, and contamination values for each MAG are also provided.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article

Orellana, L.H., Krüger, K., Sidhu, C. et al. Comparing genomes recovered from time-series metagenomes using long- and short-read sequencing technologies. Microbiome 11, 105 (2023). https://doi.org/10.1186/s40168-023-01557-3


Received: 08 December 2022
Accepted: 26 April 2023
Published: 13 May 2023
DOI: https://doi.org/10.1186/s40168-023-01557-3
