open access publication

Article, 2024

A comparison of short-read, HiFi long-read, and hybrid strategies for genome-resolved metagenomics

Microbiology Spectrum, ISSN 2165-0497, Volume 12, Issue 4, Pages e03590-23, DOI 10.1128/spectrum.03590-23

Contributors

  • Eisenhofer, Raphael 0000-0002-3843-0749 (Corresponding author) [1]
  • Nesme, Joseph 0000-0003-1929-5040 [1]
  • Santos-Bay, Luisa [1]
  • Koziol, Adam 0000-0002-7721-4790 [1]
  • Sørensen, Søren Johannes 0000-0001-6227-9906 [1]
  • Alberdi, Antton 0000-0002-2875-6446 [1]
  • Aizpurua, Ostaizka 0000-0001-8053-3672 [1]

Affiliations

  1. [1] University of Copenhagen
  2. [NORA names: KU University of Copenhagen; University; Denmark; Europe, EU; Nordic; OECD]

Abstract

Shotgun metagenomics enables the reconstruction of complex microbial communities at a high level of detail. Such an approach can be conducted using short-read sequencing data, long-read sequencing data, or a combination of both. To assess the pros and cons of these approaches, we used 22 fecal DNA extracts collected weekly for 11 weeks from two lab mice to study seven performance metrics over four combinations of sequencing depth and technology: (i) 20 Gbp of Illumina short-read data, (ii) 40 Gbp of short-read data, (iii) 20 Gbp of PacBio HiFi long-read data, and (iv) 40 Gbp of hybrid (20 Gbp of short-read + 20 Gbp of long-read) data. No strategy was best for all metrics; instead, each excelled on different metrics. The long-read approach yielded the best assembly statistics, with the highest N50 and the lowest number of contigs. The 40 Gbp short-read approach yielded the highest number of refined bins. Finally, the hybrid approach yielded the longest assemblies and the highest mapping rate to the bacterial genomes. Our results suggest that while long-read sequencing significantly improves the quality of reconstructed bacterial genomes, it is more expensive and requires deeper sequencing than short-read approaches to recover a comparable number of reconstructed genomes. The optimal strategy is study-specific and depends on how researchers assess the trade-off between the quantity and quality of recovered genomes.

IMPORTANCE: Mice are an important model organism for understanding the gut microbiome. When studying these gut microbiomes using DNA techniques, researchers can choose from technologies that use short or long DNA reads. In this study, we perform an extensive benchmark of short- and long-read DNA sequencing for studying mouse gut microbiomes. We find that no one approach was best for all metrics and provide information that can help guide researchers in planning their experiments.
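For readers unfamiliar with the assembly statistics named in the abstract, the following is a minimal sketch of how N50 and contig count are conventionally computed. It is not the authors' pipeline: the function name `n50` and the contig lengths are hypothetical and chosen only for illustration. N50 is taken here in its usual sense, the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths (in bp)."""
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical assemblies of equal total length (20,000 bp); not study data.
fragmented = [2_000, 3_000, 4_000, 5_000, 6_000]
contiguous = [12_000, 8_000]

print(len(fragmented), n50(fragmented))  # 5 5000
print(len(contiguous), n50(contiguous))  # 2 12000
```

In this toy case the two assemblies have the same total length, but the second has fewer contigs and a higher N50, which is the pattern the abstract reports for the long-read approach.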

Keywords

Con, DNA, DNA extraction, DNA reads, DNA sequences, DNA techniques, HIFI, Illumina, Illumina short-read data, N50, PacBio, amount, approach, assembly, assembly statistics, bacterial genomes, benchmarks, bins, combination, community, comparison, complex microbial communities, contigs, data, depth, experiments, extraction, fecal DNA extracts, genome, genome-resolved metagenomics, gut, gut microbiome, higher N50, highest mapping rate, information, lab, lab mice, levels, long DNA reads, long reads, long-read DNA sequencing, long-read approaches, long-read data, long-read sequencing, long-read sequencing data, longer assemblies, mapping rate, metagenomics, metrics, mice, microbial communities, microbiome, model, model organisms, mouse gut microbiome, optimal strategy, organization, performance, performance metrics, pros, quality, quantity, rate, reading, reconstruction, recovered genomes, research, sequence, sequence data, sequencing depth, short reads, short-, short-read approaches, short-read data, shotgun, shotgun metagenomics, statistically, strategies, study, study-specific, technique, technology, weeks

Funders

  • Danish National Research Foundation
  • Carlsberg Foundation

Data Provider: Digital Science