open access publication

Preprint, 2024

Matching Excellence: ONT’s Rise to Parity with PacBio in Genome Reconstruction of Non-Model Bacterium with High GC Content

bioRxiv 2024.02.26.582104, doi:10.1101/2024.02.26.582104

Contributors

Soto-Serrano, Axel (corresponding author) [1]; Li, Wenwen [1]; Panah, Farhad M (ORCID 0000-0002-6623-1528) [1]; Hui, Yan [1]; Atienza, Pablo (ORCID 0009-0002-1093-693X) [1]; Fomenkov, Alexey I (ORCID 0000-0002-2463-5946) [2]; Roberts, Richard John (ORCID 0000-0002-4348-0169) [2]; Deptula, Paulina (ORCID 0000-0002-8854-3573) [1]; Krych, Lukasz (ORCID 0000-0003-3224-1346) [1]

Affiliations

  1. [1] University of Copenhagen, Denmark
  2. [2] New England Biolabs, United States

Abstract

Reconstruction of complete bacterial genomes is a vital aspect of microbial research, as it provides comprehensive information about genetic content, gene ontology, and regulation. It has become the domain of third-generation, long-read sequencing platforms, as short-read technologies mainly deliver fragmented genomes. The PacBio platform can provide high-quality complete genomes, yet remains one of the most expensive sequencing strategies. Oxford Nanopore Technologies (ONT) offers the advantage of producing the longest reads while also being the most cost-effective option in terms of platform, library preparation, and sequencing costs. However, ONT's error rate, although significantly reduced lately, is still met with a certain level of distrust in the scientific community. In recent years, hybrid assembly of Nanopore and Illumina data has been used to overcome ONT's error rate and has yielded the best results in terms of genome completeness, quality, and price. More recently, however, the latest advancements in Nanopore technology, including new flow cells (R10.4.1), new library preparation chemistry (V14) and duplex mode, updated basecallers (Dorado v0.4.1), and the realization that sequencing in dark mode significantly increases throughput, have had a significant impact on the quality of the generated data and, thus, on the recovery of complete genomes by ONT sequencing alone. In this study, we compared the data generated by ONT using three sequencing strategies (Native barcoding, RAPID barcoding, and the custom-developed BARSEQ) against PacBio and Illumina (NextSeq) data, as well as Illumina-ONT hybrid data. For this purpose, we employed three strains of the actinobacterium Propionibacterium freudenreichii, whose genomes have proven difficult to reconstruct due to high GC content, repeated sequence regions, and massive genome rearrangements.
Our data indicate that DNA libraries prepared with the Native barcoding kit, sequenced with V14 chemistry on an R10.4.1 flow cell, and assembled with Flye yielded complete genomes whose overall quality closely matched that of genomes reconstructed from PacBio data. The highest level of quality can be achieved by hybrid assembly of data from the Native barcoding kit complemented with data from the custom-developed BARSEQ, both sequenced on R10.4.1 flow cells. In conclusion, our results demonstrate that ONT alone, without complementary sequencing technologies, can serve as a cost-effective strategy for reconstructing complete genomes of the highest quality.
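The abstract's argument turns on two standard quantities: the GC content that makes these genomes hard to assemble, and the per-base error rate that has historically limited ONT-only assemblies. As a minimal illustration only (not the authors' pipeline), the Python sketch below computes the GC fraction over unambiguous bases and converts between Phred quality Q and error probability P using Q = -10 * log10(P). The function names are hypothetical, and the example sequence is a made-up toy fragment, not P. freudenreichii data.

```python
import math


def gc_content(seq: str) -> float:
    """Fraction of G and C among unambiguous bases (A, C, G, T); other symbols such as N are ignored."""
    s = seq.upper()
    acgt = sum(s.count(b) for b in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return (s.count("G") + s.count("C")) / acgt


def phred_to_error(q: float) -> float:
    """Per-base error probability P for Phred quality Q, from Q = -10 * log10(P)."""
    return 10 ** (-q / 10)


def error_to_phred(p: float) -> float:
    """Phred quality Q for a per-base error probability P, with 0 < P <= 1."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p)


if __name__ == "__main__":
    # Hypothetical toy fragment, used only to exercise the function.
    print(f"GC content: {gc_content('GGCCGCATGCGC'):.2f}")
    # Each 10-point increase in Q means a tenfold drop in expected errors.
    for q in (10, 20, 30, 40):
        print(f"Q{q}: error rate {phred_to_error(q):.4f}")
```

For example, Q30 corresponds to about one expected error per 1,000 bases, while Q20 corresponds to about one per 100.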

Keywords

BarSeq, DNA, DNA library, Flye, GC content, Gene Ontology, Illumina, Illumina data, Nanopore Technologies, NextSeq, Oxford, Oxford Nanopore Technologies, Oxford Nanopore Technologies sequencing, PacBio, PacBio platform, V14, assembly of data, bacterial genomes, bacterium, cells, chemistry, community, complete bacterial genomes, complete genome, completion, complex information, content, cost, cost-effective option, cost-effective sequencing strategy, data, distrust, domain, error, error rate, flow, flow cell, genes, genetic content, genome, genome completeness, genome reconstruction, genomic rearrangements, high GC content, high quality, higher levels, highest level of quality, hybrid, hybrid assembly, impact, increase throughput, information, issues, kit, level of distrust, level of quality, levels, library, library preparation, library preparation chemistry, long reads, long-read sequencing platforms, massive genomic rearrangements, microbial research, mode results, nanopores, ontology, options, overall quality, parity, platform, platform cost, preparation, price, quality, rate, reading, realization, rearrangement, reconstruction, recovery, region, regulation, research, results, rise, scientific community, sequence, sequencing platforms, sequencing strategy, sequencing technologies, short-read technologies, strain, strategies, study, technology, throughput, vital aspect, years

Data Provider: Digital Science