Preprint
Estimating gene conversion tract length and rate from PacBio HiFi data
Affiliations
- [1] Aarhus University, Denmark;
- [2] Department of Molecular Medicine (MOMA), Brendstrupgårdsvej 21A, 8200 Aarhus N, Denmark;
- [3] Copenhagen University Hospital, Capital Region of Denmark, Denmark
Abstract
Gene conversions are broadly defined as the transfer of genetic material from a ‘donor’ to an ‘acceptor’ sequence and can happen in both meiosis and mitosis. They are a subset of non-crossover events and, like crossover events, can generate new combinations of alleles, erode linkage disequilibrium, and even counteract the mutation load by reverting germline mutations through GC-biased gene conversion. Estimating the rate of gene conversion and the distribution of gene conversion tract lengths remains challenging. Here, we present a new method for estimating the tract length, rate and detection probability of non-crossover events directly from PacBio HiFi long-read data. The method can be applied to data from a single individual, is unbiased even at low single nucleotide variant densities and does not require any demographic or evolutionary assumptions. We apply the method to gene conversion events observed directly in PacBio HiFi read data from a human sperm sample and find that human gene conversion tracts are shorter (mean of 50 base pairs) than estimates from yeast or Drosophila. We also estimate that a typical human male gamete undergoes on average 280 non-crossover events, of which approximately 7 are expected to become visible as gene conversions that move variants from a donor haplotype to an acceptor haplotype.
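The last sentence implies that only a small fraction of non-crossover events is visible, because a tract is only observed when it overlaps at least one marker variant. The sketch below works through that arithmetic and pairs it with a toy model. It is not the estimator described in the abstract: the geometric tract-length distribution, the independent per-base marker probability, the function name `visible_fraction` and the density value in the usage note are all assumptions made for illustration.

```python
def visible_fraction(mean_tract_bp: float, snv_density: float) -> float:
    """Probability that a tract covers at least one marker SNV.

    Toy model (assumption, not the method of the preprint):
    tract length L is geometric on 1, 2, 3, ... with the given mean,
    and each base is a marker independently with probability snv_density.
    """
    p = 1.0 / mean_tract_bp   # geometric parameter for the tract length
    q = 1.0 - snv_density     # probability that a single base is not a marker
    # E[q**L] for L ~ Geometric(p) equals p*q / (1 - (1 - p)*q);
    # the tract is invisible exactly when all L bases are non-markers.
    return 1.0 - (p * q) / (1.0 - (1.0 - p) * q)


# Fraction of non-crossover events expected to be visible, using the
# two figures quoted in the abstract (about 7 out of 280 per gamete).
implied_from_abstract = 7 / 280
```

As a usage example, `visible_fraction(50, 0.001)` evaluates the toy model for a 50 bp mean tract and a hypothetical density of one heterozygous variant per kilobase. The toy model ignores read coverage and calling errors, so it should not be expected to reproduce `implied_from_abstract`.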