open access publication

Preprint, 2024

nanoBERT: A deep learning model for gene agnostic navigation of the nanobody mutational space

bioRxiv, 2024.01.31.578143, DOI: 10.1101/2024.01.31.578143

Contributors

Hadsund, Johannes Thorling [1]; Satława, Tadeusz [2]; Janusz, Bartosz [2]; Shan, Lu [3]; Zhou, Li [3]; Röttger, Richard [1]; Krawczyk, Konrad [2] (corresponding author; ORCID 0000-0003-0697-5522)

Affiliations

  1. [1] University of Southern Denmark (SDU), Denmark
  2. [2] NaturalAntibody, Szczecin, Poland
  3. [3] Alector Therapeutics, 131 Oyster Point Blvd, Suite 600, South San Francisco, CA 94080, United States

Abstract

Nanobodies are a subclass of immunoglobulins whose binding site consists of only one peptide chain, bestowing favorable biophysical properties. Recently, the first nanobody therapy was approved, paving the way for further clinical applications of this antibody format. Further development of nanobody-based therapeutics could be streamlined by computational methods. One such method is infilling: the positional prediction of biologically feasible mutations in nanobodies. Being able to identify possible positional substitutions based on sequence context facilitates the functional design of such molecules. Here we present nanoBERT, a nanobody-specific transformer that predicts the amino acid at a given position in a query sequence. We demonstrate the need for such a machine-learning-based protocol, as opposed to gene-specific positional statistics, since an appropriate genetic reference is not available. We benchmark nanoBERT against human-based language models and ESM-2, demonstrating the benefit of domain-specific language models. We also demonstrate the benefit of employing nanobody-specific predictions for fine-tuning on an experimentally measured thermostability dataset. We hope that nanoBERT will help engineers in a range of predictive tasks for designing therapeutic nanobodies.

Availability: https://huggingface.co/NaturalAntibody/
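The abstract contrasts a learned infilling model with positional statistics. To make that baseline concrete, the sketch below shows a simplified, context-free positional-statistics predictor and a top-k infilling accuracy score. It is an illustration under stated assumptions, not the nanoBERT method or its benchmark code: it assumes the sequences are already aligned to equal length (gaps written as "-"), it pools all sequences rather than grouping them by germline gene, and the function names (`positional_profile`, `predict_position`, `infill_accuracy`) are hypothetical. Its key limitation is that the prediction at a position ignores the rest of the query sequence, which is exactly the context a transformer such as nanoBERT conditions on.

```python
from collections import Counter, defaultdict


# Hypothetical baseline (not nanoBERT): per-position amino-acid counts over
# pre-aligned, equal-length sequences. Gaps ("-") are not counted.
def positional_profile(aligned_seqs):
    profile = defaultdict(Counter)
    for seq in aligned_seqs:
        for pos, aa in enumerate(seq):
            if aa != "-":
                profile[pos][aa] += 1
    return profile


def predict_position(profile, pos, k=1):
    """Return the k most frequent residues at `pos`.

    The prediction depends only on the position, never on the surrounding
    sequence context. Ties are broken by first-seen order (Counter behavior).
    """
    return [aa for aa, _ in profile[pos].most_common(k)]


def infill_accuracy(profile, test_seqs, k=1):
    """Fraction of non-gap positions whose true residue is in the top-k prediction."""
    hits = total = 0
    for seq in test_seqs:
        for pos, aa in enumerate(seq):
            if aa == "-":
                continue
            total += 1
            hits += aa in predict_position(profile, pos, k)
    return hits / total if total else 0.0


if __name__ == "__main__":
    # Toy, made-up fragments for illustration only (not real nanobody data).
    train = ["QVQLV", "QVQLQ", "EVQLV"]
    prof = positional_profile(train)
    print(predict_position(prof, 0))               # most frequent residue at position 0
    print(infill_accuracy(prof, ["EVQLQ"], k=1))   # top-1 accuracy on a held-out fragment
    print(infill_accuracy(prof, ["EVQLQ"], k=2))   # top-2 accuracy
```

On the toy fragment "EVQLQ", the top-1 baseline misses positions 0 and 4 because it always proposes the most common residue regardless of the neighboring residues. A context-aware model aims to recover such substitutions from the rest of the sequence.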

Keywords

amino acids, antibodies, applications, benefits, binding sites, biophysical properties, chain, clinical application, computational methods, context, dataset, deep learning models, design, development, domain-specific language models, engineering, experimentation, functional design, genes, genetic reference, immunoglobulin, language model, learning models, machine-learning, method, model, molecules, mutation space, mutations, nanobodies, navigation, peptide, peptide chain, position, position statistics, positional substitution, prediction, prediction task, properties, protocol, query, query sequence, reference, sequence, sequence context, sites, space, subclass, subclasses of immunoglobulins, substitution, task, therapeutic nanobodies, therapeutics, therapy

Data Provider: Digital Science