open access publication

Article, 2024

Performance and robustness of small molecule retention time prediction with molecular graph neural networks in industrial drug discovery campaigns

Scientific Reports, ISSN 2045-2322, Volume 14, Issue 1, Page 8733, DOI 10.1038/s41598-024-59620-4

Contributors

Vik, Daniel 0000-0003-1999-0369 (Corresponding author) [1]
Pii, David 0009-0006-3855-0085 [1]
Mudaliar, Chirag 0009-0007-2368-9700 [1]
Nørregaard-Madsen, Mads [1]
Kontijevskis, Aleksejs [1]

Affiliations

  1. [1] Amgen Research Copenhagen, Amgen Inc., 2100, Copenhagen, Denmark
  2. [NORA names: Denmark; Europe, EU; Nordic; OECD]

Abstract

This study explores how machine learning can be used to predict chromatographic retention times (RT) in the analysis of small molecules, with the objective of identifying a machine-learning framework robust enough to support a chemical synthesis production platform. We used internally generated data from high-throughput parallel synthesis in the context of pharmaceutical drug discovery projects. We tested machine-learning models from three frameworks, XGBoost, ChemProp, and DeepChem, on a dataset of 7552 small molecules. Two models, AttentiveFP and ChemProp, predicted RT more accurately than XGBoost and a regular neural network. We also assessed how well these models performed over time and found that molecular graph neural networks consistently gave accurate predictions for new chemical series. In addition, when we applied ChemProp to the publicly available METLIN SMRT dataset, it achieved an average error of 38.70 s. These results highlight the efficacy of molecular graph neural networks, especially ChemProp, in diverse RT prediction scenarios, thereby enhancing the efficiency of chromatographic analysis.
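
To make the reported error figure concrete, the sketch below computes an average error in seconds from paired observed and predicted retention times. It assumes that "average error" means mean absolute error, which the abstract does not state explicitly. The function name `mean_absolute_error` and the retention-time values are hypothetical illustrations, not data or code from the study.

```python
from statistics import mean


def mean_absolute_error(observed_s, predicted_s):
    """Return the mean absolute difference between observed and predicted
    retention times, both given in seconds.

    Assumption: this is one plausible reading of "average error";
    the abstract does not define the metric.
    """
    if len(observed_s) != len(predicted_s):
        raise ValueError("observed and predicted lists must have the same length")
    if not observed_s:
        raise ValueError("at least one retention time is required")
    return mean(abs(o - p) for o, p in zip(observed_s, predicted_s))


# Hypothetical retention times in seconds (illustrative only, not from the study).
observed = [120.0, 300.0, 450.0, 610.0]
predicted = [130.0, 290.0, 470.0, 600.0]

mae = mean_absolute_error(observed, predicted)
print(f"Average error: {mae:.2f} s")
```

Under this reading, an average error of 38.70 s would mean that predictions deviate from the measured retention time by 38.70 s on average across the evaluated molecules.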

Keywords

ChemProp, DeepChem, METLIN, XGBoost, accurate prediction, analysis, analysis of small molecules, average error, campaign, chemical, chemical series, chromatographic analysis, chromatographic retention times, data, dataset, discovery campaigns, discovery projects, drug discovery campaigns, drug discovery projects, efficacy, efficiency, error, findings, framework, graph neural networks, high-throughput parallel synthesis, machine-learning, machine-learning framework, machine-learning models, model, molecules, network, neural network, objective, parallel synthesis, performance, platform, predicted RT, prediction, prediction scenarios, production platform, project, results, retention time, retention time prediction, robustness, scenarios, series, small molecules, study, synthesis, time, time prediction

Data Provider: Digital Science