open access publication

Article, 2023

Prediction of pancreatic cancer risk in patients with new-onset diabetes using a machine learning approach based on routine biochemical parameters

Computer Methods and Programs in Biomedicine, ISSN 1872-7565, 0169-2607, Volume 244, Page 107965, 10.1016/j.cmpb.2023.107965

Contributors

  1. Cichosz, Simon Lebech 0000-0002-3484-7571 (Corresponding author) [1]
  2. Jensen, Morten Hasselstrøm 0000-0002-6649-8644 [1]
  3. Hejlesen, Ole Kristian 0000-0003-3578-8750 [1]
  4. Henriksen, Stine Dam 0000-0003-0286-397X [2]
  5. Drewes, Asbjørn Mohr 0000-0001-7465-964X [2]
  6. Olesen, Søren Schou 0000-0003-3916-3168 [2]

Affiliations

  1. [1] Aalborg University [NORA names: AAU Aalborg University; University; Denmark; Europe, EU; Nordic; OECD]
  2. [2] Aalborg University Hospital [NORA names: North Denmark Region; Hospital; Denmark; Europe, EU; Nordic; OECD]

Abstract

OBJECTIVE: To develop a machine-learning model that can predict the risk of pancreatic ductal adenocarcinoma (PDAC) in people with new-onset diabetes (NOD).

METHODS: From a population-based sample of individuals with NOD aged >50 years, patients with pancreatic cancer-related diabetes (PCRD), defined as NOD followed by a PDAC diagnosis within 3 years, were included (n = 716). These PCRD patients were randomly matched in a 1:1 ratio with other individuals with NOD. Data from Danish national health registries were used to develop a random forest model to distinguish PCRD from Type 2 diabetes. The model was based on age, sex, and parameters derived from feature engineering on trajectories of routine biochemical variables. Model performance was evaluated using receiver operating characteristic (ROC) curves and relative risk scores.

RESULTS: The most discriminative model included 20 features and achieved a ROC-AUC of 0.78 (CI: 0.75-0.83). Among the 1 % of patients predicted by the model to have the highest cancer risk, the risk of PCRD was 20-fold higher than in the general NOD population (3-year cancer risk of 12 %; sensitivity of 20 %). Age was the most discriminative single feature, followed by the rate of change in haemoglobin A1c and the latest plasma triglyceride level. When the prediction model was restricted to patients with PDAC diagnosed six months after diabetes diagnosis, the ROC-AUC was 0.74 (CI: 0.69-0.79).

CONCLUSION: In a population-based setting, a machine-learning model utilising information on age, sex and trajectories of routine biochemical variables demonstrated good discriminative ability between PCRD and Type 2 diabetes.
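The three figures reported for the highest-risk 1 % of patients (20-fold relative risk, 12 % 3-year cancer risk, 20 % sensitivity) are arithmetically linked, and a short calculation shows how. The sketch below is illustrative only and is not taken from the paper: the function names are hypothetical, and the baseline 3-year risk in the general NOD population is not stated in the abstract but is inferred here from the quoted relative risk.

```python
# Illustrative arithmetic only; function names are hypothetical and the
# baseline risk is inferred from the abstract, not reported in it.

def implied_baseline_risk(risk_in_group: float, relative_risk: float) -> float:
    """Baseline risk implied by a group's absolute risk and its relative risk."""
    return risk_in_group / relative_risk


def sensitivity_of_flagged_group(flagged_fraction: float,
                                 risk_in_group: float,
                                 baseline_risk: float) -> float:
    """Share of all cases that fall inside the flagged group.

    Cases in the flagged group: flagged_fraction * risk_in_group (per person).
    Cases in the whole population: baseline_risk (per person).
    """
    return flagged_fraction * risk_in_group / baseline_risk


# Figures quoted in the abstract for the top 1 % of predicted risk.
flagged_fraction = 0.01   # 1 % of NOD patients flagged as highest risk
risk_in_group = 0.12      # 3-year cancer risk in the flagged group
relative_risk = 20.0      # versus the general NOD population

baseline = implied_baseline_risk(risk_in_group, relative_risk)
sensitivity = sensitivity_of_flagged_group(flagged_fraction, risk_in_group, baseline)

print(f"Implied baseline 3-year risk: {baseline:.2%}")
print(f"Implied sensitivity of the top 1 %: {sensitivity:.0%}")
```

Under these assumptions the implied baseline 3-year risk is about 0.6 %, and the implied sensitivity equals the flagged fraction multiplied by the relative risk, which agrees with the 20 % sensitivity quoted in the abstract.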

Keywords

A1c, Danish national health registries, ROC-AUC, ability, adenocarcinoma, age, approach, biochemical parameters, biochemical variables, cancer risk, characteristic curve, curves, data, diabetes, diabetes diagnosis, diagnosis, discriminant model, discrimination ability, ductal adenocarcinoma, engineering, features, forest model, gender, health registries, hemoglobin, hemoglobin A1c, higher cancer risk, increase, individuals, information, learning approach, levels, machine, machine learning approach, machine-learning models, model, model performance, months, national health registries, new-onset diabetes, pancreatic cancer risk, pancreatic ductal adenocarcinoma, pancreatic ductal adenocarcinoma diagnosis, parameters, patients, people, performance, plasma, plasma triglyceride levels, population, population-based sample, population-based sample of individuals, population-based setting, prediction, prediction model, random forest model, rate, ratio, receiver operating characteristic curve, registry, relative risk, relative risk score, risk, risk of pancreatic ductal adenocarcinoma, risk score, routine biochemical parameters, routine biochemical variables, sample of individuals, scores, sets, sex, trajectory, triglyceride levels, type, type 2 diabetes, variables, years

Data Provider: Digital Science