open access publication

Article, 2024

Differentiation of COVID‐19 pneumonia from other lung diseases using CT radiomic features and machine learning: A large multicentric cohort study

International Journal of Imaging Systems and Technology, ISSN 1098-1098, 0899-9457, Volume 34, Issue 2, DOI 10.1002/ima.23028

Contributors

Shiri, Isaac 0000-0002-5735-0736 [1] Salimi, Yazdan 0000-0002-1233-9576 [1] Saberi, Abdollah 0000-0001-7327-2558 [1] Pakbin, Masoumeh 0000-0001-7643-5877 [2] Hajianfar, Ghasem 0000-0001-5359-2407 [1] Avval, Atlas Haddadi 0000-0002-3896-7810 [3] Sanaat, Amir Hossein [1] Akhavanallaf, Azadeh 0000-0002-1486-4702 [1] Mostafaei, Shayan 0000-0002-1966-1306 [4] Mansouri, Zahra [1] Askari, Dariush 0000-0003-4031-2589 [5] Ghasemian, Mohammadreza [2] Sharifipour, Ehsan 0000-0002-5793-3288 [2] Sandoughdaran, Saleh 0000-0002-2191-7139 [6] Sohrabi, Ahmad [7] Sadati, Elham [8] Livani, Somayeh 0000-0002-5748-4208 [9] Iranpour, Pooya 0000-0001-6652-2053 [10] Kolahi, Shahriar 0000-0002-7490-1229 [11] Khosravi, Bardia 0000-0002-8024-339X [12] Khateri, Maziar 0000-0003-1951-2316 [13] Bijari, Salar 0000-0001-7656-0475 [8] Atashzar, Mohammad Reza [14] Shayesteh, Sajad Pashutan 0000-0003-4122-0053 [12] Babaei, Mohammad Reza 0000-0001-9279-9718 [15] Jenabi, Elnaz [12] Hasanian, Mohammad 0000-0002-3349-8090 [16] Shahhamzeh, Alireza [2] Ghomi, Seyed Yaser Foroghi 0000-0002-1555-2241 [2] Mozafari, Abolfazl 0000-0001-8666-4622 [17] Shirzad-Aski, Hesamaddin 0000-0002-0773-1610 [9] Movaseghi, Fatemeh [17] Bozorgmehr, Rama 0000-0003-4221-0316 [5] Goharpey, Neda [5] Abdollahi, Hamid [18] [19] Geramifar, Parham 0000-0002-7607-6859 [12] Radmard, Amir Reza 0000-0002-7462-118X [12] [20] Arabi, Hossein [1] Rezaei-Kalantari, Kiara 0000-0003-1973-4760 [15] Oveisi, Mehrdad 0000-0002-8100-5609 [18] [21] Rahmim, Arman 0000-0002-9980-2403 [18] [19] Zaidi, Habib 0000-0001-7559-5297 (Corresponding author) [1] [22] [23] [24]

Affiliations

  [1] University Hospital of Geneva (NORA names: Switzerland; Europe, Non-EU; OECD)
  [2] Qom University of Medical Science and Health Services (NORA names: Iran; Asia, Middle East)
  [3] Mashhad University of Medical Sciences (NORA names: Iran; Asia, Middle East)
  [4] Karolinska Institutet (NORA names: Sweden; Europe, EU; Nordic; OECD)
  [5] Shahid Beheshti University of Medical Sciences (NORA names: Iran; Asia, Middle East)

Abstract

We aimed to derive and validate an effective machine learning and radiomics‐based model that differentiates COVID‐19 pneumonia from other lung diseases using a large multi‐centric dataset. In this retrospective study, we collected 19 private and five public datasets of chest CT images, totaling 26 307 images (15 148 COVID‐19; 9657 other lung diseases, including non‐COVID‐19 pneumonia, lung cancer, and pulmonary embolism; 1502 normal cases). We tested 96 machine learning‐based models by cross‐combining 12 feature‐reduction methods (four feature selectors (FSs) and eight dimensionality reduction techniques) with eight classifiers. We trained and evaluated our models using three strategies: #1, the whole dataset (15 148 COVID‐19 and 11 159 other); #2, a subset that excludes healthy individuals and COVID‐19 patients without RT‐PCR results (12 419 COVID‐19 and 8278 other); and #3, only non‐COVID‐19 pneumonia patients plus a random sample of COVID‐19 patients (3000 COVID‐19 and 2582 other), to provide balanced classes. The best models were chosen using the one‐standard‐deviation rule in 10‐fold cross‐validation and were then evaluated on the held‐out test sets for reporting. In strategy #1, the Relief FS combined with a random forest (RF) classifier achieved the highest performance (accuracy = 0.96, AUC = 0.99, sensitivity = 0.98, specificity = 0.94, PPV = 0.96, NPV = 0.96). In strategy #2, the combination of the Recursive Feature Elimination (RFE) FS and the RF classifier performed best (accuracy = 0.97, AUC = 0.99, sensitivity = 0.98, specificity = 0.95, PPV = 0.96, NPV = 0.98). In strategy #3, the ANOVA FS and RF classifier combination performed best (accuracy = 0.94, AUC = 0.98, sensitivity = 0.96, specificity = 0.93, PPV = 0.93, NPV = 0.96). Lung radiomic features combined with machine learning algorithms can enable effective diagnosis of COVID‐19 pneumonia on CT images without the use of additional tests.
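The performance figures reported above (accuracy, AUC, sensitivity, specificity, PPV, NPV) follow standard definitions for a binary task. A minimal sketch of how they are computed, assuming COVID‐19 is the positive class and "other" the negative class; the function and class names are hypothetical, and the counts in the usage example are toy numbers, not study results.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class ConfusionCounts:
    """Binary confusion-matrix counts with COVID-19 as the positive class."""
    tp: int  # COVID-19 cases predicted as COVID-19
    fn: int  # COVID-19 cases predicted as "other"
    tn: int  # "other" cases predicted as "other"
    fp: int  # "other" cases predicted as COVID-19


def binary_metrics(c: ConfusionCounts) -> Dict[str, float]:
    """Threshold-based metrics named in the abstract (AUC is computed separately)."""
    total = c.tp + c.fn + c.tn + c.fp
    return {
        "accuracy": (c.tp + c.tn) / total,
        "sensitivity": c.tp / (c.tp + c.fn),  # true-positive rate on COVID-19
        "specificity": c.tn / (c.tn + c.fp),  # true-negative rate on "other"
        "ppv": c.tp / (c.tp + c.fp),          # positive predictive value
        "npv": c.tn / (c.tn + c.fn),          # negative predictive value
    }


def roc_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney formulation). This is O(n*m),
    which is fine for illustration but slow for large cohorts.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Hypothetical toy counts for illustration only; these are not study results.
    toy = ConfusionCounts(tp=90, fn=10, tn=80, fp=20)
    for name, value in binary_metrics(toy).items():
        print(f"{name}: {value:.3f}")
    print(f"auc: {roc_auc([0.9, 0.8, 0.7], [0.1, 0.75, 0.2]):.3f}")
```

AUC is kept separate because, unlike the other five metrics, it is computed from continuous classifier scores rather than from thresholded predictions.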
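The abstract selects the best models with a one‐standard‐deviation rule in 10‐fold cross‐validation but does not spell out the rule's details. A minimal sketch of one common reading, offered as an assumption: find the model with the highest mean fold score, treat every model whose mean lies within one standard deviation of the best model's fold scores as comparable, and keep the least complex of those. The `complexity` function is a hypothetical input (for example, the number of retained features), and the model names in the test are placeholders.

```python
from statistics import mean, stdev
from typing import Callable, Mapping, Sequence


def one_sd_rule(
    fold_scores: Mapping[str, Sequence[float]],
    complexity: Callable[[str], float],
) -> str:
    """Pick a model with a one-standard-deviation rule (assumed variant).

    fold_scores maps a model name to its per-fold CV scores (higher is better);
    each entry needs at least two folds so that a standard deviation exists.
    complexity maps a model name to a number where lower means simpler.
    """
    means = {name: mean(scores) for name, scores in fold_scores.items()}
    best = max(means, key=means.get)
    # Models scoring at least (best mean - SD of the best model's folds) are
    # treated as statistically comparable to the best one.
    threshold = means[best] - stdev(fold_scores[best])
    candidates = [name for name, m in means.items() if m >= threshold]
    # Prefer the simplest candidate; among equally simple ones, the higher mean.
    return min(candidates, key=lambda name: (complexity(name), -means[name]))
```

Some texts use the standard error (the SD divided by the square root of the number of folds) instead of the SD, which gives a narrower tolerance band; the sketch follows the abstract's wording and uses the SD.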
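The workflow described in the abstract (cross‐combining feature‐reduction methods with classifiers, scoring each combination by 10‐fold cross‐validation, and reporting the chosen model on a held‐out test set) can be sketched with scikit‐learn under several assumptions. The data are synthetic (`make_classification`) rather than radiomic features. Only three reducers and two classifiers are shown instead of the study's 12 × 8 grid. PCA is a stand‐in for the unnamed dimensionality reduction techniques. Relief is omitted because it is not part of scikit‐learn. For brevity, the sketch picks the highest mean CV AUC instead of applying the one‐standard‐deviation rule. This is an illustration, not the authors' code.

```python
from itertools import product

from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a radiomic feature matrix (rows = patients).
X, y = make_classification(n_samples=400, n_features=40, n_informative=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Illustrative subset of feature reducers and classifiers (factories, so each
# pipeline gets fresh, unfitted estimators).
reducers = {
    "anova": lambda: SelectKBest(f_classif, k=10),
    "rfe": lambda: RFE(LogisticRegression(max_iter=1000), n_features_to_select=10),
    "pca": lambda: PCA(n_components=10),
}
classifiers = {
    "rf": lambda: RandomForestClassifier(n_estimators=100, random_state=0),
    "logreg": lambda: LogisticRegression(max_iter=1000),
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
results = {}
for (r_name, make_r), (c_name, make_c) in product(reducers.items(), classifiers.items()):
    pipe = Pipeline([
        ("scale", StandardScaler()),
        ("reduce", make_r()),
        ("clf", make_c()),
    ])
    # cross_val_score clones the pipeline per fold, so `pipe` stays unfitted here.
    scores = cross_val_score(pipe, X_train, y_train, cv=cv, scoring="roc_auc")
    results[(r_name, c_name)] = (scores.mean(), scores.std(ddof=1), pipe)

# Simplification: highest mean CV AUC wins; then refit on all training data.
best_key = max(results, key=lambda k: results[k][0])
best_pipe = results[best_key][2].fit(X_train, y_train)
test_auc = roc_auc_score(y_test, best_pipe.predict_proba(X_test)[:, 1])
print(best_key, f"held-out AUC = {test_auc:.3f}")
```

Placing scaling and feature reduction inside the `Pipeline` refits them within every fold, which keeps validation folds from influencing feature selection; the abstract does not state how the study handled this.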
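Strategy #3 pairs the non‐COVID‐19 pneumonia patients (2582) with a random sample of 3000 COVID‐19 patients to balance the classes. A minimal sketch of such a draw, assuming simple random sampling without replacement and a fixed seed for reproducibility; the abstract states neither the sampling procedure nor a seed, and the case IDs and function name are hypothetical.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def undersample(pool: Sequence[T], target_size: int, seed: int = 0) -> List[T]:
    """Reproducible simple random sample, without replacement, from one class.

    Used here to shrink the larger class toward the size of the smaller one.
    """
    if not 0 <= target_size <= len(pool):
        raise ValueError("target_size must be between 0 and len(pool)")
    rng = random.Random(seed)  # local RNG so the draw is repeatable
    return rng.sample(list(pool), target_size)


if __name__ == "__main__":
    # Hypothetical case IDs; the real study cohort is not reproduced here.
    covid_ids = [f"covid_{i:05d}" for i in range(10_000)]
    pneumonia_ids = [f"pneumonia_{i:05d}" for i in range(2582)]
    covid_subset = undersample(covid_ids, 3000, seed=42)
    print(len(covid_subset), len(pneumonia_ids))
```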

Keywords

ANOVA, COVID-19, COVID-19 patients, COVID-19 pneumonia, CT images, CT radiomics features, FS, RT-PCR, RT-PCR results, algorithm, balanced classes, chest CT images, class, classifier, classifier combination, cohort study, combination, cross combinations, cross-validation, dataset, diagnosis of COVID-19 pneumonia, differentiation, dimensionality, dimensionality reduction techniques, disease, effective diagnosis, effective machine learning, elimination, feature elimination, features, forest, healthy individuals, high performance, images, individuals, learning, learning algorithms, learning-based models, lung, lung disease, lung radiomics features, machine, machine learning, machine learning algorithms, machine learning-based models, model, multi-centre dataset, multicentre cohort study, non-COVID-19, non-COVID-19 pneumonia patients, one-standard-deviation, patients, performance, pneumonia, pneumonia patients, public datasets, radiomic features, radiomics-based model, random forest, random sample, recursion, recursive feature elimination, reduction techniques, relief, reports, results, retrospective study, rules, samples of COVID-19 patients, selector, sets, strategies, study, technique, test, test set

Funders

  • Swiss National Science Foundation

Data Provider: Digital Science