Cardiometabolic risk estimation using exposome data and machine learning

dc.contributor.author: Atehortúa, Angélica
dc.contributor.author: Gkontra, Polyxeni
dc.contributor.author: Camacho, Marina
dc.contributor.author: Díaz, Oliver
dc.contributor.author: Bulgheroni, Maria
dc.contributor.author: Simonetti, Valentina
dc.contributor.author: Chadeau-Hyam, Marc
dc.contributor.author: Felix, Janine F.
dc.contributor.author: Sebert, Sylvain
dc.contributor.author: Lekadir, Karim, 1977-
dc.date.accessioned: 2025-02-20T08:24:58Z
dc.date.available: 2025-02-20T08:24:58Z
dc.date.issued: 2023-11
dc.date.updated: 2025-02-20T08:24:59Z
dc.description.abstract:
Background: The human exposome encompasses all exposures that individuals encounter throughout their lifetime. It is now widely acknowledged that health outcomes are influenced not only by genetic factors but also by the interactions between these factors and various exposures. Consequently, the exposome has emerged as a significant contributor to the overall risk of developing major diseases, such as cardiovascular disease (CVD) and diabetes. Personalized early risk assessment based on exposome attributes might therefore be a promising tool for identifying high-risk individuals and improving disease prevention.
Objective: To develop and evaluate a novel and fair machine learning (ML) model for CVD and type 2 diabetes (T2D) risk prediction based on a set of readily available exposome factors. We evaluated our model using internal and external validation groups from a multi-center cohort. To be considered fair, the model was required to demonstrate consistent performance across different subgroups of the cohort.
Methods: From the UK Biobank, we identified 5,348 and 1,534 participants who were diagnosed with CVD and T2D, respectively, within 13 years of the baseline visit. An equal number of participants who did not develop these pathologies were randomly selected as the control group. We considered 109 readily available exposure variables from six categories (physical measures, environmental, lifestyle, mental health events, sociodemographics, and early-life factors), all recorded at the participants' baseline visit. We adopted the XGBoost ensemble model to predict individuals at risk of developing the diseases. The model's performance was compared to that of an integrative ML model based on a set of biological, clinical, physical, and sociodemographic variables and, additionally for CVD, to the Framingham risk score. Moreover, we assessed the proposed model for potential bias related to sex, ethnicity, and age. Lastly, we interpreted the model's results using SHAP, a state-of-the-art explainability method.
Results: The proposed ML model performs comparably to the integrative ML model despite using solely exposome information, achieving a ROC-AUC of 0.78 ± 0.01 and 0.77 ± 0.01 for CVD and T2D, respectively. Additionally, for CVD risk prediction, the exposome-based model outperforms the traditional Framingham risk score. No bias was identified with respect to the sensitive variables examined (sex, ethnicity, and age).
Conclusions: We identified exposome factors that play an important role in identifying patients at risk of CVD and T2D, such as naps during the day, age at completion of full-time education, past tobacco smoking, frequency of tiredness/unenthusiasm, and current work status. Overall, this work demonstrates the potential of exposome-based machine learning as a fair CVD and T2D risk assessment tool.
dc.format.extent: 12 p.
dc.format.mimetype: application/pdf
dc.identifier.idgrec: 742387
dc.identifier.issn: 1386-5056
dc.identifier.uri: https://hdl.handle.net/2445/219023
dc.language.iso: eng
dc.publisher: Elsevier B.V.
dc.relation.isformatof: Reproduction of the document published at: https://doi.org/10.1016/j.ijmedinf.2023.105209
dc.relation.ispartof: International Journal of Medical Informatics, 2023, vol. 179, p. 105209
dc.relation.uri: https://doi.org/10.1016/j.ijmedinf.2023.105209
dc.rights: cc-by-nc-nd (c) Angélica Atehortúa et al., 2023
dc.rights.accessRights: info:eu-repo/semantics/openAccess
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/3.0/es/*
dc.source: Articles published in journals (Mathematics and Computer Science)
dc.subject.classification: Malalties cardiovasculars
dc.subject.classification: Diabetis
dc.subject.classification: Aprenentatge automàtic
dc.subject.other: Cardiovascular diseases
dc.subject.other: Diabetes
dc.subject.other: Machine learning
dc.title: Cardiometabolic risk estimation using exposome data and machine learning
dc.type: info:eu-repo/semantics/article
dc.type: info:eu-repo/semantics/publishedVersion
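
The abstract describes three evaluation ideas that a short example can make concrete: drawing an equal number of random controls, scoring a classifier with ROC-AUC, and checking that performance is consistent across subgroups such as sex, ethnicity, or age. The following is a minimal sketch of those ideas, not the authors' code. All function names (`sample_balanced_controls`, `roc_auc`, `auc_by_subgroup`, `max_auc_gap`) are hypothetical, the risk scores are assumed to come from any already-trained classifier (the paper uses XGBoost), and the "largest AUC gap" summary is an illustrative assumption rather than the fairness criterion used in the article.

```python
import random
from collections import defaultdict


def sample_balanced_controls(case_ids, candidate_ids, seed=0):
    """Randomly draw as many controls as there are cases (1:1 design)."""
    rng = random.Random(seed)
    return rng.sample(list(candidate_ids), len(case_ids))


def roc_auc(labels, scores):
    """ROC-AUC via the rank-sum (Mann-Whitney U) identity.

    labels are 0 (control) or 1 (case); tied scores share their average rank.
    """
    pairs = sorted(zip(scores, labels))
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one case and one control")
    rank_sum_pos = 0.0
    i = 0
    while i < len(pairs):
        j = i
        # Advance j past every item tied with pairs[i] on score.
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of 1-based ranks i+1 .. j
        n_tied_pos = sum(1 for k in range(i, j) if pairs[k][1] == 1)
        rank_sum_pos += avg_rank * n_tied_pos
        i = j
    u_stat = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u_stat / (n_pos * n_neg)


def auc_by_subgroup(labels, scores, groups):
    """Compute ROC-AUC separately within each subgroup value."""
    buckets = defaultdict(lambda: ([], []))
    for y, s, g in zip(labels, scores, groups):
        buckets[g][0].append(y)
        buckets[g][1].append(s)
    return {g: roc_auc(ys, ss) for g, (ys, ss) in buckets.items()}


def max_auc_gap(per_group_auc):
    """Largest difference in AUC between any two subgroups (assumed summary)."""
    values = list(per_group_auc.values())
    return max(values) - min(values)


if __name__ == "__main__":
    # Toy data only: labels, model scores, and a sensitive attribute.
    labels = [0, 0, 1, 1, 0, 1, 0, 1]
    scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.7]
    sex = ["f", "f", "f", "f", "m", "m", "m", "m"]
    per_group = auc_by_subgroup(labels, scores, sex)
    print("overall AUC:", roc_auc(labels, scores))
    print("AUC by subgroup:", per_group)
    print("largest gap:", max_auc_gap(per_group))
```

A small gap between subgroup AUCs is one simple way to read "consistent performance across subgroups"; how small is small enough is a design decision that the abstract does not specify.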

Files

Original bundle

Showing 1 - 1 of 1
Name: 841950.pdf
Size: 2.3 MB
Format: Adobe Portable Document Format