Document type
Bachelor's thesis
Publication date
Publication license
Please always use this identifier to cite or link to this document: https://hdl.handle.net/2445/213150
Exploring machine learning approaches for phenotype prediction of Huntington's disease
Journal title
Authors
Director/Tutor
Journal ISSN
Volume title
Related resource
Abstract
The onset of symptoms in Huntington’s disease (HD) is clinically predicted primarily from the length
of the CAG trinucleotide expansion in the HTT gene. However, this predictor explains
only around 50% of the variability of the phenotype. It is estimated that
40% of the remaining variability is heritable, suggesting the presence of other genetic
factors. Genome-wide association studies (GWAS) have identified potential genetic
modifiers, but they detect only linear effects and rely on computationally
demanding processes.
This project benchmarks various machine learning algorithms trained on an Enroll-HD
GWAS dataset to predict the age at HD onset. The dataset comprises the genotypes
of millions of SNPs from approximately 9,000 individuals. The models considered
include regularized linear models (Lasso and Elastic Net) and tree-based models (Random
Forest and XGBoost), and their predictive power is compared with that of an ordinary
least squares (OLS) baseline model trained solely with sex and CAG length as covariates. The results
indicate that tree-based models achieve the best estimation of age of onset (AO),
improving the prediction by 3% with respect to the baseline, possibly due to their implicit
consideration of interactions between SNPs. For each model, we extract the features
that contribute most to it, thereby identifying candidate genetic modifiers.
Some of these key SNPs lie in well-known AO modifier candidates such as FAN1 and
MYT1L, while others lie in genes such as CDYL2, which are proposed as new candidates.
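
The benchmarking setup described in the abstract can be illustrated with a minimal sketch. It is not the thesis pipeline: the data are synthetic (Enroll-HD genotypes are not reproduced here), the effect sizes, sample sizes, and hyperparameters are arbitrary assumptions, scikit-learn is assumed as the modelling library, and XGBoost is left out to keep dependencies small. The sketch compares an OLS baseline fitted on sex and CAG length with regularized linear and tree-based models fitted on the same covariates plus SNP genotypes.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a GWAS dataset (all values below are assumptions).
rng = np.random.default_rng(0)
n_individuals, n_snps = 600, 200

cag = rng.integers(40, 56, size=n_individuals).astype(float)  # CAG repeat length
sex = rng.integers(0, 2, size=n_individuals).astype(float)    # 0/1 encoded
# Genotypes coded as minor-allele counts (0, 1 or 2).
snps = rng.binomial(2, 0.3, size=(n_individuals, n_snps)).astype(float)

# Hypothetical age at onset: dominated by CAG, with two additive SNP effects
# and one SNP-SNP interaction that a purely linear model cannot represent.
age_at_onset = (
    120.0
    - 1.6 * cag
    + 2.0 * snps[:, 0]
    - 1.5 * snps[:, 1]
    + 3.0 * ((snps[:, 2] > 0) & (snps[:, 3] > 0))
    + rng.normal(0.0, 4.0, size=n_individuals)
)

X_baseline = np.column_stack([sex, cag])   # covariates only
X_full = np.column_stack([sex, cag, snps])  # covariates + SNPs

train_idx, test_idx = train_test_split(
    np.arange(n_individuals), test_size=0.25, random_state=0
)

# Each entry pairs a model with the feature matrix it is trained on.
candidates = {
    "OLS baseline (sex + CAG)": (LinearRegression(), X_baseline),
    "Lasso": (Lasso(alpha=0.1), X_full),
    "Elastic Net": (ElasticNet(alpha=0.1, l1_ratio=0.5), X_full),
    "Random Forest": (
        RandomForestRegressor(n_estimators=100, random_state=0),
        X_full,
    ),
}

# Held-out R^2 for every model, evaluated on the same test individuals.
results = {}
for name, (model, X) in candidates.items():
    model.fit(X[train_idx], age_at_onset[train_idx])
    predictions = model.predict(X[test_idx])
    results[name] = r2_score(age_at_onset[test_idx], predictions)

for name, score in results.items():
    print(f"{name}: R^2 = {score:.3f}")
```

Evaluating every model on the same held-out individuals keeps the comparison with the baseline fair. With real data, hyperparameters would normally be tuned by cross-validation, and feature importances (nonzero Lasso coefficients, or `feature_importances_` for the forest) could then be inspected to shortlist candidate modifier SNPs.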
Description
Bachelor's theses in Biomedical Engineering. Faculty of Medicine and Health Sciences. Universitat de Barcelona. Academic year: 2023-2024. Tutor/Director: Josep Maria Canals Coll; Director: Jordi Abante Llenas
Subjects (English)
Citation
FUSES I KUZMINA, Caterina. Exploring machine learning approaches for phenotype prediction of Huntington's disease. [accessed: 25 January 2026]. [Available at: https://hdl.handle.net/2445/213150]