Document type

Master's thesis

Publication date

Publication license

cc-by-nc-nd (c) Georgia Zavou, 2025
Please always use this identifier to cite or link to this document: https://hdl.handle.net/2445/223375

Augmenting phenotype prediction models leveraging a genomic Large Language Model

Abstract

Huntington’s disease (HD) is a progressive neurodegenerative disorder caused by a CAG repeat expansion in the HTT gene. While the length of this expansion explains a large portion of the variability in age of onset (AO), additional genetic modifiers, including regulatory variants, contribute to the remaining variability. In this work, we investigate the utility of genomic language models (gLMs), specifically Borzoi, for predicting tissue-specific gene expression changes from individual genomic data. We applied Borzoi to whole-genome sequencing data and integrated RNA-seq coverage predictions for relevant brain regions, including the putamen and caudate. After weighting logSED scores using enhancer proximity, we aggregated these expression predictions at the gene level. We then trained multiple machine learning models to classify AO residuals: a baseline XGBoost model using coding SNPs, CAG repeat length, and sex; an expression-based model using Borzoi-derived features; and a multimodal model combining both genomic and predicted expression features. Our results show that Borzoi expression predictions capture meaningful regulatory signals, with functional enrichment analysis highlighting genes involved in transcription regulation, DNA repair, and glutamate signaling. While genotype-based models achieved the highest predictive performance, the multimodal model showed that expression-based features carry complementary information. This study illustrates the potential of incorporating gLM-based expression predictions into phenotype modeling, offering insights into HD molecular mechanisms and genetic modifiers. The corresponding notebooks and scripts for this thesis can be found in the following GitHub repository: https://github.com/gzavou/FPDS_Thesis
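As a rough illustration of the weighting and aggregation step summarized above, the sketch below computes a logSED-style score per variant and sums proximity-weighted scores per gene. It is not taken from the thesis code. The logSED formula used here (log2 fold change of summed predicted coverage with a pseudocount of 1) is our reading of the usual Borzoi convention and should be treated as an assumption, and the exponential decay weight, its 10 kb scale, the function names, and the toy records are all hypothetical.

```python
import math
from collections import defaultdict


def log_sed(ref_coverage, alt_coverage):
    """Log2 fold change of summed predicted coverage (alt vs. ref).

    Assumed definition: log2(1 + sum(alt)) - log2(1 + sum(ref)).
    """
    return math.log2(1.0 + sum(alt_coverage)) - math.log2(1.0 + sum(ref_coverage))


def proximity_weight(distance_bp, scale_bp=10_000.0):
    """Hypothetical weight: 1.0 at the enhancer, decaying exponentially with distance."""
    return math.exp(-abs(distance_bp) / scale_bp)


def aggregate_by_gene(variant_records):
    """Sum proximity-weighted scores per gene.

    variant_records: iterable of (gene, log_sed_score, distance_to_enhancer_bp).
    """
    totals = defaultdict(float)
    for gene, score, distance_bp in variant_records:
        totals[gene] += proximity_weight(distance_bp) * score
    return dict(totals)


# Toy inputs only: gene names, coverage values, and distances are made up.
records = [
    ("GENE_A", log_sed([1.0, 2.0], [3.0, 4.0]), 0),
    ("GENE_A", log_sed([5.0, 5.0], [5.0, 5.0]), 20_000),
    ("GENE_B", log_sed([3.0, 4.0], [1.0, 2.0]), 10_000),
]
gene_features = aggregate_by_gene(records)
```

The resulting dictionary holds one value per gene for a single individual; stacking such dictionaries across individuals would give a feature matrix of the kind an expression-based classifier could be trained on. The actual weighting scheme used in the thesis may differ.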

Description

Final projects of the Master's in Fundamental Principles of Data Science, Faculty of Mathematics, Universitat de Barcelona. Year: 2025. Supervisor: Jordi Abante Llenas

Citation

ZAVOU, Georgia. Augmenting phenotype prediction models leveraging a genomic Large Language Model. [accessed: 8 February 2026]. [Available at: https://hdl.handle.net/2445/223375]
