Pocket-Aware Molecular Generation Through Learned Protein Representations

dc.contributor.advisor: Igual Muñoz, Laura
dc.contributor.author: Valverde Sánchez, Claudia
dc.date.accessioned: 2025-09-23T10:41:53Z
dc.date.available: 2025-09-23T10:41:53Z
dc.date.issued: 2025-06-30
dc.description: Final projects of the Master's in Fundamentals of Data Science, Faculty of Mathematics, Universitat de Barcelona. Year: 2025. Advisors: Laura Igual Muñoz and Alexis Molina
dc.description.abstract: Drug discovery is constrained not only by the immense size of chemical space but also by the difficulty of exploring it efficiently and the high cost of traditional screening methods. This thesis introduces and evaluates a deep learning (DL) strategy for the de novo generation of small molecules designed to bind specific protein pockets, aiming to accelerate the identification of novel drug candidates. Our approach uses pre-trained protein and pocket embeddings within a decoder-only Transformer architecture that learns to translate complex biological information into SMILES strings. Given the early stage of conditional binder generation, this work emphasizes systematic experimentation and thorough performance evaluation. We explored several protein and pocket representation strategies: global protein (ESM2), structure-aware protein (SaProt), pocket-specific (PickPocket), and integrated Drug-Target Interaction (TensorDTI) embeddings. Our evaluation pipeline assessed molecule validity, novelty, internal and cross-model diversity, physicochemical properties, and predicted drug-target interactions. Key findings are that a high proportion of viral proteins in the training data does not bias generation, and that different input representations guide the model to explore distinct chemical spaces. While the models generate diverse molecules with favorable drug-like properties, a notable limitation is their propensity to produce exact matches to the training set, indicating overfitting. Furthermore, despite the model’s sensitivity to pocket information, case studies of two specific kinase proteins revealed a challenge in consistently generating truly pocket-specific molecules, likely because of dataset characteristics such as promiscuous motifs. This work provides insights into the capabilities and current limitations of pocket-aware generative models, laying a foundation for future advances in targeted molecule design.
dc.format.extent: 69 p.
dc.format.mimetype: application/pdf
dc.identifier.uri: https://hdl.handle.net/2445/223356
dc.language.iso: eng
dc.rights: cc-by-nc-nd (c) Claudia Valverde Sánchez, 2025
dc.rights: code: GPL (c) Claudia Valverde Sánchez, 2025
dc.rights.accessRights: info:eu-repo/semantics/openAccess
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/3.0/es/
dc.rights.uri: http://www.gnu.org/licenses/gpl-3.0.ca.html
dc.source: Official Master's - Fundamentals of Data Science
dc.subject.classification: Deep learning
dc.subject.classification: Drug design
dc.subject.classification: Protein engineering
dc.subject.classification: Master's theses
dc.subject.other: Deep learning (Machine learning)
dc.subject.other: Drug design
dc.subject.other: Protein engineering
dc.subject.other: Master's thesis
dc.title: Pocket-Aware Molecular Generation Through Learned Protein Representations
dc.type: info:eu-repo/semantics/masterThesis

Files

Original bundle

Name: TFM_Claudia_Valverde_Sanchez.pdf
Size: 9.86 MB
Format: Adobe Portable Document Format
Description: Thesis report

Name: TFM-main.zip
Size: 12.62 MB
Format: ZIP file
Description: Source code