Please use this identifier to cite or link to this item:
https://hdl.handle.net/2445/223356
Title: | Pocket-Aware Molecular Generation Through Learned Protein Representations |
Author: | Valverde Sánchez, Claudia |
Director/Tutor: | Igual Muñoz, Laura |
Keywords: | Deep learning (Machine learning); Drug design; Protein engineering; Master's thesis |
Issue Date: | 30-Jun-2025 |
Abstract: | Drug discovery is constrained not only by the immense size of chemical space but also by the difficulty of exploring it efficiently and by the high cost of traditional screening methods. This thesis introduces and evaluates a deep learning (DL) strategy for the de novo generation of small molecules designed to bind specific protein pockets, with the aim of accelerating the identification of novel drug candidates. Our approach uses pre-trained protein and pocket embeddings within a decoder-only Transformer architecture that learns to translate complex biological information into SMILES strings. Because conditional binder generation is still at an early stage, this work emphasizes systematic experimentation and thorough performance evaluation. We explored several protein and pocket representation strategies: global protein (ESM2), structure-aware protein (SaProt), pocket-specific (PickPocket), and integrated Drug-Target Interaction (TensorDTI) embeddings. Our evaluation pipeline assessed molecule validity, novelty, internal and cross-model diversity, physicochemical properties, and predicted drug-target interactions. Key findings show that a high proportion of viral proteins in the training data does not bias generation, and that different input representations guide the model toward distinct regions of chemical space. While the models generate diverse molecules with favorable drug-like properties, a notable limitation is their tendency to reproduce exact matches from the training set, which indicates overfitting. Furthermore, although the model is sensitive to pocket information, case studies of two specific kinase proteins revealed that it does not consistently generate truly pocket-specific molecules, likely because of dataset characteristics such as promiscuous motifs.
This work provides insights into the capabilities and current limitations of pocket-aware generative models and lays a foundation for future advances in targeted molecule design. |
Note: | Master's thesis, Master's in Fundamentals of Data Science, Faculty of Mathematics, Universitat de Barcelona. Year: 2025. Tutors: Laura Igual Muñoz and Alexis Molina |
URI: | https://hdl.handle.net/2445/223356 |
Appears in Collections: | Official Master's Degree - Fundamentals of Data Science; Software - Student Work |
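The abstract lists validity, novelty, and internal and cross-model diversity among the evaluation criteria. The Python sketch below illustrates one common way to compute the set-based metrics; it is not taken from the thesis source code, and the function names (`uniqueness`, `novelty`, `tanimoto`, `internal_diversity`, `cross_diversity`) are hypothetical. It assumes the SMILES strings have already been parsed, validated, and canonicalized with a cheminformatics toolkit (validity checking itself is omitted), and that each molecular fingerprint is supplied as a set of on-bit indices. The thesis may define these metrics differently, for example by computing novelty over all valid molecules rather than distinct ones.

```python
"""Hypothetical sketch of common generation metrics; not the thesis implementation.

Assumptions: SMILES strings are already validated and canonicalized, and each
fingerprint is given as a frozenset of "on" bit indices.
"""
from itertools import combinations, product
from typing import FrozenSet, Sequence

Fingerprint = FrozenSet[int]


def uniqueness(generated: Sequence[str]) -> float:
    """Fraction of generated SMILES that are distinct."""
    return len(set(generated)) / len(generated) if generated else 0.0


def novelty(generated: Sequence[str], training: Sequence[str]) -> float:
    """Fraction of distinct generated SMILES that never occur in the training set.

    Exact training-set matches (a sign of memorization) lower this score.
    """
    distinct = set(generated)
    if not distinct:
        return 0.0
    return len(distinct - set(training)) / len(distinct)


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto similarity |A & B| / |A | B| between two on-bit sets."""
    union = len(a | b)
    # Convention chosen here: two empty fingerprints count as identical.
    return len(a & b) / union if union else 1.0


def internal_diversity(fps: Sequence[Fingerprint]) -> float:
    """1 - mean pairwise Tanimoto similarity within one set of molecules."""
    pairs = list(combinations(fps, 2))
    if not pairs:
        return 0.0
    return 1.0 - sum(tanimoto(a, b) for a, b in pairs) / len(pairs)


def cross_diversity(fps_a: Sequence[Fingerprint], fps_b: Sequence[Fingerprint]) -> float:
    """1 - mean Tanimoto similarity over all pairs drawn from two models' outputs."""
    pairs = list(product(fps_a, fps_b))
    if not pairs:
        return 0.0
    return 1.0 - sum(tanimoto(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    gen = ["CCO", "CCO", "c1ccccc1", "CC(=O)O"]
    train = ["CCO", "CCN"]
    print(uniqueness(gen))                # 0.75
    print(round(novelty(gen, train), 3))  # 0.667
    a, b, c = frozenset({1, 2, 3}), frozenset({2, 3, 4}), frozenset({5, 6})
    print(round(internal_diversity([a, b, c]), 3))  # 0.833
    print(cross_diversity([a], [b, c]))             # 0.75
```

Defining novelty over distinct molecules keeps repeated samples from skewing the score, while uniqueness is reported separately so that duplicate generation stays visible.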
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
TFM_Claudia_Valverde_Sanchez.pdf | Thesis report | 10.09 MB | Adobe PDF |
TFM-main.zip | Source code | 12.92 MB | zip |
This item is licensed under a Creative Commons License.