
Document type

Master's thesis

Publication license

cc-by-nc-nd (c) Claudia Valverde Sánchez, 2025
Please always use this identifier to cite or link to this document: https://hdl.handle.net/2445/223356

Pocket-Aware Molecular Generation Through Learned Protein Representations

Abstract

Drug discovery is constrained not only by the immense size of chemical space but also by the difficulty of exploring it efficiently and the high cost of traditional screening methods. This thesis introduces and evaluates a deep learning (DL) strategy for the de novo generation of small molecules designed to bind specific protein pockets, aiming to accelerate the identification of novel drug candidates. Our approach leverages pre-trained protein and pocket embeddings within a decoder-only Transformer architecture that learns to translate complex biological information into SMILES strings. Given the early stage of conditional binder generation, this work emphasizes systematic experimentation and thorough performance evaluation. We explored various protein and pocket representation strategies, including global protein (ESM2), structure-aware protein (SaProt), pocket-specific (PickPocket), and integrated Drug-Target Interaction (TensorDTI) embeddings. Our evaluation pipeline assessed molecule validity, novelty, internal and cross-model diversity, physicochemical properties, and predicted drug-target interactions. Key findings include that a high proportion of viral proteins in the training data does not bias generation, and that different input representations guide the model to explore distinct chemical spaces. While the models generate diverse molecules with favorable drug-like properties, a notable limitation is their propensity to produce exact matches to the training set, indicating overfitting. Furthermore, despite the model’s sensitivity to pocket information, case studies of two specific kinase proteins revealed a challenge in consistently generating truly pocket-specific molecules, likely because of dataset characteristics such as promiscuous motifs. This work provides insights into the capabilities and current limitations of pocket-aware generative models, laying a foundation for future advances in targeted molecule design.
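
As a rough illustration of two of the evaluation quantities named in the abstract (novelty with respect to the training set, and internal diversity), the sketch below computes both on plain SMILES strings. It is not the thesis's pipeline: the function names are hypothetical, strings are compared as-is (so it assumes SMILES have already been canonicalized), and character n-grams stand in for the molecular fingerprints that a cheminformatics toolkit would normally provide.

```python
from itertools import combinations


def ngrams(smiles: str, n: int = 3) -> set:
    """Character n-grams of a SMILES string.

    Assumption: a crude stand-in for a real molecular fingerprint.
    """
    if len(smiles) < n:
        return {smiles}
    return {smiles[i:i + n] for i in range(len(smiles) - n + 1)}


def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def novelty(generated: list, training: list) -> float:
    """Fraction of unique generated strings that do not appear in training.

    A low value means many exact matches to the training set.
    """
    unique = set(generated)
    if not unique:
        return 0.0
    return len(unique - set(training)) / len(unique)


def internal_diversity(generated: list) -> float:
    """One minus the mean pairwise similarity over unique generated strings."""
    unique = sorted(set(generated))
    pairs = list(combinations(unique, 2))
    if not pairs:
        return 0.0
    sims = [tanimoto(ngrams(x), ngrams(y)) for x, y in pairs]
    return 1.0 - sum(sims) / len(sims)
```

For example, `novelty(["CCO", "CCN", "CCO"], ["CCO"])` counts one of the two unique generated strings as novel. Cross-model diversity could be sketched the same way by pairing strings from two different models instead of pairs within one set.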

Description

Final projects of the Master's degree in Fundamentals of Data Science, Faculty of Mathematics, Universitat de Barcelona. Year: 2025. Supervisors: Laura Igual Muñoz and Alexis Molina

Citation

VALVERDE SÁNCHEZ, Claudia. Pocket-Aware Molecular Generation Through Learned Protein Representations. [accessed: 10 January 2026]. [Available at: https://hdl.handle.net/2445/223356]