Exploring protein structure changes due to somatic mutations in cancer

Diéguez Docampo, Andrea

Please use this identifier to cite or link to this item: https://hdl.handle.net/2445/194743

Title:	Exploring protein structure changes due to somatic mutations in cancer
Author:	Diéguez Docampo, Andrea
Director/Tutor:	Gut, Ivo
Keywords:	Oncologia; Genòmica; Càncer de mama; Mutació (Biologia); Oncology; Genomics; Breast cancer; Mutation (Biology)
Issue Date:	2-Mar-2023
Publisher:	Universitat de Barcelona
Abstract:	[eng] Cancer is one of the most common diseases worldwide. Although considerable time and resources have already been spent on this topic, there is still a long way to go before every patient can be cured and their quality of life improved. To contribute to these efforts, we integrate and study a joint dataset of whole genome, whole exome and panel sequencing data from primary and metastatic tumours from 25,499 donors with different cancer types. This dataset consists of four cohorts: the Pan-Cancer Analysis of Whole Genomes (PCAWG) dataset, the Hartwig Medical Foundation (HMF) dataset, The Cancer Genome Atlas (TCGA) dataset and the Breast-CAncer STratification study (B-CAST) dataset. By describing the mutations found in the individual cohorts and in the joint dataset, we provide an overview of the genomic landscape across various cancer types. We also assess the landscape of mutational signatures in primary and metastatic tumours, focusing on breast, colorectal and uterus cancer, and identify groups based on the dominant mutational signatures. We observe groups with the same dominant signature across all three cancer types, as well as differences between primary and metastatic tumours. To illustrate the importance of studying the genomic landscape, we take the PCAWG dataset as a use case and compute 42 genomic features based on either all mutations or only the recurrent ones. Using these features, we are able to divide the dataset into biologically relevant clusters. Studying recurrent mutations also reveals susceptible sequence motifs, including TT[C>A]TTT and AAC[T>G]T for the POLE and ‘gastric-acid exposure’ clusters, respectively. To go beyond the genomic landscape, we focus on the mutations that result in an amino acid change in the protein and characterize these protein changes with a combination of amino acid, evolutionary and structural properties. We provide an overview of the amino acid changes observed in breast cancer specifically.
In our joint dataset, one of the most mutated genes in breast cancer is PIK3CA, which is also frequently mutated in colorectal and uterus cancer. Comparing the protein changes in the p110α protein, encoded by PIK3CA, and their protein features across these cancer types reveals differences in the proportion of mutations across the protein domains. Deciphering the underlying causes of these differences could provide information on the mechanisms playing a role in the three cancer types. Our results show that mutational processes such as the hypermutation activity of POLE or defective DNA damage repair in uterus cancer could be causing the mutations in the ABD domain. In uterus cancer, patients with a PIK3CA mutation have a higher survival rate than those without. In breast cancer, we show that there is an association between the ER-positive status of the tumour and the presence of a PIK3CA mutation. Breast cancer is the most frequently diagnosed cancer and is characterized by high heterogeneity. Therefore, improving the stratification of patients is key to tailoring the treatment strategy and improving the management of this disease. We assess the composition of the tumour microenvironment and demonstrate that it differs between PIK3CA-mutated breast tumours and those without a PIK3CA mutation. We also find differences within the group of PIK3CA-mutated tumours. For example, tumours with a mutation in the ABD-RBD linker region present an exhausted T-cell profile characterized by significantly higher expression of LAG3. In conclusion, the analysis of somatic mutations and the corresponding protein changes, combined with the evaluation of clinical data and the tumour immune microenvironment (TIME) across and within cancer types, is useful to stratify cancer patients and identify groups eligible for a specific treatment strategy, such as immunotherapy.
[spa] Cancer is characterized by the accumulation of mutations in the genome. The massive increase in sequencing, together with the public release of the resulting data, has allowed us to study a dataset of primary and metastatic tumours from 25,499 donors with different cancer types. After an overview of the somatic mutation landscape, we focus on breast, colon and uterus cancer and describe the landscape of mutational signatures. We classify the tumours according to their dominant signatures and identify different profiles shared across tumour types, as well as differences between primary and metastatic tumours. Using a portion of the dataset, we divide the tumours into biologically relevant groups using 42 statistics based on either all mutations or only the recurrent ones. Studying recurrent mutations also reveals susceptible sequence motifs, such as TT[C>A]TTT in the group related to polymerase epsilon (POLE). To go beyond the genomic landscape, we study the amino acid changes resulting from somatic mutations by measuring eight protein-related features, such as the conservation of the mutated amino acid or the change in folding free energy. Focusing on the PIK3CA gene, we observe a different distribution of mutations along the protein domains in breast, colon and uterus cancer. We investigate possible causes of this difference and find that, in uterus cancer, mutational processes such as the hypermutation activity of POLE or defective DNA damage repair are associated with mutations in the ABD domain. The survival rate is higher in uterus cancer patients with PIK3CA mutations than in those without. Focusing on breast cancer, we study the cellular composition of the tumour immune microenvironment (TIME) by decomposing the RNA-Seq samples using a single-cell reference.
The cellular composition of the TIME differs between PIK3CA-mutated and non-mutated tumours. From the analysis of the TIME in tumours with different mutated domains, we observe that breast tumours with mutations in the ‘ABD-RBD linker’ have an exhausted T-cell profile. In conclusion, the analysis of somatic mutations and the corresponding protein changes, combined with clinical data and the TIME across and within cancer types, is useful to stratify patients and identify groups that could be selected for a particular treatment, such as immunotherapy.
URI:	https://hdl.handle.net/2445/194743
Appears in Collections:	Tesis Doctorals - Facultat - Química

Files in This Item:

File	Description	Size	Format
ADD_PhD_THESIS.pdf		5.64 MB	Adobe PDF
