Please use this identifier to cite or link to this item: http://hdl.handle.net/2445/179099
Title: Detection and classification of somatic structural variants, and its application in the study of neuronal development
Author: Planas Fèlix, Mercè
Director/Tutor: Torrents Arenales, David
Keywords: Cancer
Human genetics
Bioinformatics
Genomics
Issue Date: 5-Oct-2020
Publisher: Universitat de Barcelona
Abstract: [eng] The identification and analysis of genomic variation across individuals has been central to biology, first through comparative genomics to answer evolutionary questions, and then in biomedicine, where it is now becoming central to the study of most diseases. Next-generation sequencing technologies allow the systematic analysis of thousands of different types of genetic variation, enhancing the identification of disease markers and the understanding of the molecular basis of disease. In recent years, hundreds of groups around the world have produced a burst of new methodology for disease-focused genome analysis. Specific computational methods and strategies are being designed and improved for the identification and interpretation of genomic variation. The identification and classification of different types of genomic variants in the context of biomedicine is a key and foundational step towards the development of personalized medicine. This has been particularly central in the field of cancer genomics, where research over the past ten to fifteen years has been based on the sequencing of genomic DNA and on the identification and interpretation of (mostly) somatic and germline variation. Throughout these years, a large number of variant detection methods have been developed, with different ranges of applicability. Despite all these developments, the identification of genomic variants still has room for improvement, not only in sensitivity and specificity, but also at the computational level. Given the emergence of many personalized medicine initiatives around the world, and the expected number of genomes that will have to be analyzed within health care systems, we require robust algorithms, designed together with a matching implementation that minimizes the computational cost of the analysis.
With this aim, during this thesis I have designed and implemented an algorithm for the efficient processing of genomic data, in close collaboration with computer scientists at our center who defined the implementation, focusing on lowering the energy and time required for the analysis. This methodology, which relies on a reference-free approach to read classification, has been protected by a patent and is being used as the foundation for the development of SMuFin2, a more accurate and computationally efficient version of the initial SMuFin from 2014. We show here that our method is able to process whole genome sequences very fast and with minimal energy consumption compared with existing methods, and that it has great potential for the identification of all ranges of variants, including insertions of non-human DNA. Further development of SMuFin2 is needed to fully assess its variant calling capabilities.
Despite its great importance for the biology of the cell, the role of somatic variation that occurs in healthy tissues remains poorly defined. In the case of development, some hypotheses (e.g., replication stress) have been proposed to explain the somatic DNA damage observed during brain development, but the real impact and the underlying mechanisms of this somatic variation are not yet understood. To shed light on the type and potential functional impact of somatic variation in brain development, we established a new collaboration to identify and describe somatic DNA rearrangements induced by Pgbd5 during brain development and in the adult state in 36 mouse neural tissue samples. The detection of somatic variants in healthy tissues is more challenging than in cancer, where a variant is present in a significant number of cells and is therefore easier to detect.
We have identified, classified and interpreted the landscape of somatic variation in neural development, and found notable differences in variant load between adult and embryonic samples, as well as specific types of variants that may result from the activity of these transposase-like genes.
URI: http://hdl.handle.net/2445/179099
Appears in Collections: Tesis Doctorals - Facultat - Biologia

Files in This Item:
File: MPF_PhD_THESIS.pdf
Size: 54 MB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.