Methods and Models for the Analysis of Biological Significance Based on High-Throughput Data

Mosquera Mayo, José Luís

Please use this identifier to cite or link to this item: http://hdl.handle.net/2445/63785

Title:	Methods and Models for the Analysis of Biological Significance Based on High-Throughput Data
Author:	Mosquera Mayo, José Luís
Director/Tutor:	Sànchez, Àlex (Sànchez Pla); Oller i Sala, Josep Maria
Keywords:	Genomics; Biochemical markers; Semantics; Ontologies (Information retrieval)
Issue Date:	12-Dec-2014
Publisher:	Universitat de Barcelona
Abstract:	[cat] The emergence of high-throughput technologies has generated a huge amount of omics data. The results of these experiments are long lists of genes that can be used as biomarkers. One of the major challenges for experimental researchers is to attribute a biological interpretation or significance to these potential biomarkers, either by extracting the biological information stored in resources such as the Gene Ontology (GO) or the Kyoto Encyclopedia of Genes and Genomes (KEGG), or by combining them with other omics data. The objectives of the thesis were: first, to study the mathematical properties of two types of semantic similarity measures for exploring GO categories, and second, to classify GO tools for enrichment analysis and study their evolution. The first semantic similarity measure considered, proposed by Lord et al., was based on graph theory, and the second was a group of pseudo-distances, proposed by Joslyn et al., based on the theory of Partially Ordered Sets (POSETs). The study of GO tools was based on the first 26 tools available on the website of The GO Consortium. The measure of Lord et al. was found to be the same as Resnik's measure, published earlier. An analogy was observed in the way genes are mapped to the GO via graphs and/or via POSETs. A property and a corollary were proposed that allow the first semantic similarity measure to be computed in matrix form. It was shown that both measures are associated with a metric distance. An R package called sims was developed, which computes semantic similarities for an arbitrary ontology and compares semantic similarity profiles of the GO. A Standard Functionalities Set was proposed for classifying GO tools, and a web application called SerbGO was developed for selecting and comparing GO tools. The statistical study revealed that the developers of GO tools have introduced improvements over time, but no well-defined models were detected. An ontology called DeGOT was developed, which provides developers with a vocabulary for introducing improvements to the tools or designing a new one.
[eng] The advent of high-throughput technologies has generated a huge quantity of omics data. The results of these experiments are usually long lists of genes that can be used as biomarkers. A major challenge for researchers is to attribute a biological interpretation or significance to these lists of potential biomarkers, either by using the biological information stored in bioinformatics resources such as the Gene Ontology (GO) or the Kyoto Encyclopedia of Genes and Genomes (KEGG), or by combining them with other types of omics data. This dissertation had two main objectives.
First, to study the mathematical properties of two types of semantic similarity measures for exploring GO categories, and second, to classify GO tools for enrichment analysis and study their evolution. The first measure considered was a semantic similarity measure proposed by Lord et al. It is a node-based approach grounded in graph theory. The second was actually a group of pseudo-distances proposed by Joslyn et al. These are edge-based approaches grounded in the algebraic viewpoint of Partially Ordered Set (POSET) theory. To reach these objectives, the main methods of graph theory and POSET theory were first reviewed and described. This review showed that there are two ways of mapping objects (e.g. genes) to the terms of an ontology (e.g. the GO). The first formulation, called the Object-Ontology Complex (OOC), was proposed by Carey in order to perform statistical computations. The second formulation, called the POSET Ontology (POSO), was introduced by Joslyn et al. To classify the GO tools for enrichment analysis, the first 26 GO tools available on the website of The GO Consortium were surveyed. This yielded a list of 205 features that were used to build a Standard Functionalities Set. Based on these functionalities, the 26 GO tools were classified according to their capabilities. The study of the evolution of GO tools was based on monitoring these 26 tools. The statistical analysis consisted of descriptive statistics, an inferential analysis and a multivariate analysis. With regard to the first objective, we have seen that the measure of Lord et al. is the same as Resnik's measure, which was published earlier. It has been observed that there is a certain analogy between the OOC and POSO formalizations for mapping objects such as genes to the terms of an ontology.
A property and a corollary have been proposed for calculating node-based semantic similarity measures in matrix form. It has been proved that the measures of Lord et al. and Joslyn et al. can be redefined in terms of a metric distance. An R package called sims has been developed for computing semantic similarity measures between terms of an arbitrary ontology and for comparing semantic similarity profiles based on the GO terms associated with two lists of genes. Based on the classification of the GO programs, a web-based tool called SerbGO was developed for selecting and comparing GO tools. The statistical analysis of the evolution of GO tools suggested that their developers have introduced improvements over time, but no clear models of GO tools have been detected. Following the results of the statistical analysis, an ontology called DeGOT was built to provide a structured vocabulary for developers who are introducing improvements in existing GO tools for enrichment analysis or designing a new program. DeGOT can be used to support queries and comparison results in SerbGO.
URI:	http://hdl.handle.net/2445/63785
Appears in Collections:	Tesis Doctorals - Departament - Estadística

Files in This Item:

File	Description	Size	Format
JLMM_PhD_THESIS.pdf		10.38 MB	Adobe PDF

This item is licensed under a Creative Commons License