Modelos gráficos probabilísticos en sistemas distribuidos (Probabilistic Graphical Models in Distributed Systems)

Blasco Jiménez, Guillermo

Please use this identifier to cite or link to this item: http://hdl.handle.net/2445/65252

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Cerquides Bueno, Jesús	-
dc.contributor.advisor	Pujol González, Marc	-
dc.contributor.author	Blasco Jiménez, Guillermo	-
dc.date.accessioned	2015-04-28T11:11:56Z	-
dc.date.available	2015-04-28T11:11:56Z	-
dc.date.issued	2015-01-23	-
dc.identifier.uri	http://hdl.handle.net/2445/65252	-
dc.description	Bachelor's Final Projects in Mathematics (Treballs Finals de Grau de Matemàtiques), Faculty of Mathematics, University of Barcelona, Year: 2015, Advisors: Jesús Cerquides Bueno and Marc Pujol González	ca
dc.description.abstract	Modern companies have recognized machine learning techniques as a key instrument for gaining a competitive edge, from better client profiling to the optimization and streamlining of their resources and operations. Among the many approaches and problems under the machine learning umbrella, we focus on Probabilistic Graphical Models (PGMs). PGMs provide a generic framework that allows analysts to harness collected statistics and make predictions from them. However, computing such predictions (exact inference) is NP-hard in general, and hence presents a significant challenge. The PGM community has therefore focused mainly on developing either relaxations that can be solved with lower computational complexity or approximate algorithms that still provide good solutions. Meanwhile, the relentless technological advances of our era have brought us to an interesting situation. On the one hand, the amount of collected data that can be analyzed and exploited keeps increasing. On the other hand, cloud computing provides vast amounts of inexpensive computing capacity. However, exploiting these resources is not easy, because they are distributed across many networked machines. The community's response to these challenges has been the development of generic distributed computation platforms such as Hadoop or Spark. These platforms free developers from dealing with the challenges of distribution, allowing them to focus on implementing their algorithms instead. Against this background, the main goal of this project is to bridge these worlds by implementing a core PGM algorithm on one such distributed system. The selected algorithm is Belief Propagation (BP), an approximate inference algorithm that has good complexity characteristics and has proven to provide good solutions in practice.
Additionally, BP is based on a message-passing model, which makes it a natural candidate for a distributed implementation. The implementation is built on Apache Spark, chosen because it offers support for message-passing algorithms. Nonetheless, despite the invaluable foundation the platform provides, Spark leaves many design decisions to the developer. In this project we therefore present the challenges that arose during the implementation and how they can be overcome. The end result is a BP implementation that can run distributed across as many machines as Spark supports. Moreover, the implementation is unit-tested and checked against a centralized open-source library. This makes it possible for anyone to make predictions with large PGMs built from gigabytes of information.	ca
dc.format.extent	47 p.	-
dc.format.mimetype	application/pdf	-
dc.language.iso	spa	ca
dc.rights	cc-by-nc-nd (c) Guillermo Blasco Jiménez, 2015	-
dc.rights	code: BSD License Copyright (c) 2015, University of Barcelona	-
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/es	-
dc.rights.uri	http://opensource.org/licenses/BSD-3-Clause	-
dc.source	Treballs Finals de Grau (TFG) - Matemàtiques	-
dc.subject.classification	Bayesian statistics (Estadística bayesiana)	-
dc.subject.classification	Bachelor's theses (Treballs de fi de grau)	-
dc.subject.classification	Computer algorithms (Algorismes computacionals)	ca
dc.subject.classification	Distributed computing (Computació distribuïda)	ca
dc.subject.classification	Machine learning (Aprenentatge automàtic)	ca
dc.subject.other	Bayesian statistical decision	-
dc.subject.other	Bachelor's theses	-
dc.subject.other	Computer algorithms	eng
dc.subject.other	Computational grids (Computer systems)	eng
dc.subject.other	Machine learning	eng
dc.title	Modelos gráficos probabilísticos en sistemas distribuidos	ca
dc.type	info:eu-repo/semantics/bachelorThesis	ca
dc.rights.accessRights	info:eu-repo/semantics/openAccess	ca
Appears in Collections:	Treballs Finals de Grau (TFG) - Matemàtiques
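The abstract describes Belief Propagation as an approximate inference algorithm built on message passing. The following is a rough, centralized illustration of that idea: a minimal sketch of synchronous sum-product BP on a small pairwise Markov random field, in plain Python. It is not the thesis's Spark implementation. The function name `loopy_bp`, the dictionary-based model encoding, the fixed iteration count and the example potentials are all hypothetical choices made for this sketch.

```python
from typing import Dict, List, Tuple

Potential = List[float]          # phi_i(x_i), one entry per state of variable i
Table = List[List[float]]        # psi_ij[x_i][x_j]


def loopy_bp(unary: Dict[str, Potential],
             pairwise: Dict[Tuple[str, str], Table],
             num_iters: int = 20) -> Dict[str, Potential]:
    """Synchronous sum-product belief propagation on a pairwise MRF.

    Hypothetical helper for illustration. Each undirected edge (i, j)
    appears once in `pairwise`, with its table indexed as psi[x_i][x_j].
    Potentials are assumed non-negative and not all zero.
    Returns the normalized belief b_i(x_i) of every variable.
    """
    # Adjacency lists for the undirected graph.
    neighbours: Dict[str, List[str]] = {v: [] for v in unary}
    for i, j in pairwise:
        neighbours[i].append(j)
        neighbours[j].append(i)

    def psi(i: str, j: str, xi: int, xj: int) -> float:
        # Read the pairwise table in whichever orientation it was stored.
        if (i, j) in pairwise:
            return pairwise[(i, j)][xi][xj]
        return pairwise[(j, i)][xj][xi]

    # One message m_{i->j}(x_j) per directed edge, initialised to uniform.
    messages = {(i, j): [1.0] * len(unary[j])
                for i in unary for j in neighbours[i]}

    for _ in range(num_iters):
        # Synchronous schedule: every new message depends only on the
        # previous round's messages, so all of them could be computed
        # in parallel (one per directed edge).
        new_messages = {}
        for i, j in messages:
            out = []
            for xj in range(len(unary[j])):
                total = 0.0
                for xi in range(len(unary[i])):
                    # Product of messages into i from every neighbour except j.
                    incoming = 1.0
                    for k in neighbours[i]:
                        if k != j:
                            incoming *= messages[(k, i)][xi]
                    total += unary[i][xi] * psi(i, j, xi, xj) * incoming
                out.append(total)
            z = sum(out)
            new_messages[(i, j)] = [v / z for v in out]  # normalize for stability
        messages = new_messages

    # Belief: local potential times all incoming messages, normalized.
    beliefs = {}
    for i in unary:
        b = list(unary[i])
        for k in neighbours[i]:
            b = [bx * mx for bx, mx in zip(b, messages[(k, i)])]
        z = sum(b)
        beliefs[i] = [v / z for v in b]
    return beliefs


if __name__ == "__main__":
    # Hypothetical 3-variable binary chain A - B - C whose pairwise
    # potentials favour neighbouring variables taking the same value.
    unary = {"A": [0.7, 0.3], "B": [0.5, 0.5], "C": [0.2, 0.8]}
    agree = [[2.0, 1.0], [1.0, 2.0]]
    pairwise = {("A", "B"): agree, ("B", "C"): agree}
    for name, belief in loopy_bp(unary, pairwise).items():
        print(name, [round(p, 4) for p in belief])
```

The example graph is a tree (a chain), so once the messages converge the beliefs equal the exact marginals. On graphs with cycles, the same updates yield only approximate beliefs, which is why BP is classed as an approximate inference algorithm. The one-message-per-directed-edge structure is what makes the method a natural fit for distributed, message-passing platforms.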

Files in This Item:

File	Description	Size	Format
codi_font.zip	Source code	445.51 kB	zip
memoria.pdf	Report	746.84 kB	Adobe PDF


This item is licensed under a Creative Commons License