Please use this identifier to cite or link to this item: http://hdl.handle.net/2445/65252
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | Cerquides Bueno, Jesús | -
dc.contributor.advisor | Pujol González, Marc | -
dc.contributor.author | Blasco Jiménez, Guillermo | -
dc.date.accessioned | 2015-04-28T11:11:56Z | -
dc.date.available | 2015-04-28T11:11:56Z | -
dc.date.issued | 2015-01-23 | -
dc.identifier.uri | http://hdl.handle.net/2445/65252 | -
dc.description | Treballs Finals de Grau de Matemàtiques, Facultat de Matemàtiques, Universitat de Barcelona, Any: 2015, Director: Jesús Cerquides Bueno i Marc Pujol González | ca
dc.description.abstract | Modern companies have recognized machine learning techniques as a key instrument to gain a competitive edge in their businesses, from better client profiling to the optimization and streamlining of their resources and operations. Among the many approaches and problems defined under the machine learning umbrella, we focus on Probabilistic Graphical Models (PGM). PGM is a generic framework that allows analysts to harness collected statistics and make predictions using them. Nonetheless, the computation of such predictions is a known NP-hard problem, and hence presents a significant challenge. Therefore, the PGM community has mainly focused on developing either useful relaxations that can be solved with lower computational complexity or approximate algorithms that still provide good solutions. Meanwhile, the relentless technological advances of our era have brought us to an interesting situation. On the one hand, the sheer amount of collected data that can be analyzed and exploited is ever-increasing. On the other hand, the advent of cloud computing provides vast amounts of inexpensive computation capabilities. However, exploiting these resources is not an easy endeavor because they are distributed across many networked machines. Fortunately, the community's response to these challenges has been the development of generic distributed computation platforms such as Hadoop or Spark. These platforms free developers from dealing with the distribution challenges, allowing them to focus on the implementation of their algorithms instead. Against this background, the main goal of this project is to build a bridge between these worlds by implementing a core PGM algorithm in one such distributed system. The selected algorithm is Belief Propagation (BP), an approximate inference algorithm that has good complexity characteristics and has proven to provide good solutions in practice.
Additionally, BP is based on a message-passing model, and hence stands as a perfect candidate for a distributed implementation. Furthermore, the implementation is done under Apache Spark, because it is designed specifically to support message-passing algorithms. Nonetheless, despite the invaluable foundation provided by the platform, Spark leaves many decisions to the developer's judgment. Therefore, in this project we present the challenges that arose during the implementation, and how they can be overcome. The end result of this project is hence a BP implementation that can run in a distributed manner across as many machines as Spark supports. Moreover, the implementation is both unit-tested and checked against a centralized open-source library. This opens up the possibility for anyone to make predictions based on large PGMs spanning gigabytes of information. | ca
dc.format.extent | 47 p. | -
dc.format.mimetype | application/pdf | -
dc.language.iso | spa | ca
dc.rights | cc-by-nc-nd (c) Guillermo Blasco Jiménez, 2015 | -
dc.rights | codi: BSD License Copyright (c) 2015, University of Barcelona | -
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/3.0/es | -
dc.rights.uri | http://opensource.org/licenses/BSD-3-Clause | -
dc.source | Treballs Finals de Grau (TFG) - Matemàtiques | -
dc.subject.classification | Estadística bayesiana | -
dc.subject.classification | Treballs de fi de grau | -
dc.subject.classification | Algorismes computacionals | ca
dc.subject.classification | Computació distribuïda | ca
dc.subject.classification | Aprenentatge automàtic | ca
dc.subject.other | Bayesian statistical decision | -
dc.subject.other | Bachelor's theses | -
dc.subject.other | Computer algorithms | eng
dc.subject.other | Computational grids (Computer systems) | eng
dc.subject.other | Machine learning | eng
dc.title | Modelos gráficos probabilísticos en sistemas distribuidos | ca
dc.type | info:eu-repo/semantics/bachelorThesis | ca
dc.rights.accessRights | info:eu-repo/semantics/openAccess | ca
Appears in Collections: Treballs Finals de Grau (TFG) - Matemàtiques

Files in This Item:
File | Description | Size | Format
codi_font.zip | Source code | 445.51 kB | zip
memoria.pdf | Report | 746.84 kB | Adobe PDF


This item is licensed under a Creative Commons License.