Modelos gráficos probabilísticos en sistemas distribuidos

dc.contributor.advisor: Cerquides Bueno, Jesús
dc.contributor.advisor: Pujol González, Marc
dc.contributor.author: Blasco Jiménez, Guillermo
dc.date.accessioned: 2015-04-28T11:11:56Z
dc.date.available: 2015-04-28T11:11:56Z
dc.date.issued: 2015-01-23
dc.description: Treballs Finals de Grau de Matemàtiques, Facultat de Matemàtiques, Universitat de Barcelona, Any: 2015, Director: Jesús Cerquides Bueno i Marc Pujol González [ca]
dc.description.abstract: Modern companies have recognized machine learning techniques as a key instrument for gaining a competitive edge in their businesses, from better client profiling to the optimization and streamlining of their resources and operations. Among the many approaches and problems defined under the machine learning umbrella, we focus on Probabilistic Graphical Models (PGMs). PGMs are a generic framework that allows analysts to harness collected statistics and make predictions from them. Nonetheless, computing such predictions is known to be NP-hard, and hence presents a significant challenge. Therefore, the PGM community has mainly focused on developing either useful relaxations that can be solved with lower computational complexity or approximate algorithms that still provide good solutions. Meanwhile, the relentless technological advances of our era have brought us to an interesting situation. On the one hand, the sheer amount of collected data that can be analyzed and exploited is ever-increasing. On the other hand, the advent of cloud computing provides vast amounts of inexpensive computing capacity. However, exploiting these resources is not easy, because they are distributed across many networked machines. Fortunately, the community has responded to these challenges by developing generic distributed computation platforms such as Hadoop and Spark. These platforms free developers from dealing with the distribution challenges, allowing them to focus on implementing their algorithms instead. Against this background, the main goal of this project is to build a bridge between these worlds by implementing a core PGM algorithm on one such distributed system. The selected algorithm is Belief Propagation (BP), an approximate inference algorithm that has good complexity characteristics and has proven to provide good solutions in practice.
Additionally, BP is based on a message-passing model, and hence is a natural candidate for a distributed implementation. Furthermore, the implementation targets Apache Spark, because it is designed specifically to support message-passing algorithms. Nonetheless, despite the invaluable foundation provided by the platform, Spark leaves many decisions to the developer's judgment. Therefore, in this project we present the challenges that arose during the implementation and how they can be overcome. The end result is a BP implementation that can run in a distributed manner across as many machines as Spark supports. Moreover, the implementation is both unit-tested and checked against a centralized open-source library. This opens up the possibility for anyone to make predictions based on large PGMs built from gigabytes of information. [ca]
dc.format.extent: 47 p.
dc.format.mimetype: application/pdf
dc.identifier.uri: https://hdl.handle.net/2445/65252
dc.language.iso: spa [ca]
dc.rights: cc-by-nc-nd (c) Guillermo Blasco Jiménez, 2015
dc.rights: codi: BSD License Copyright (c) 2015, University of Barcelona
dc.rights.accessRights: info:eu-repo/semantics/openAccess [ca]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/3.0/es
dc.rights.uri: http://opensource.org/licenses/BSD-3-Clause
dc.source: Treballs Finals de Grau (TFG) - Matemàtiques
dc.subject.classification: Estadística bayesiana
dc.subject.classification: Treballs de fi de grau
dc.subject.classification: Algorismes computacionals [ca]
dc.subject.classification: Computació distribuïda [ca]
dc.subject.classification: Aprenentatge automàtic [ca]
dc.subject.other: Bayesian statistical decision
dc.subject.other: Bachelor's theses
dc.subject.other: Computer algorithms [eng]
dc.subject.other: Computational grids (Computer systems) [eng]
dc.subject.other: Machine learning [eng]
dc.title: Modelos gráficos probabilísticos en sistemas distribuidos [ca]
dc.type: info:eu-repo/semantics/bachelorThesis [ca]

Files

Original bundle

Name: codi_font.zip
Size: 445.51 KB
Format: ZIP file
Description: Source code

Name: memoria.pdf
Size: 746.84 KB
Format: Adobe Portable Document Format
Description: Thesis report