Please use this identifier to cite or link to this item:
http://hdl.handle.net/2445/65252
Title: | Modelos gráficos probabilísticos en sistemas distribuidos (Probabilistic graphical models in distributed systems) |
Author: | Blasco Jiménez, Guillermo |
Director/Tutor: | Cerquides Bueno, Jesús; Pujol González, Marc |
Keywords: | Estadística bayesiana; Treballs de fi de grau; Algorismes computacionals; Computació distribuïda; Aprenentatge automàtic; Bayesian statistical decision; Bachelor's theses; Computer algorithms; Computational grids (Computer systems); Machine learning |
Issue Date: | 23-Jan-2015 |
Abstract: | Modern companies have recognized machine learning techniques as a key instrument to gain a competitive edge in their businesses, from better client profiling to optimization and streamlining of their resources and operations. Among the many approaches and problems defined under the machine learning umbrella, we focus on Probabilistic Graphical Models (PGM). PGM is a generic framework that allows analysts to harness collected statistics and make predictions using them. Nonetheless, the computation of such predictions is a known NP-hard problem, and hence presents a significant challenge. Therefore, the PGM community has mainly focused on the development of either useful relaxations that can be solved with lower computational complexity or approximate algorithms that still provide good solutions. Meanwhile, the relentless technological advances of our era have brought us to an interesting situation. On the one hand, the sheer amount of collected data that can be analyzed and exploited is ever-increasing. On the other hand, the advent of cloud computing provides vast amounts of inexpensive computation capabilities. However, exploiting these resources is not an easy endeavor because they are distributed across many networked machines. Fortunately, the community has responded to these challenges by developing generic distributed computation platforms such as Hadoop or Spark. These platforms free developers from dealing with the distribution challenges, allowing them to focus on the implementation of their algorithms instead. Against this background, the main goal of this project is to build a bridge between these worlds by implementing a core PGM algorithm in one such distributed system. The selected algorithm is Belief Propagation (BP), an approximate inference algorithm that has good complexity characteristics and has proven to provide good solutions in practice. 
Additionally, BP is based on a message-passing model, and hence stands as a perfect candidate for a distributed implementation. Furthermore, the implementation is built on Apache Spark, because it is designed specifically to support message-passing algorithms. Nonetheless, despite the invaluable foundation provided by the platform, Spark leaves many decisions to the developer's judgment. Therefore, in this project we present the challenges that arose during this implementation, and how they can be overcome. The end result of this project is hence a BP implementation that can run distributed across as many machines as Spark supports. Moreover, the implementation is both unit-tested and checked against a centralized open-source library. This opens up the possibility for anyone to make predictions based on large PGMs spanning gigabytes of information. |
Note: | Final degree projects in Mathematics, Faculty of Mathematics, Universitat de Barcelona, Year: 2015, Directors: Jesús Cerquides Bueno and Marc Pujol González |
URI: | http://hdl.handle.net/2445/65252 |
Appears in Collections: | Treballs Finals de Grau (TFG) - Matemàtiques |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
codi_font.zip | Source code | 445.51 kB | zip |
memoria.pdf | Thesis report | 746.84 kB | Adobe PDF |
This item is licensed under a Creative Commons License