Parametric learning of probabilistic graphical models from multi-sourced data

dc.contributor.advisor: Hernández-González, Jerónimo
dc.contributor.advisor: Pérez Martínez, Aritz
dc.contributor.author: Catalán Cerezo, David
dc.date.accessioned: 2024-06-18T09:19:46Z
dc.date.available: 2024-06-18T09:19:46Z
dc.date.issued: 2023-06-30
dc.description: Final projects of the Master's in Fundamentals of Data Science, Faculty of Mathematics, Universitat de Barcelona. Academic year: 2022-2023. Advisors: Jerónimo Hernández-González and Aritz Pérez Martínez
dc.description.abstract: In machine learning, it is common to encounter scenarios where a single dataset is too scarce to learn a model from. In these cases, data must be collected from multiple sources. When the sources follow different distributions, the benefit of a larger sample size trades off against the difficulty of jointly modeling data sampled from different distributions. A similar setting arises in fairness analysis, where subpopulations defined by the protected attributes might follow different underlying distributions. In this work, we study the use of hierarchical Bayesian methods to learn Bayesian network (BN) models from all the available data while accounting for unequally distributed data sources. We propose a variation of a previous hierarchical Bayesian approach for learning BN parameters that fits naturally into the BN framework. We compare it with state-of-the-art methods along two dimensions: the number of samples available to train a model, and the divergence between the underlying distributions of the different data sources. Experimental results suggest that our model is competitive when data is scarce and the multiple sources are distributed differently.
dc.format.extent: 42 p.
dc.format.mimetype: application/pdf
dc.identifier.uri: https://hdl.handle.net/2445/213325
dc.language.iso: eng
dc.rights: cc-by-nc-nd (c) David Catalán Cerezo, 2023
dc.rights: code: GPL (c) David Catalán Cerezo, 2023
dc.rights.accessRights: info:eu-repo/semantics/openAccess
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/3.0/es/
dc.rights.uri: http://www.gnu.org/licenses/gpl-3.0.ca.html
dc.source: Official Master's Degree - Fundamentals of Data Science
dc.subject.classification: Machine learning
dc.subject.classification: Bayesian statistics
dc.subject.classification: Data processing
dc.subject.classification: Master's theses
dc.subject.other: Machine learning
dc.subject.other: Bayesian statistical decision
dc.subject.other: Data processing
dc.subject.other: Master's thesis
dc.title: Parametric learning of probabilistic graphical models from multi-sourced data
dc.type: info:eu-repo/semantics/masterThesis

Files

Original bundle

Name: tfg_catalan_cerezo_david.pdf
Size: 1.38 MB
Format: Adobe Portable Document Format
Description: Thesis report

Name: Codi_font.zip
Size: 6.61 MB
Format: ZIP file
Description: Source code