Parametric learning of probabilistic graphical models from multi-sourced data

Catalán Cerezo, David

Please use this identifier to cite or link to this item: https://hdl.handle.net/2445/213325

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Hernández-González, Jerónimo	-
dc.contributor.advisor	Pérez Martínez, Aritz	-
dc.contributor.author	Catalán Cerezo, David	-
dc.date.accessioned	2024-06-18T09:19:46Z	-
dc.date.available	2024-06-18T09:19:46Z	-
dc.date.issued	2023-06-30	-
dc.identifier.uri	https://hdl.handle.net/2445/213325	-
dc.description	Final projects of the Master's in Fundamentals of Data Science, Faculty of Mathematics, Universitat de Barcelona. Academic year: 2022-2023. Supervisors: Jerónimo Hernández-González and Aritz Pérez Martínez	ca
dc.description.abstract	In Machine Learning, it is common to encounter scenarios where learning a model from a scarce dataset may not be feasible. In these cases, data from multiple sources must be collected. When the sources are distributed differently, the benefit of a larger sample size trades off against the difficulty of jointly modeling data sampled from different distributions. A similar setting arises in fairness analysis, where the subpopulations defined by protected attributes may follow different underlying distributions. In this work, we study the use of hierarchical Bayesian methods to learn Bayesian network (BN) models from all the available data while accounting for the presence of unequally distributed data sources. We propose a variation of a previous hierarchical Bayesian approach for learning BN parameters that fits naturally into the BN framework. We compare it with state-of-the-art methods along two dimensions: the number of samples available to train a model, and the divergence between the underlying distributions of the different data sources. Experimental results suggest that our model is competitive when data is scarce and the multiple sources are distributed differently.	ca
dc.format.extent	42 p.	-
dc.format.mimetype	application/pdf	-
dc.language.iso	eng	ca
dc.rights	cc-by-nc-nd (c) David Catalán Cerezo, 2023	-
dc.rights	code: GPL (c) David Catalán Cerezo, 2023	-
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/es/	*
dc.rights.uri	http://www.gnu.org/licenses/gpl-3.0.ca.html	*
dc.source	Official Master's Degree - Fundamentals of Data Science	-
dc.subject.classification	Machine learning	-
dc.subject.classification	Bayesian statistics	-
dc.subject.classification	Data processing	-
dc.subject.classification	Master's theses	-
dc.subject.other	Machine learning	-
dc.subject.other	Bayesian statistical decision	-
dc.subject.other	Data processing	-
dc.subject.other	Master's thesis	-
dc.title	Parametric learning of probabilistic graphical models from multi-sourced data	ca
dc.type	info:eu-repo/semantics/masterThesis	ca
dc.rights.accessRights	info:eu-repo/semantics/openAccess	ca
Appears in Collections:	Software - Student projects, Official Master's Degree - Fundamentals of Data Science
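
The abstract describes learning BN parameters from several unequally distributed sources while still using all the data. As an illustration only, and not the thesis's method (which this record does not describe in detail), the Python sketch below shows one simple way to share information across sources when estimating the conditional probability table (CPT) of a single BN node. Each source's CPT is the posterior mean under a Dirichlet prior centred on a CPT estimated from all sources pooled. This is an empirical-Bayes shortcut rather than a full hierarchical Bayesian model. The function names, the pseudo-count and the concentration `alpha` are hypothetical choices. All sources are assumed to share the same parent configurations and child states.

```python
from typing import Dict, List

# counts[j][k]: occurrences of child state k under parent configuration j (hypothetical layout)
Counts = List[List[int]]


def pooled_prior(per_source: Dict[str, Counts], pseudo: float = 1.0) -> List[List[float]]:
    """Shared CPT estimated from all sources pooled, with a Laplace-style pseudo-count."""
    sources = list(per_source.values())
    n_configs = len(sources[0])
    n_states = len(sources[0][0])
    prior = []
    for j in range(n_configs):
        # Sum counts across sources for this parent configuration, then normalise.
        totals = [sum(src[j][k] for src in sources) + pseudo for k in range(n_states)]
        z = sum(totals)
        prior.append([t / z for t in totals])
    return prior


def shrunk_cpts(per_source: Dict[str, Counts], alpha: float = 10.0) -> Dict[str, List[List[float]]]:
    """Per-source CPTs as Dirichlet posterior means centred on the pooled CPT.

    alpha is the prior concentration: a large alpha pulls every source towards the
    pooled estimate, a small alpha lets each source follow its own counts.
    """
    prior = pooled_prior(per_source)
    result = {}
    for name, counts in per_source.items():
        cpt = []
        for j, row in enumerate(counts):
            n_j = sum(row)
            # Posterior mean of Dirichlet(alpha * prior[j]) after observing this row's counts.
            cpt.append([(row[k] + alpha * prior[j][k]) / (n_j + alpha) for k in range(len(row))])
        result[name] = cpt
    return result


if __name__ == "__main__":
    counts = {
        "A": [[80, 20], [30, 70]],  # data-rich source
        "B": [[2, 1], [0, 3]],      # scarce source
    }
    cpts = shrunk_cpts(counts, alpha=10.0)
    print(round(cpts["B"][0][0], 3))  # 0.762, between B's own 2/3 and the pooled 0.790
```

With a large `alpha`, every source converges to the pooled CPT; with `alpha` near zero, each source keeps its own maximum-likelihood estimate. In the example, the scarce source B is pulled strongly toward the pooled values while the data-rich source A barely moves, which mirrors the trade-off between sample size and divergence between sources described in the abstract.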

Files in This Item:

File	Description	Size	Format
tfg_catalan_cerezo_david.pdf	Thesis report	1.41 MB	Adobe PDF
Codi_font.zip	Source code	6.77 MB	zip


This item is licensed under a Creative Commons License