Differentially Private Machine Learning: Implementation and Analysis of Gradient and Dataset Perturbation Techniques

dc.contributor.advisor: Statuto, Nahuel
dc.contributor.author: Mantilla Carreño, Juan Pablo
dc.date.accessioned: 2025-10-28T10:38:00Z
dc.date.available: 2025-10-28T10:38:00Z
dc.date.issued: 2025-06-10
dc.description: Bachelor's Thesis in Computer Engineering (Treballs Finals de Grau d'Enginyeria Informàtica), Facultat de Matemàtiques, Universitat de Barcelona, Year: 2025, Advisor: Nahuel Statuto
dc.description.abstract: The increasing use of machine learning poses significant privacy risks, especially when sensitive data is used, and conventional anonymization methods have proven insufficient. Differential privacy is a rigorous framework for data privacy that provides strong mathematical guarantees, and applying it to machine learning addresses these risks. We present the theoretical foundations of these concepts and then implement, analyse, and empirically compare two techniques for integrating differential privacy into machine learning pipelines. The first technique, dataset perturbation, adds calibrated Gaussian noise directly to the training data, after which any standard machine learning pipeline can be used. The second, gradient perturbation, centres on differentially private stochastic gradient descent (DP-SGD), an approach that injects noise into the gradients during training. For the comparative study, we developed a multi-class classification architecture using a real-world, sensitive medical dataset derived from the MIMIC-IV database. Model performance was evaluated against a non-private baseline, using metrics appropriate for our class imbalance, such as Macro F1-score and Macro OVO AUC. The results confirm the trade-off between privacy and utility in the models developed: higher privacy guarantees consistently result in reduced model utility. In the specific context of this study, gradient perturbation provided a slightly more advantageous balance of utility and privacy. Ultimately, the thesis provides strong evidence for the feasibility of training useful and formally private machine learning models on real-world medical data, demonstrating that a practical "sweet spot" between privacy and performance can be found.
dc.format.extent: 49 p.
dc.format.mimetype: application/pdf
dc.identifier.uri: https://hdl.handle.net/2445/223909
dc.language.iso: eng
dc.rights: report: cc-nc-nd (c) Juan Pablo Mantilla Carreño, 2025
dc.rights: code: GPL (c) Juan Pablo Mantilla Carreño, 2025
dc.rights.accessRights: info:eu-repo/semantics/openAccess
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/3.0/es/
dc.rights.uri: http://www.gnu.org/licenses/gpl-3.0.ca.html
dc.source: Treballs Finals de Grau (TFG) - Enginyeria Informàtica
dc.subject.classification: Aprenentatge automàtic
dc.subject.classification: Protecció de dades
dc.subject.classification: Dades massives
dc.subject.classification: Programari
dc.subject.classification: Treballs de fi de grau
dc.subject.classification: Processos gaussians
dc.subject.other: Machine learning
dc.subject.other: Data protection
dc.subject.other: Big data
dc.subject.other: Computer software
dc.subject.other: Bachelor's theses
dc.subject.other: Gaussian processes
dc.title: Differentially Private Machine Learning: Implementation and Analysis of Gradient and Dataset Perturbation Techniques
dc.type: info:eu-repo/semantics/bachelorThesis
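
The two techniques named in the abstract can be summarised in a few lines of code. Below is a minimal NumPy sketch of dataset perturbation, assuming each row is first clipped to a known L2 bound so the sensitivity of the Gaussian mechanism is well defined; the function name, the clipping bound, and the (epsilon, delta) values are illustrative assumptions and are not taken from the thesis code in codi.zip.

    import numpy as np

    def perturb_dataset(X, epsilon, delta, clip_norm):
        # Clip each row to L2 norm <= clip_norm so the per-record
        # sensitivity is bounded by clip_norm.
        norms = np.linalg.norm(X, axis=1, keepdims=True)
        X_clipped = X * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
        # Classic Gaussian-mechanism calibration (valid for epsilon < 1).
        sigma = clip_norm * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
        return X_clipped + np.random.normal(0.0, sigma, size=X.shape)

    # Hypothetical usage: privatise a feature matrix once, then train any
    # standard (non-private) model on the result.
    X_private = perturb_dataset(np.random.randn(100, 8),
                                epsilon=1.0, delta=1e-5, clip_norm=1.0)

Gradient perturbation via DP-SGD instead clips each per-example gradient and adds Gaussian noise to the summed batch gradient at every training step. A minimal sketch for binary logistic regression follows, again with hypothetical names and parameters (the thesis applies the idea to a multi-class classifier):

    import numpy as np

    def dp_sgd_step(w, X_batch, y_batch, lr, clip_norm, noise_multiplier):
        # Compute and clip per-example gradients of the logistic loss.
        clipped = []
        for x, y in zip(X_batch, y_batch):
            p = 1.0 / (1.0 + np.exp(-x @ w))                 # sigmoid prediction
            g = (p - y) * x                                  # per-example gradient
            g = g / max(1.0, np.linalg.norm(g) / clip_norm)  # clip to L2 <= C
            clipped.append(g)
        # Add noise scaled to the clipping bound, average, and step.
        noise = np.random.normal(0.0, noise_multiplier * clip_norm, size=w.shape)
        g_private = (np.sum(clipped, axis=0) + noise) / len(X_batch)
        return w - lr * g_private

In both sketches the noise scale grows as epsilon shrinks, which is the privacy-utility trade-off the thesis measures with Macro F1-score and Macro OVO AUC.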

Files

Original bundle

Showing 1 - 2 of 2

Name: tfg_Mantilla_Carreño_Juan_Pablo.pdf
Size: 1.34 MB
Format: Adobe Portable Document Format
Description: Thesis report

Name: codi.zip
Size: 38.56 KB
Format: ZIP file
Description: Source code