Formalizing the Problem of Learning with Imprecise Data

Deep learning is a powerful tool for complex tasks such as image classification, but its success depends heavily on the availability of high-quality, correctly labeled data. In practice, however, datasets often contain imprecise labels: annotations that are ambiguous, incomplete, or incorrect. This thesis addresses the central challenge of building reliable learning systems when the data they depend on cannot be fully trusted. First, the thesis provides a rigorous mathematical formalization of the core concepts in machine learning, with particular emphasis on deep learning. Building on this foundation, it then introduces and studies a specialized framework that models the learning process under imprecise labels, where the assumptions of standard supervised learning no longer hold. Through the lens of statistical modeling, it explores how label uncertainty can be incorporated into deep learning models, treating imprecision not as noise to ignore but as structure to model. A key contribution is showing that such a framework defines a parametric model amenable to inference techniques such as Maximum Likelihood Estimation (MLE). The practical component implements the framework on real image datasets, with experiments designed to study how imprecise labels influence the training of deep networks. These results help identify strategies for mitigating the negative effects of label noise and contribute to building more robust and theoretically grounded learning systems. By bridging theoretical foundations and practical implementation, this work aims to deepen the understanding of learning under imprecision, which is critical for deploying deep learning models in real-world applications. The insights have implications beyond image classification, potentially benefiting other domains and tasks where data quality is a concern. Ultimately, this thesis seeks to pave the way for more reliable and interpretable machine learning models capable of handling imperfect data.

Description

Bachelor's thesis in Mathematics, Facultat de Matemàtiques, Universitat de Barcelona, Year: 2025, Supervisors: Bhalaji Nagarajan, Petia Radeva and Àlex Haro

Subjects

Machine learning, Deep learning (Machine learning), Artificial intelligence, Bachelor's theses

Author

Ana Díaz Acevedo

Collections

Treballs Finals de Grau (TFG) - Matemàtiques

Citation

DÍAZ ACEVEDO, Ana. Formalizing the Problem of Learning with Imprecise Data. [accessed 3 July 2026]. Available at: https://hdl.handle.net/2445/227311
