Diversity metrics in deep learning embeddings

Prol Prieto, Pablo

Please use this identifier to cite or link to this item: https://hdl.handle.net/2445/220385

Title:	Diversity metrics in deep learning embeddings
Author:	Prol Prieto, Pablo
Director/Tutor:	Statuto, Nahuel
Keywords:	Machine learning; Discrete mathematics; Neural networks (Computer science); Bachelor's theses
Issue Date:	9-Jun-2024
Abstract:	This project explores diversity metrics for deep learning embeddings, under the mentorship of Nahuel Statuto and Santiago Seguí. We begin with an overview of our objectives: to study whether a neural network trained for one task can perform well enough on a different task for which it was not trained, and to approach this question from the perspective of diversity in neural network embeddings. The introduction closes with a look at machine learning and the role of diversity in this field. We also lay out the foundations of deep learning, covering the architecture and training of feedforward neural networks. This includes a detailed examination of artificial neurons, their organization into layers, the use of activation functions to model nonlinear relationships, and a thorough explanation of the training process, including loss functions, the feedforward process, gradient descent, and backpropagation, and how these elements minimize the loss and improve the model’s performance. We then turn to the concept of diversity, first in machine learning, highlighting the limitations of existing metrics and introducing the Vendi Score as a promising alternative. Turning next to ecology, we discuss how diversity is treated in that field, presenting key metrics and a set of properties that match the intuition for diversity and that ecologists consider fundamental for diversity metrics that account for species similarity. Following this, we define the Hill numbers, an ecological diversity metric that serves as the foundation for the Vendi Score. Although the Hill numbers are a powerful metric, their main drawback is that they do not take similarity between species into account.
After addressing the limitations of the Hill numbers, we present the Vendi Score as an extension of the Hill number of order 1, together with the lemmas needed to show that it satisfies the desired properties. Because the Vendi Score is sensitive to imbalanced data classes, we introduce its extensions, the Cousins of the Vendi Score, which mitigate this issue by extending the Vendi Score to the remaining Hill numbers, while acknowledging the challenges that remain. The concluding chapter presents empirical studies that examine our objectives in a practical setting. Our findings reveal no explicit correlation, other than that the diversity of embeddings is directly related to the number of classes of a classifier. We conclude that the potential for repurposing a neural network to excel at a task it was not trained for is minimal, which highlights that the relationship between diversity metrics and the effectiveness of deep learning models is more complex than it seems.
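As an illustration of the metrics named in the abstract, the following is a minimal sketch, not code from the thesis. It assumes the standard definitions: the Hill number of order q for a probability vector p is (sum of p_i^q)^(1/(1-q)), with the exponential of the Shannon entropy as the limit at q = 1, and the Vendi Score applies that same formula to the eigenvalues of K/n, where K is an n-by-n positive semidefinite similarity matrix with ones on its diagonal. The function names `hill_number` and `vendi_score` and the zero tolerance `1e-12` are hypothetical choices made for this example.

```python
import numpy as np


def hill_number(p, q=1.0):
    """Hill number of order q for a probability vector p (assumed to sum to 1)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 1e-12]  # drop zero entries, using the convention 0 * log(0) = 0
    if np.isclose(q, 1.0):
        # Limit q -> 1: exponential of the Shannon entropy.
        return float(np.exp(-np.sum(p * np.log(p))))
    if np.isinf(q):
        # Limit q -> infinity: inverse of the largest proportion.
        return float(1.0 / p.max())
    return float(np.sum(p ** q) ** (1.0 / (1.0 - q)))


def vendi_score(K, q=1.0):
    """Vendi Score of order q for a similarity matrix K.

    Assumes K is symmetric positive semidefinite with unit diagonal,
    so the eigenvalues of K / n are nonnegative and sum to 1.
    """
    K = np.asarray(K, dtype=float)
    n = K.shape[0]
    eigvals = np.linalg.eigvalsh(K / n)
    eigvals = np.clip(eigvals, 0.0, None)  # remove tiny negative round-off
    return hill_number(eigvals, q)


# Four mutually dissimilar items behave like four distinct "species".
print(vendi_score(np.eye(4)))
# Four identical items behave like a single "species".
print(vendi_score(np.ones((4, 4))))
```

With the identity matrix as K the score equals the number of items, and with an all-ones matrix it equals 1, which matches the intuition of an "effective number" of distinct elements. Passing values of q other than 1 gives the order-q variants that the abstract refers to as the Cousins of the Vendi Score, under the same assumptions.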
Note:	Bachelor's thesis in Mathematics, Faculty of Mathematics, Universitat de Barcelona, Year: 2024, Advisor: Nahuel Statuto
URI:	https://hdl.handle.net/2445/220385
Appears in Collections:	Treballs Finals de Grau (TFG) - Matemàtiques

Files in This Item:

File	Description	Size	Format
tfg_prol_prieto_pablo.pdf	Thesis report	4.71 MB	Adobe PDF

This item is licensed under a Creative Commons License