Please use this identifier to cite or link to this item: http://hdl.handle.net/2445/203127
Title: Efficient transformers applied to video classification
Author: Martínez Pérez, Oriol
Director/Tutor: Escalera Guerrero, Sergio
Clapés i Sintes, Albert
Pujol, David
Keywords: Computer vision
Machine learning
Natural language processing (Computer science)
Bachelor's theses
Issue Date: 12-Jun-2023
Abstract: [en] Transformers, with the self-attention mechanism at their core, have shown great performance in several Machine Learning areas such as NLP and Computer Vision since their appearance in 2017 [1]. However, their quadratic time and memory complexity in the input length makes them prohibitive for long input sequences. This has motivated several reformulations of self-attention intended to lower its complexity and reduce its computational cost. We focus on three of these self-attention mechanisms applied to video classification: Cosformer [2], Nyströmformer [3] and Linformer [4]. Concretely, our goal in this project is to determine which of them is best suited for this task. To evaluate each model's performance, we design a customizable Transformer with interchangeable self-attention mechanisms and train it on a simplified dataset derived from EpicKitchens-100 [5]. We carefully describe the Transformer architecture, explaining the purpose of each of its modules, and provide an overall description of how it works internally. Preliminary results indicate that Nyströmformer is the best option, being the model that converged fastest and achieved the best trade-off between computational cost and classification metrics. Linformer obtained similar results, while Cosformer apparently failed to perform the classification. The theoretical formalization of the aforementioned self-attention mechanisms is essential for interpreting these results. Hence, we also provide an in-depth mathematical description of both the original self-attention mechanism presented by Vaswani et al. [1] and the three efficient mechanisms. We carry out a complexity analysis of all the mechanisms and present their main properties, linking the theoretical basis with the results.
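As a rough illustration of where the quadratic cost mentioned in the abstract comes from, below is a minimal NumPy sketch of the standard scaled dot-product self-attention of Vaswani et al. [1]. This is not code from the thesis; the names and shapes are illustrative only. The (n, n) attention matrix built here is the term that reformulations such as Cosformer, Nyströmformer and Linformer approximate or avoid.

import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(Q, K, V):
    """Standard scaled dot-product self-attention.

    Q, K, V: (n, d) arrays for a sequence of length n and model dimension d.
    The score matrix Q K^T has shape (n, n), which is the source of the
    O(n^2) time and memory cost that efficient variants try to reduce.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)        # (n, n) attention scores
    weights = softmax(scores, axis=-1)   # row-wise softmax over keys
    return weights @ V                   # (n, d) output

# Toy usage: n = 8 tokens, d = 4 dimensions (hypothetical sizes)
rng = np.random.default_rng(0)
n, d = 8, 4
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
out = self_attention(Q, K, V)
print(out.shape)  # (8, 4)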
Note: Bachelor's thesis in Mathematics, Faculty of Mathematics, Universitat de Barcelona, Year: 2023, Advisors: Sergio Escalera Guerrero, Albert Clapés i Sintes and David Pujol
URI: http://hdl.handle.net/2445/203127
Appears in Collections: Treballs Finals de Grau (TFG) - Matemàtiques

Files in This Item:
File: tfg_oriol_martinez_perez.pdf
Description: Report
Size: 1.93 MB
Format: Adobe PDF


This item is licensed under a Creative Commons License.