Please use this identifier to cite or link to this item: http://hdl.handle.net/2445/198502
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | Radeva, Petia | -
dc.contributor.advisor | Talavera Martínez, Estefanía | -
dc.contributor.author | Diéguez Vilà, Joel | -
dc.date.accessioned | 2023-05-26T09:44:44Z | -
dc.date.available | 2023-05-26T09:44:44Z | -
dc.date.issued | 2023-01-24 | -
dc.identifier.uri | http://hdl.handle.net/2445/198502 | -
dc.description | Bachelor's degree final project (Treballs Finals de Grau d'Enginyeria Informàtica), Facultat de Matemàtiques, Universitat de Barcelona. Year: 2023. Advisors: Petia Radeva and Estefanía Talavera | ca
dc.description.abstract | [en] The field of machine learning is being applied to all aspects of our daily lives. From chatbots that converse like a human to artificial intelligence able to generate art, there are many neural networks capable of doing a better job than humans. In 2022, the Ego4D dataset was published: a large-scale collection of annotated first-person videos on a scale not seen before, which opened the door to new branches of research in video analysis. The publication of the dataset was accompanied by several challenges; the one we address in this thesis is Moment Queries, which consists of the temporal localization of specific actions in a video. This is a highly complex problem that requires very powerful image analysis techniques. Transformers are a type of neural network based on the concept of attention that revolutionized the field of deep learning. Since their appearance in 2017, they have proven their usefulness in artificial intelligence applications such as natural language processing and computer vision, outperforming earlier neural network architectures on many tasks. In this work, we have carried out an in-depth study of Transformers and tested their performance on image classification. With the knowledge obtained, we have analyzed how they can be applied to the Moment Queries problem. We have developed our proposal: a model that uses a pyramid of attention mechanisms to refine the data and provide the prediction modules with the best possible information. With this implementation, we have managed to localize and classify 28% of the actions with a tIoU (temporal Intersection over Union) above 0.5. | ca
dc.format.extent | 54 p. | -
dc.format.mimetype | application/pdf | -
dc.language.iso | eng | ca
dc.rights | report: cc-nc-nd (c) Joel Diéguez Vilà, 2023 | -
dc.rights | code: GPL (c) Joel Diéguez Vilà, 2023 | -
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/3.0/es/ | -
dc.rights.uri | http://www.gnu.org/licenses/gpl-3.0.ca.html | *
dc.source | Treballs Finals de Grau (TFG) - Enginyeria Informàtica | -
dc.subject.classification | Aprenentatge automàtic | ca
dc.subject.classification | Xarxes neuronals (Informàtica) | ca
dc.subject.classification | Programari | ca
dc.subject.classification | Treballs de fi de grau | ca
dc.subject.classification | Visió per ordinador | ca
dc.subject.classification | Processament digital d'imatges | ca
dc.subject.other | Machine learning | en
dc.subject.other | Neural networks (Computer science) | en
dc.subject.other | Computer software | en
dc.subject.other | Computer vision | en
dc.subject.other | Digital image processing | en
dc.subject.other | Bachelor's theses | en
dc.title | Exploring transformers for localizing moments of actions | ca
dc.type | info:eu-repo/semantics/bachelorThesis | ca
dc.rights.accessRights | info:eu-repo/semantics/openAccess | ca
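
The abstract above reports localizing and classifying 28% of actions at a tIoU above 0.5. As a point of reference, the following is a minimal sketch in Python of how temporal Intersection over Union (tIoU) between a predicted and a ground-truth action segment is commonly computed; the function name temporal_iou and the (start, end) segment representation are illustrative assumptions, not the thesis's own code.

def temporal_iou(pred, gt):
    # pred and gt are (start, end) pairs in seconds (illustrative assumption).
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

# A detection counts at the 0.5 threshold mentioned in the abstract when
# temporal_iou(prediction, ground_truth) exceeds 0.5, e.g.:
# temporal_iou((2.0, 8.0), (4.0, 10.0)) -> 0.5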
Appears in Collections:Programari - Treballs de l'alumnat
Treballs Finals de Grau (TFG) - Matemàtiques
Treballs Finals de Grau (TFG) - Enginyeria Informàtica

Files in This Item:
File | Description | Size | Format
tfg_dieguez_vila_joel.pdf | Report | 7.17 MB | Adobe PDF
code.zip | Source code | 943.56 kB | zip


This item is licensed under a Creative Commons License.