Please use this identifier to cite or link to this item: http://hdl.handle.net/2445/198502
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | Radeva, Petia | -
dc.contributor.advisor | Talavera Martínez, Estefanía | -
dc.contributor.author | Diéguez Vilà, Joel | -
dc.date.accessioned | 2023-05-26T09:44:44Z | -
dc.date.available | 2023-05-26T09:44:44Z | -
dc.date.issued | 2023-01-24 | -
dc.identifier.uri | http://hdl.handle.net/2445/198502 | -
dc.description | Bachelor's degree final project (Treballs Finals de Grau d'Enginyeria Informàtica), Facultat de Matemàtiques, Universitat de Barcelona. Year: 2023. Advisors: Petia Radeva and Estefanía Talavera | ca
dc.description.abstract | [en] The field of machine learning is being applied to all aspects of our daily lives. From chatbots that converse like a human to artificial intelligence able to generate art, there are many neural networks capable of doing a better job than humans. In 2022, the Ego4D dataset was published: a large-scale collection of annotated first-person videos on a scale not seen before, which opened the door to new branches of research in video analysis. The publication of the dataset was accompanied by several challenges; the one we address in this thesis is Moment Queries, which consists of the temporal localization of specific actions in a video. This is a highly complex problem that requires very powerful image analysis techniques. Transformers are a type of neural network based on the concept of attention that revolutionized the field of deep learning. Since their appearance in 2017, they have proven their usefulness in artificial intelligence applications such as natural language processing and computer vision, outperforming earlier neural network architectures on many tasks. In this work, we have carried out an in-depth study of Transformers and tested their performance on image classification. With the knowledge obtained, we have analyzed how they can be applied to the Moment Queries problem. We have developed our proposal: a model that uses a pyramid of attention mechanisms to refine the data and provide the prediction modules with the best possible information. With this implementation, we have managed to localize and classify 28% of the actions with a tIoU (temporal Intersection over Union) above 0.5. | ca
dc.format.extent | 54 p. | -
dc.format.mimetype | application/pdf | -
dc.language.iso | eng | ca
dc.rights | report: cc-nc-nd (c) Joel Diéguez Vilà, 2023 | -
dc.rights | code: GPL (c) Joel Diéguez Vilà, 2023 | -
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/3.0/es/ | -
dc.rights.uri | http://www.gnu.org/licenses/gpl-3.0.ca.html | *
dc.source | Treballs Finals de Grau (TFG) - Enginyeria Informàtica | -
dc.subject.classification | Aprenentatge automàtic | ca
dc.subject.classification | Xarxes neuronals (Informàtica) | ca
dc.subject.classification | Programari | ca
dc.subject.classification | Treballs de fi de grau | ca
dc.subject.classification | Visió per ordinador | ca
dc.subject.classification | Processament digital d'imatges | ca
dc.subject.other | Machine learning | en
dc.subject.other | Neural networks (Computer science) | en
dc.subject.other | Computer software | en
dc.subject.other | Computer vision | en
dc.subject.other | Digital image processing | en
dc.subject.other | Bachelor's theses | en
dc.title | Exploring transformers for localizing moments of actions | ca
dc.type | info:eu-repo/semantics/bachelorThesis | ca
dc.rights.accessRights | info:eu-repo/semantics/openAccess | ca
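
The abstract above reports localizing and classifying 28% of actions at a tIoU above 0.5. As a point of reference, the following is a minimal sketch in Python of how temporal Intersection over Union (tIoU) between a predicted and a ground-truth action segment is commonly computed; the function name temporal_iou and the (start, end) segment representation are illustrative assumptions, not the thesis's own code.

def temporal_iou(pred, gt):
    # pred and gt are (start, end) pairs in seconds (illustrative assumption).
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

# A detection counts at the 0.5 threshold mentioned in the abstract when
# temporal_iou(prediction, ground_truth) exceeds 0.5, e.g.:
# temporal_iou((2.0, 8.0), (4.0, 10.0)) -> 0.5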
Appears in Collections:Programari - Treballs de l'alumnat
Treballs Finals de Grau (TFG) - Matemàtiques
Treballs Finals de Grau (TFG) - Enginyeria Informàtica

Files in This Item:
File | Description | Size | Format
tfg_dieguez_vila_joel.pdf | Report | 7.17 MB | Adobe PDF
code.zip | Source code | 943.56 kB | zip


This item is licensed under a Creative Commons License.