Non-acted multi-view audio-visual dyadic Interactions. Project master thesis: multi-modal local and recurrent non-verbal emotion recognition in dyadic scenarios

dc.contributor.advisorEscalera Guerrero, Sergio
dc.contributor.advisorPalmero Cantariño, Cristina
dc.contributor.authorBarco Terrones, Rubén
dc.date.accessioned2020-05-08T07:33:41Z
dc.date.available2020-05-08T07:33:41Z
dc.date.issued2019-09-02
dc.descriptionFinal project of the Master's degree in Fundamentals of Data Science (Fonaments de Ciència de Dades), Faculty of Mathematics, Universitat de Barcelona. Year: 2019. Advisors: Sergio Escalera Guerrero and Cristina Palmero
dc.description.abstract[en] This master's thesis focuses on the development of a baseline emotion recognition system for dyadic scenarios, using raw and handcrafted audio features together with cropped faces from the videos. The system is analyzed at the frame and utterance levels, with and without temporal information. To this end, an exhaustive study of the state of the art in emotion recognition has been conducted, paying particular attention to Deep Learning techniques. In parallel with this theoretical study, a dataset consisting of videos of dyadic interaction sessions between individuals in different scenarios was recorded, and different attributes were captured and labelled from these videos: body pose, hand pose, emotion, age, gender, etc. Once the emotion recognition architectures had been trained on another dataset, a proof of concept was carried out on this new database in order to draw conclusions; in addition, this database can help future systems achieve better results. A large number of experiments with audio and video were performed to build the emotion recognition system, with the IEMOCAP database used for the training and evaluation experiments. After the audio and video models were trained separately with two different architectures, a fusion of both methods was performed (a minimal illustrative sketch of such a fusion model is given after this record). This work demonstrates and studies the importance of preprocessing the data (i.e. face detection, analysis window length, handcrafted features, etc.) and of choosing the correct parameters for the architectures (i.e. network depth, fusion strategy, etc.), and it includes experiments on the influence of temporal information using recurrent models for spatiotemporal utterance-level emotion recognition. Finally, the conclusions drawn throughout this work are presented, as well as possible lines of future work, including new emotion recognition systems and experiments with the database recorded in this work.
dc.format.extent65 p.
dc.format.mimetypeapplication/pdf
dc.identifier.urihttps://hdl.handle.net/2445/159257
dc.language.isoeng
dc.rightscc-by-sa (c) Rubén Barco Terrones, 2019
dc.rightscode: GPL (c) Rubén Barco Terrones, 2019
dc.rights.accessRightsinfo:eu-repo/semantics/openAccess
dc.rights.urihttp://creativecommons.org/licenses/by-sa/3.0/es/
dc.rights.urihttp://www.gnu.org/licenses/gpl-3.0.ca.html
dc.sourceMàster Oficial - Fonaments de la Ciència de Dades
dc.subject.classificationAprenentatge automàtic
dc.subject.classificationEmocions
dc.subject.classificationTreballs de fi de màster
dc.subject.classificationExpressió facial
dc.subject.otherMachine learning
dc.subject.otherEmotions
dc.subject.otherMaster's theses
dc.subject.otherFacial expression
dc.titleNon-acted multi-view audio-visual dyadic Interactions. Project master thesis: multi-modal local and recurrent non-verbal emotion recognition in dyadic scenarios
dc.typeinfo:eu-repo/semantics/masterThesis
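
What follows is a minimal, hypothetical sketch (PyTorch) of the kind of late-fusion recurrent audio-visual model the abstract describes: per-modality encoders over frame-level audio descriptors and face embeddings, concatenation-based fusion, and a recurrent layer that aggregates frames into an utterance-level emotion prediction. All module names, feature dimensions, and the four-class output are illustrative assumptions, not the thesis's actual implementation (which is distributed in codi_font.zip below).

# Minimal sketch (hypothetical): late fusion of an audio branch and a video
# branch, with a GRU over frame-level features for utterance-level emotion
# recognition. All sizes and names are illustrative.
import torch
import torch.nn as nn

class AudioVisualEmotionNet(nn.Module):
    def __init__(self, audio_dim=40, video_dim=512, hidden=128, n_emotions=4):
        super().__init__()
        # Frame-level encoders, one per modality.
        self.audio_enc = nn.Sequential(nn.Linear(audio_dim, hidden), nn.ReLU())
        self.video_enc = nn.Sequential(nn.Linear(video_dim, hidden), nn.ReLU())
        # Recurrent layer aggregates the fused frame-level features into an
        # utterance-level representation (the temporal information).
        self.gru = nn.GRU(2 * hidden, hidden, batch_first=True)
        self.classifier = nn.Linear(hidden, n_emotions)

    def forward(self, audio_feats, video_feats):
        # audio_feats: (batch, frames, audio_dim), e.g. handcrafted descriptors
        # video_feats: (batch, frames, video_dim), e.g. embeddings of cropped faces
        fused = torch.cat([self.audio_enc(audio_feats),
                           self.video_enc(video_feats)], dim=-1)  # late fusion
        _, h = self.gru(fused)                # final hidden state per utterance
        return self.classifier(h.squeeze(0))  # emotion logits

# Toy forward pass: 2 utterances of 100 frames each.
model = AudioVisualEmotionNet()
logits = model(torch.randn(2, 100, 40), torch.randn(2, 100, 512))
print(logits.shape)  # torch.Size([2, 4])

The toy call at the end only checks tensor shapes; in the thesis's setting, IEMOCAP audio descriptors and embeddings of detected face crops would take the place of the random tensors.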

Files

Original package (2 files):
- 159257.pdf — 21.48 MB, Adobe Portable Document Format. Description: thesis report (Memòria).
- codi_font.zip — 1.94 MB, ZIP file (source code).