Please use this identifier to cite or link to this item: http://hdl.handle.net/2445/159257
Title: Non-acted multi-view audio-visual dyadic Interactions. Project master thesis: multi-modal local and recurrent non-verbal emotion recognition in dyadic scenarios
Author: Barco Terrones, Rubén
Director/Tutor: Escalera Guerrero, Sergio
Palmero, Cristina
Keywords: Machine learning
Emotions
Master's theses
Facial expression
Issue Date: 2-Sep-2019
Abstract: This master's thesis focuses on developing a baseline emotion recognition system for dyadic scenarios. The system uses raw and handcrafted audio features together with faces cropped from the videos, and it is analyzed at frame level and at utterance level, with and without temporal information. To this end, an exhaustive study of the state of the art in emotion recognition has been conducted, with particular attention to Deep Learning techniques. In parallel with this theoretical study, a dataset of videos of dyadic interaction sessions between individuals in different scenarios has been recorded. Several attributes were captured and labelled from these videos: body pose, hand pose, emotion, age, gender, etc. Once the emotion recognition architectures have been trained on another dataset, a proof of concept is carried out on this new database in order to draw conclusions. The database can also help future systems achieve better results.

A large number of audio and video experiments are performed to build the emotion recognition system. The IEMOCAP database is used for the training and evaluation experiments. The audio and video modalities are first trained separately with two different architectures, and the two methods are then fused. This work demonstrates and studies the importance of data preprocessing (e.g. face detection, analysis window length, handcrafted features) and of choosing the correct architecture parameters (e.g. network depth, fusion). In addition, experiments with recurrent models are performed to study the influence of temporal information on spatiotemporal, utterance-level emotion recognition.
Finally, the conclusions drawn throughout this work are presented, together with possible lines of future work, including new emotion recognition systems and experiments with the database recorded in this work.
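The abstract does not specify how frame-level outputs are aggregated to utterance level or how the audio and video branches are fused. As a hedged illustration only, the sketch below assumes simple late fusion. Each modality's per-frame class probabilities are averaged into one utterance-level vector, and the two vectors are then combined by a weighted average. The function names, the 0.5 weight, the toy numbers, and the four-class label set (a common IEMOCAP configuration) are assumptions and are not taken from the thesis.

```python
# Hypothetical sketch: late fusion of audio and video emotion predictions.
# Assumes each model outputs per-frame class probabilities; not the thesis' actual code.

EMOTIONS = ["angry", "happy", "neutral", "sad"]  # assumed 4-class label set


def utterance_probs(frame_probs):
    """Average per-frame probability vectors into one utterance-level vector."""
    n_frames = len(frame_probs)
    return [sum(col) / n_frames for col in zip(*frame_probs)]


def late_fusion(audio_probs, video_probs, audio_weight=0.5):
    """Weighted average of the two modalities, renormalized to sum to 1."""
    fused = [audio_weight * a + (1.0 - audio_weight) * v
             for a, v in zip(audio_probs, video_probs)]
    total = sum(fused)
    return [f / total for f in fused]


# Toy frame-level outputs (made-up numbers, for illustration only).
# The modalities may have different frame counts; each is averaged separately.
audio_frames = [[0.1, 0.6, 0.2, 0.1],
                [0.2, 0.5, 0.2, 0.1]]
video_frames = [[0.1, 0.3, 0.5, 0.1],
                [0.1, 0.4, 0.4, 0.1],
                [0.0, 0.5, 0.4, 0.1]]

fused = late_fusion(utterance_probs(audio_frames), utterance_probs(video_frames))
prediction = EMOTIONS[max(range(len(fused)), key=fused.__getitem__)]
print(prediction)  # happy
```

Late fusion keeps the two branches independent, which is consistent with training the audio and video models separately. Feature-level (early) fusion would be an alternative design.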
Note: Final projects of the Màster de Fonaments de Ciència de Dades (Master in Fundamentals of Data Science), Faculty of Mathematics, Universitat de Barcelona. Year: 2019. Tutors: Sergio Escalera Guerrero and Cristina Palmero
URI: http://hdl.handle.net/2445/159257
Appears in Collections: Programari - Treballs de l'alumnat (Software - Student works)
Màster Oficial - Fonaments de la Ciència de Dades (Official Master's - Fundamentals of Data Science)

Files in This Item:
File | Description | Size | Format
159257.pdf | Memòria (thesis report) | 21.99 MB | Adobe PDF
codi_font.zip | | 1.99 MB | zip


This item is licensed under a Creative Commons License.