

Document type

Thesis

Version

Published version

Publication date

Please always use this identifier to cite or link this document: https://hdl.handle.net/2445/217179

Gaze Estimation with Spatiotemporal and Multimodal Deep Learning


Abstract

[eng] It is often said that the eyes are the window to the soul. The eyes and their behavior have sparked interest for centuries, and have been widely studied due to their link with multiple developmental, neurological, behavioral, cognitive, and clinical factors. Furthermore, the ability to accurately detect the line of sight has enabled many possibilities for consumer applications, such as human-computer interaction and gaze-contingent displays. Eye-tracking technology has evolved to the point where non-invasive, sufficiently accurate, and cost-effective camera-based approaches are becoming increasingly available, driven by the progressive miniaturization of electronics and breakthroughs in computer vision and deep learning. However, achieving universal applicability in eye tracking remains a challenge, primarily due to the influence of individual factors, varying environmental conditions, and the impact of sensor viewpoint or head pose shifts. Recent remote and portable eye-tracking devices often sacrifice robustness and accuracy when used in uncontrolled scenarios. In addition, they grapple with the need for rapid eye signal capture, a crucial requirement for specific applications. The promising potential of eye tracking motivates us to further enhance existing methods, striving for greater reliability, accuracy, and speed. In turn, as eye tracking becomes more ubiquitous, it encourages us to explore innovative applications that leverage its expanding capabilities. This thesis approaches eye tracking from a computer vision and deep learning perspective, with the goal of: 1) increasing the accuracy and sampling rate of current gaze estimation approaches across different scenarios and devices; and 2) promoting the use of gaze input in emerging applications.
For the first goal, we investigate the contribution of spatiotemporal and multimodal/multisensor cues for gaze estimation, both for remote cameras (e.g., desktop setting) and infrared, near-eye devices (e.g., head-mounted displays), across different sources of variability. To do so, we rely on the combination of convolutional-recurrent deep neural networks and feature-based and hybrid multimodal fusion. In particular, we address multimodality from two different angles. First, by combining appearance and shape cues (i.e., 3D facial landmarks) extracted from RGB face images to increase accuracy. And second, by combining the signal obtained by two different sensors (camera and photosensors) operating at the same or different sampling rates, to increase the accuracy and the effective sampling rate of the estimated gaze signal. We then move on to the second goal, for which we explore the use of gaze-related features along with other modalities, such as speech and facial expressions, for emotion expression recognition in a conversational human-machine interaction scenario. More concretely, we focus on the interaction between a simulated virtual coach and older adults, delving into the nuances of affective computing in this context.
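The feature-level fusion described above can be illustrated with a minimal numpy sketch: per-frame appearance embeddings and 3D landmark features are concatenated, then aggregated over time by a recurrence. All dimensions, weights, and the plain tanh RNN cell are illustrative assumptions standing in for the thesis's convolutional-recurrent networks, not the actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (assumed for illustration, not from the thesis):
# T frames, appearance embedding of size Da, landmark feature of size Ds,
# hidden state of size H.
T, Da, Ds, H = 8, 16, 9, 12

appearance = rng.normal(size=(T, Da))  # stand-in for CNN features of face crops
landmarks = rng.normal(size=(T, Ds))   # stand-in for flattened 3D facial landmarks

# Feature-level fusion: concatenate the two modalities per frame.
fused = np.concatenate([appearance, landmarks], axis=1)  # shape (T, Da + Ds)

# A plain tanh RNN cell stands in for the recurrent part of the network.
W_in = rng.normal(scale=0.1, size=(Da + Ds, H))
W_rec = rng.normal(scale=0.1, size=(H, H))
W_out = rng.normal(scale=0.1, size=(H, 2))  # 2D gaze output (e.g., yaw, pitch)

h = np.zeros(H)
for x in fused:                # aggregate the fused sequence over time
    h = np.tanh(x @ W_in + h @ W_rec)

gaze = h @ W_out               # gaze estimate for the final frame
print(gaze.shape)              # (2,)
```

The design choice sketched here is the "feature-based" fusion mentioned in the abstract: modalities are merged before the temporal model, so the recurrence sees a single joint representation per frame rather than separate per-modality streams.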

Description

Citation


PALMERO CANTARIÑO, Cristina. Gaze Estimation with Spatiotemporal and Multimodal Deep Learning. [accessed: 30 November 2025]. [Available at: https://hdl.handle.net/2445/217179]
