
Document type

Article

Version

Published version

Publication date

2022

Publication license

cc-by (c) Cámbara, Guillermo et al., 2022
Please always use this identifier to cite or link to this document: https://hdl.handle.net/2445/183953

TASE: Task-Aware Speech Enhancement for Wake-Up Word Detection in Voice Assistants

Journal title

Applied Sciences

Journal ISSN

2076-3417

Abstract

Wake-up word spotting in noisy environments is a critical task for an excellent user experience with voice assistants. Unwanted activation of the device is often caused by noise from background conversations, TVs, or other domestic appliances. In this work, we propose the use of a speech enhancement convolutional autoencoder, coupled with on-device keyword spotting, to improve trigger word detection in noisy environments. The end-to-end system is trained by optimizing a linear combination of losses: a reconstruction loss computed at both the log-mel spectrogram and the waveform level, and a task-specific loss given by the cross-entropy error of the keyword spotting classifier. We experiment with several neural network classifiers and report that deeply coupling the speech enhancement with a wake-up word detector, e.g., by training them jointly, significantly improves performance in the noisiest conditions. Additionally, we introduce a new publicly available speech database recorded for Telefónica's voice assistant, Aura. The OK Aura Wake-up Word Dataset incorporates rich metadata, such as speaker demographics or room conditions, and includes hard negative examples that were carefully selected to present different levels of phonetic similarity to the trigger words 'OK Aura'.

Keywords: speech enhancement; wake-up word; keyword spotting; deep learning; convolutional neural network
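The abstract describes the training objective as a linear combination of a log-mel reconstruction loss, a waveform reconstruction loss and a keyword-spotting cross-entropy loss. The sketch below illustrates that combination for a single example in plain Python. It is not the authors' code: it assumes L1 (mean absolute error) as the reconstruction distance and unit weights by default, neither of which is stated in this record, and the function names (including the hypothetical `tase_style_loss`) are illustrative only.

```python
import math
from typing import Sequence


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute error between two equal-length sequences.

    Assumption: L1 is used here only as an example reconstruction distance.
    """
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Softmax cross-entropy for one example, using the log-sum-exp trick."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def tase_style_loss(
    clean_mel: Sequence[float],
    enhanced_mel: Sequence[float],
    clean_wav: Sequence[float],
    enhanced_wav: Sequence[float],
    kws_logits: Sequence[float],
    label: int,
    w_mel: float = 1.0,   # hypothetical weight for the log-mel term
    w_wav: float = 1.0,   # hypothetical weight for the waveform term
    w_task: float = 1.0,  # hypothetical weight for the keyword-spotting term
) -> float:
    """Hypothetical linear combination of the three losses named in the abstract."""
    mel_loss = l1_distance(clean_mel, enhanced_mel)   # log-mel reconstruction
    wav_loss = l1_distance(clean_wav, enhanced_wav)   # waveform reconstruction
    task_loss = cross_entropy(kws_logits, label)      # wake-up word classification
    return w_mel * mel_loss + w_wav * wav_loss + w_task * task_loss


if __name__ == "__main__":
    # Toy example: flattened log-mel frames, a short waveform, and 2-class logits.
    loss = tase_style_loss([0.0, 0.0], [1.0, 3.0], [1.0], [1.0], [0.0, 0.0], 1)
    print(f"combined loss: {loss:.4f}")
```

Setting `w_task` to zero reduces the objective to a pure enhancement (reconstruction) loss; a non-zero task weight is what makes the enhancement "task-aware" in the sense the abstract describes, since gradients from the wake-up word classifier then also shape the enhancer.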

Citation

CÁMBARA, Guillermo, LÓPEZ, Fernando, BONET, David, GÓMEZ, Pablo, SEGURA, Carlos, FARRÚS, Mireia, LUQUE, Jordi. TASE: Task-Aware Speech Enhancement for Wake-Up Word Detection in Voice Assistants. _Applied Sciences_. 2022. Vol. 12, no. 4, p. 1974. [accessed: 7 January 2026]. ISSN: 2076-3417. [Available at: https://hdl.handle.net/2445/183953]
