Corpora compilation for prosody-informed speech processing

Research on speech technologies requires spoken data, which is usually obtained by recording read speech and adapted specifically to the research needs. When the aim is to deal with the prosody involved in speech, the available data must reflect natural, conversational speech, which is usually costly and difficult to obtain. This paper presents a machine learning-oriented toolkit for collecting, handling, and visualizing speech data using prosodic heuristics. We present two corpora resulting from these methodologies: the PANTED corpus, containing 250 h of English speech from TED Talks, and the Heroes corpus, containing 8 h of parallel English and Spanish movie speech. We demonstrate their use in two deep learning-based applications: punctuation restoration and machine translation. The presented corpora are freely available to the research community.

Subjects

Automatic speech recognition, Machine translating, Punctuation, Corpora (Linguistics)

Collections

Articles published in journals (Filologia Catalana i Lingüística General)

Citation

ÖKTEM, Alp, FARRÚS, Mireia and BONAFONTE, Antonio. Corpora compilation for prosody-informed speech processing. Language Resources and Evaluation. 2021. Vol. 55, no. 4, pp. 925-946. ISSN 1574-020X. [Accessed: 18 June 2026]. Available at: https://hdl.handle.net/2445/182546

All rights reserved
