Corpora compilation for prosody-informed speech processing

Öktem, Alp; Farrús, Mireia; Bonafonte, Antonio

Please use this identifier to cite or link to this item: https://hdl.handle.net/2445/182546

Full metadata record

DC Field	Value	Language
dc.contributor.author	Öktem, Alp	-
dc.contributor.author	Farrús, Mireia	-
dc.contributor.author	Bonafonte, Antonio	-
dc.date.accessioned	2022-01-21T16:04:48Z	-
dc.date.available	2022-01-21T16:04:48Z	-
dc.date.issued	2021-12	-
dc.identifier.issn	1574-020X	-
dc.identifier.uri	https://hdl.handle.net/2445/182546	-
dc.description.abstract	Research on speech technologies requires spoken data, which is usually obtained by recording read speech and is specifically adapted to the research needs. When the aim is to deal with the prosody involved in speech, the available data must reflect natural and conversational speech, which is usually costly and difficult to obtain. This paper presents a machine learning-oriented toolkit for collecting, handling, and visualizing speech data using prosodic heuristics. We present two corpora resulting from these methodologies: the PANTED corpus, containing 250 h of English speech from TED Talks, and the Heroes corpus, containing 8 h of parallel English and Spanish movie speech. We demonstrate their use in two deep learning-based applications: punctuation restoration and machine translation. The presented corpora are freely available to the research community.	-
dc.format.extent	22 p.	-
dc.format.mimetype	application/pdf	-
dc.language.iso	eng	-
dc.publisher	Springer Verlag	-
dc.relation.isformatof	Postprint version of the document published at: https://doi.org/10.1007/s10579-021-09556-2	-
dc.relation.ispartof	Language Resources And Evaluation, 2021, vol. 55, num. 4, p. 925-946	-
dc.relation.uri	https://doi.org/10.1007/s10579-021-09556-2	-
dc.rights	(c) Springer Verlag, 2021	-
dc.source	Articles publicats en revistes (Filologia Catalana i Lingüística General)	-
dc.subject.classification	Reconeixement automàtic de la parla	-
dc.subject.classification	Traducció automàtica	-
dc.subject.classification	Puntuació	-
dc.subject.classification	Corpus (Lingüística)	-
dc.subject.other	Automatic speech recognition	-
dc.subject.other	Machine translating	-
dc.subject.other	Punctuation	-
dc.subject.other	Corpora (Linguistics)	-
dc.title	Corpora compilation for prosody-informed speech processing	-
dc.type	info:eu-repo/semantics/article	-
dc.type	info:eu-repo/semantics/acceptedVersion	-
dc.identifier.idgrec	713986	-
dc.date.updated	2022-01-21T16:04:48Z	-
dc.rights.accessRights	info:eu-repo/semantics/openAccess	-
Appears in Collections:	Articles publicats en revistes (Filologia Catalana i Lingüística General)

Files in This Item:

File	Description	Size	Format
Öktem2021_Article_CorporaCompilationForProsody-i.pdf		2.56 MB	Adobe PDF
