Integrating lexical and prosodic features for automatic paragraph segmentation

dc.contributor.author: Lai, Catherine
dc.contributor.author: Farrús, Mireia
dc.contributor.author: Moore, Johanna D.
dc.date.accessioned: 2022-02-07T17:01:06Z
dc.date.available: 2022-08-31T05:10:25Z
dc.date.issued: 2020-08
dc.date.updated: 2022-02-07T17:01:06Z
dc.description.abstract: Spoken documents, such as podcasts or lectures, are a growing presence in everyday life. Being able to automatically identify their discourse structure is an important step to understanding what a spoken document is about. Moreover, finer-grained units, such as paragraphs, are highly desirable for presenting and analyzing spoken content. However, little work has been done on discourse-based speech segmentation below the level of broad topics. To examine how discourse transitions are cued in speech, we investigate automatic paragraph segmentation of TED talks using lexical and prosodic features. Experiments using Support Vector Machines, AdaBoost, and Neural Networks show that models using supra-sentential prosodic features and induced cue words perform better than those based on the type of lexical cohesion measures often used in broad topic segmentation. Moreover, combining a wide range of individually weak lexical and prosodic predictors improves performance, and modelling contextual information using recurrent neural networks outperforms other approaches by a large margin. Our best results come from using late fusion methods that integrate representations generated by separate lexical and prosodic models while allowing interactions between these feature streams rather than treating them as independent information sources. Application to ASR outputs shows that adding prosodic features, particularly using late fusion, can significantly ameliorate decreases in performance due to transcription errors.
dc.format.extent: 14 p.
dc.format.mimetype: application/pdf
dc.identifier.idgrec: 705850
dc.identifier.issn: 0167-6393
dc.identifier.uri: https://hdl.handle.net/2445/182997
dc.language.iso: eng
dc.publisher: Elsevier B.V.
dc.relation.isformatof: Postprint version of the document published at: https://doi.org/10.1016/j.specom.2020.04.007
dc.relation.ispartof: Speech Communication, 2020, vol. 121, p. 44-57
dc.relation.projectID: info:eu-repo/grantAgreement/EC/H2020/645012/EU//KRISTINA
dc.relation.uri: https://doi.org/10.1016/j.specom.2020.04.007
dc.rights: cc-by-nc-nd (c) Elsevier B.V., 2020
dc.rights.accessRights: info:eu-repo/semantics/openAccess
dc.rights.uri: https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.source: Articles published in journals (Filologia Catalana i Lingüística General)
dc.subject.classification: Anàlisi prosòdica (Lingüística)
dc.subject.classification: Marcadors del discurs
dc.subject.classification: Dicció
dc.subject.other: Prosodic analysis (Linguistics)
dc.subject.other: Discourse markers
dc.subject.other: Diction
dc.title: Integrating lexical and prosodic features for automatic paragraph segmentation
dc.type: info:eu-repo/semantics/article
dc.type: info:eu-repo/semantics/acceptedVersion

Files

Original bundle

Name: 705850.pdf
Size: 817.88 KB
Format: Adobe Portable Document Format