Learning about phraseology from corpora: A linguistically motivated approach for Multiword Expression identification

dc.contributor.author: Inurrieta, Uxoa
dc.contributor.author: Aduriz, Itziar
dc.contributor.author: Diaz de Ilarraza, Arantza
dc.contributor.author: Labaka, Gorka
dc.contributor.author: Sarasola, Kepa
dc.date.accessioned: 2021-03-11T11:21:57Z
dc.date.available: 2021-03-11T11:21:57Z
dc.date.issued: 2020-08-27
dc.date.updated: 2021-03-11T11:21:57Z
dc.description.abstract: Multiword Expressions (MWEs) are idiosyncratic combinations of words which pose important challenges to Natural Language Processing. Some kinds of MWEs, such as verbal ones, are particularly hard to identify in corpora, due to their high degree of morphosyntactic flexibility. This paper describes a linguistically motivated method to gather detailed information about verb+noun MWEs (VNMWEs) from corpora. Although the main focus of this study is Spanish, the method is easily adaptable to other languages. Monolingual and parallel corpora are used as input, and data about the morphosyntactic variability of VNMWEs is extracted. This information is then tested in an identification task, obtaining an F score of 0.52, which is considerably higher than related work.
dc.format.extent: 18 p.
dc.format.mimetype: application/pdf
dc.identifier.idgrec: 703445
dc.identifier.issn: 1932-6203
dc.identifier.pmid: 32853283
dc.identifier.uri: https://hdl.handle.net/2445/174917
dc.language.iso: eng
dc.publisher: Public Library of Science (PLoS)
dc.relation.isformatof: Reproduction of the document published at: https://doi.org/10.1371/journal.pone.0237767
dc.relation.ispartof: PLoS One, 2020, vol. 15, num. 8, p. e0237767
dc.relation.uri: https://doi.org/10.1371/journal.pone.0237767
dc.rights: cc-by (c) Inurrieta, Uxoa et al., 2020
dc.rights.accessRights: info:eu-repo/semantics/openAccess
dc.rights.uri: http://creativecommons.org/licenses/by/3.0/es
dc.source: Journal articles (Filologia Catalana i Lingüística General)
dc.subject.classification: Morfologia (Gramàtica)
dc.subject.classification: Semàntica
dc.subject.classification: Aprenentatge automàtic
dc.subject.other: Morphology (Grammar)
dc.subject.other: Semantics
dc.subject.other: Machine learning
dc.title: Learning about phraseology from corpora: A linguistically motivated approach for Multiword Expression identification
dc.type: info:eu-repo/semantics/article
dc.type: info:eu-repo/semantics/publishedVersion

Files

Original bundle

Name: 703445.pdf
Size: 1.04 MB
Format: Adobe Portable Document Format