Document type

Article

Version

Published version

Publication licence

cc-by (c) Inurrieta, Uxoa et al., 2020
Please always use this identifier to cite or link to this document: https://hdl.handle.net/2445/174917

Learning about phraseology from corpora: A linguistically motivated approach for Multiword Expression identification

Abstract

Multiword Expressions (MWEs) are idiosyncratic combinations of words that pose important challenges to Natural Language Processing. Some kinds of MWEs, such as verbal ones, are particularly hard to identify in corpora because of their high degree of morphosyntactic flexibility. This paper describes a linguistically motivated method to gather detailed information about verb+noun MWEs (VNMWEs) from corpora. Although the main focus of this study is Spanish, the method is easily adaptable to other languages. Monolingual and parallel corpora are used as input, and data about the morphosyntactic variability of VNMWEs is extracted. This information is then tested in an identification task, obtaining an F score of 0.52, which is considerably higher than the scores reported in related work.

Citation

INURRIETA, Uxoa, ADURIZ, Itziar, DIAZ DE ILARRAZA, Arantza, LABAKA, Gorka, SARASOLA, Kepa. Learning about phraseology from corpora: A linguistically motivated approach for Multiword Expression identification. _PLoS One_. 2020. Vol. 15, no. 8, p. e0237767. [accessed: 23 January 2026]. ISSN: 1932-6203. [Available at: https://hdl.handle.net/2445/174917]
