Learning about phraseology from corpora: A linguistically motivated approach for Multiword Expression identification

Inurrieta, Uxoa; Aduriz, Itziar; Diaz de Ilarraza, Arantza; Labaka, Gorka; Sarasola, Kepa

Please use this identifier to cite or link to this item: http://hdl.handle.net/2445/174917

Full metadata record

DC Field	Value	Language
dc.contributor.author	Inurrieta, Uxoa	-
dc.contributor.author	Aduriz, Itziar	-
dc.contributor.author	Diaz de Ilarraza, Arantza	-
dc.contributor.author	Labaka, Gorka	-
dc.contributor.author	Sarasola, Kepa	-
dc.date.accessioned	2021-03-11T11:21:57Z	-
dc.date.available	2021-03-11T11:21:57Z	-
dc.date.issued	2020-08-27	-
dc.identifier.issn	1932-6203	-
dc.identifier.uri	http://hdl.handle.net/2445/174917	-
dc.description.abstract	Multiword Expressions (MWEs) are idiosyncratic combinations of words which pose important challenges to Natural Language Processing. Some kinds of MWEs, such as verbal ones, are particularly hard to identify in corpora, due to their high degree of morphosyntactic flexibility. This paper describes a linguistically motivated method to gather detailed information about verb+noun MWEs (VNMWEs) from corpora. Although the main focus of this study is Spanish, the method is easily adaptable to other languages. Monolingual and parallel corpora are used as input, and data about the morphosyntactic variability of VNMWEs is extracted. This information is then tested in an identification task, obtaining an F-score of 0.52, which is considerably higher than the scores reported in related work.	-
dc.format.extent	18 p.	-
dc.format.mimetype	application/pdf	-
dc.language.iso	eng	-
dc.publisher	Public Library of Science (PLoS)	-
dc.relation.isformatof	Reproduction of the document published at: https://doi.org/10.1371/journal.pone.0237767	-
dc.relation.ispartof	PLoS One, 2020, vol. 15, num. 8, p. e0237767	-
dc.relation.uri	https://doi.org/10.1371/journal.pone.0237767	-
dc.rights	cc-by (c) Inurrieta, Uxoa et al., 2020	-
dc.rights.uri	http://creativecommons.org/licenses/by/3.0/es	-
dc.source	Articles published in journals (Catalan Philology and General Linguistics)	-
dc.subject.classification	Morphology (Grammar)	-
dc.subject.classification	Semantics	-
dc.subject.classification	Machine learning	-
dc.subject.other	Morphology (Grammar)	-
dc.subject.other	Semantics	-
dc.subject.other	Machine learning	-
dc.title	Learning about phraseology from corpora: A linguistically motivated approach for Multiword Expression identification	-
dc.type	info:eu-repo/semantics/article	-
dc.type	info:eu-repo/semantics/publishedVersion	-
dc.identifier.idgrec	703445	-
dc.date.updated	2021-03-11T11:21:57Z	-
dc.rights.accessRights	info:eu-repo/semantics/openAccess	-
dc.identifier.pmid	32853283	-
Appears in Collections:	Articles published in journals (Catalan Philology and General Linguistics)

Files in This Item:

File	Description	Size	Format
703445.pdf		1.07 MB	Adobe PDF


This item is licensed under a Creative Commons License