A proposal for Wide-Coverage Spanish Named Entity Recognition

dc.contributor.author: Arévalo, Montse
dc.contributor.author: Carreras, Xavier
dc.contributor.author: Martí Antonin, M. Antònia
dc.contributor.author: Màrquez, Lluís
dc.contributor.author: Padró, Lluís
dc.contributor.author: Simón, María José
dc.date.accessioned: 2019-03-12T14:02:32Z
dc.date.available: 2019-03-12T14:02:32Z
dc.date.issued: 2002
dc.date.updated: 2019-03-12T14:02:32Z
dc.description.abstract: This paper presents a proposal for wide-coverage Named Entity Recognition for Spanish. First, a linguistic description of the typology of Named Entities is proposed. Following this definition, an architecture of sequential processes is described for addressing the recognition and classification of strong and weak Named Entities. The former are treated using Machine Learning techniques (AdaBoost) and simple attributes requiring non-tagged corpora, complemented with external information sources (a list of trigger words and a gazetteer). The latter are approached through a context-free grammar for recognizing syntactic patterns. An in-depth evaluation of the first task on real corpora is presented to validate the appropriateness of the approach. A preliminary version of the context-free grammar is qualitatively evaluated on a small hand-tagged corpus, also with good results.
dc.format.extent: 18 p.
dc.format.mimetype: application/pdf
dc.identifier.idgrec: 508180
dc.identifier.issn: 1135-5948
dc.identifier.uri: https://hdl.handle.net/2445/130105
dc.language.iso: eng
dc.publisher: Sociedad Española para el Procesamiento del Lenguaje Natural (SEPLN)
dc.relation.isformatof: Reproduction of the document published at: http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/3305
dc.relation.ispartof: Procesamiento del lenguaje natural, 2002, num. 28, p. 63-80
dc.rights: (c) Arévalo, Montse et al., 2002
dc.rights.accessRights: info:eu-repo/semantics/openAccess
dc.source: Articles published in journals (Filologia Catalana i Lingüística General)
dc.subject.classification: Tractament del llenguatge natural (Informàtica)
dc.subject.other: Natural language processing (Computer science)
dc.title: A proposal for Wide-Coverage Spanish Named Entity Recognition
dc.type: info:eu-repo/semantics/article
dc.type: info:eu-repo/semantics/publishedVersion

Files

Original bundle

Name: 508180.pdf
Size: 402.17 KB
Format: Adobe Portable Document Format