Please use this identifier to cite or link to this item: http://hdl.handle.net/2445/130105
Title: A proposal for Wide-Coverage Spanish Named Entity Recognition
Author: Arévalo, Montse
Carreras, Xavier
Martí Antonin, M. Antònia
Márquez, Lluís
Padró, Lluís
Simón, María José
Keywords: Natural language processing (Computer science)
Issue Date: 2002
Publisher: Sociedad Española para el Procesamiento del Lenguaje Natural (SEPLN)
Abstract: This paper presents a proposal for wide-coverage Named Entity Recognition for Spanish. First, a linguistic description of the typology of Named Entities is proposed. Following this definition, an architecture of sequential processes is described for the recognition and classification of strong and weak Named Entities. Strong Named Entities are handled with Machine Learning techniques (AdaBoost) and simple attributes that require only untagged corpora, complemented with external information sources (a list of trigger words and a gazetteer). Weak Named Entities are approached through a context-free grammar that recognizes syntactic patterns. An in-depth evaluation of the first task on real corpora, validating the appropriateness of the approach, is presented. A preliminary version of the context-free grammar is qualitatively evaluated on a small hand-tagged corpus, also with good results.
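The abstract does not list the exact attributes used for strong Named Entities. As a minimal sketch only, assuming a hypothetical trigger-word list (`TRIGGER_WORDS`) and gazetteer (`GAZETTEER`), the Python below shows the kind of simple binary features that can be computed from untagged text plus those two external resources. It is not the authors' feature set; a classifier such as AdaBoost would then be trained on vectors like these (not shown).

```python
from typing import Dict, List, Set

# Hypothetical resources for illustration only; the paper's actual lists are not reproduced here.
TRIGGER_WORDS: Set[str] = {"señor", "presidente", "calle", "universidad"}
GAZETTEER: Set[str] = {"barcelona", "madrid", "cataluña"}


def token_features(tokens: List[str], i: int) -> Dict[str, bool]:
    """Binary features for tokens[i] that need no annotated corpus."""
    word = tokens[i]
    prev = tokens[i - 1].lower() if i > 0 else ""
    return {
        "is_capitalized": word[:1].isupper(),          # first character is uppercase
        "all_caps": word.isupper(),                    # every cased character is uppercase
        "has_digit": any(ch.isdigit() for ch in word),
        "sentence_initial": i == 0,                    # capitalization is less informative here
        "prev_is_trigger": prev in TRIGGER_WORDS,      # e.g. "señor" before a person name
        "in_gazetteer": word.lower() in GAZETTEER,     # known place name
    }


if __name__ == "__main__":
    sentence = ["El", "señor", "Pérez", "vive", "en", "Barcelona"]
    for i, tok in enumerate(sentence):
        active = [name for name, on in token_features(sentence, i).items() if on]
        print(tok, active)
```

Run on the sample sentence, "Pérez" fires `is_capitalized` and `prev_is_trigger`, while "Barcelona" fires `is_capitalized` and `in_gazetteer`.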
Note: Reproduction of the document published at: http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/3305
It is part of: Procesamiento del Lenguaje Natural, 2002, no. 28, pp. 63-80
URI: http://hdl.handle.net/2445/130105
ISSN: 1135-5948
Appears in Collections: Articles published in journals (Filologia Catalana i Lingüística General)

Files in This Item:
508180.pdf (402.17 kB, Adobe PDF)

