Creació d’un corpus d’entailment en espanyol

Grau Francitorra, Patricia

Please use this identifier to cite or link to this item: http://hdl.handle.net/2445/171914

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Taulé Delor, Mariona	-
dc.contributor.author	Grau Francitorra, Patricia	-
dc.date.accessioned	2020-11-10T13:31:33Z	-
dc.date.available	2020-11-10T13:31:33Z	-
dc.date.issued	2020-06-12	-
dc.identifier.uri	http://hdl.handle.net/2445/171914	-
dc.description	Bachelor's theses in Linguistics (Treballs Finals de Grau de Lingüística). Facultat de Filologia, Universitat de Barcelona. Academic year: 2019-2020. Advisor: Mariona Taulé Delor	ca
dc.description.abstract	[cat] L’estudi de la inferència en el llenguatge natural i la seva detecció pot suposar un avenç important en les tecnologies del llenguatge. Per aquest motiu, s’han creat corpus per a tasca de Natural Language Inference, que estudien l’entailment i la contradicció, si bé han estat centrats en l’anglès. Aquest treball presenta la metodologia duta a terme per a la creació i anotació manual de frases d’un corpus d’entailment en espanyol. Especialment, s’ha descrit el procés de creació de frases inferides de textos a través d’uns criteris que n’asseguren la seva riquesa, que hi hagi diferents nivells de complexitat i eviten que hi hagi informació esbiaixada. S’han creat 940 hipòtesis inferides a partir de 470 frases inicials, que van ser extretes de 6 articles de la Viquipèdia. El corpus d’entailment en espanyol forma part d’un corpus més gran de Natural Language Inference que s’està duent a terme en el marc d’un projecte que desenvolupa el grup de recerca CLiC (Centre de Llenguatge i Computació) de la Universitat de Barcelona.	ca
dc.description.abstract	[eng] The study of Natural Language Inference and its detection may represent an important advance in language technology. For this reason, many corpora covering entailment and contradiction have been created, although most of them have been built for English. This work presents the methodology for the creation and annotation of a human-made corpus of entailed sentences in Spanish. In particular, it describes the process of creating entailed sentences from texts according to criteria that ensure their richness, a range of complexity levels and the absence of bias. A total of 940 hypotheses were entailed from 470 texts, which were taken from 6 Wikipedia articles. The corpus of entailment in Spanish is part of a larger Natural Language Inference corpus that is being developed within a project of the research group CLiC (Centre de Llenguatge i Computació) of the University of Barcelona.	ca
dc.format.extent	33 p.	-
dc.format.mimetype	application/pdf	-
dc.language.iso	cat	ca
dc.rights	cc-by-nc-nd (c) Grau Francitorra, 2020	-
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/es/	*
dc.source	Treballs Finals de Grau (TFG) - Lingüística	-
dc.subject.classification	Corpus (Lingüística)	cat
dc.subject.classification	Lingüística computacional	cat
dc.subject.classification	Inferència	cat
dc.subject.classification	Treballs de fi de grau	cat
dc.subject.other	Corpora (Linguistics)	eng
dc.subject.other	Computational linguistics	eng
dc.subject.other	Inference	eng
dc.subject.other	Bachelor's theses	eng
dc.title	Creació d’un corpus d’entailment en espanyol	ca
dc.type	info:eu-repo/semantics/bachelorThesis	ca
dc.rights.accessRights	info:eu-repo/semantics/openAccess	ca
Appears in Collections:	Treballs Finals de Grau (TFG) - Lingüística

Files in This Item:

File	Description	Size	Format
Creació dun corpus dentailment en espanyol - Patricia Grau Francitorra (TFG).pdf		628.66 kB	Adobe PDF

This item is licensed under a Creative Commons License