NewsCom-TOX: A corpus of comments on news articles annotated for toxicity in Spanish

dc.contributor.author: Taulé Delor, Mariona
dc.contributor.author: Nofre, Montserrat
dc.contributor.author: Bargiela, Víctor
dc.contributor.author: Bonet, Xavier
dc.date.accessioned: 2025-04-02T16:11:33Z
dc.date.available: 2025-04-02T16:11:33Z
dc.date.issued: 2024-01-17
dc.date.updated: 2025-04-02T16:11:33Z
dc.description.abstract: In this article, we present the NewsCom-TOX corpus, a new corpus manually annotated for toxicity in Spanish. NewsCom-TOX consists of 4359 comments in Spanish posted on social media in response to 21 news articles related to immigration, compiled in order to analyse and identify messages with racial and xenophobic content. The corpus is annotated at multiple levels with binary linguistic categories (stance, target, stereotype, sarcasm, mockery, insult, improper language, aggressiveness and intolerance), taking into account not only the information conveyed in each comment, but also the whole discourse thread in which the comment occurs, as well as the information conveyed in the news article, including its images. These categories allow us to identify the presence of toxicity and its intensity, that is, the level of toxicity of each comment. All this information is available for research purposes upon request. Here we describe the NewsCom-TOX corpus, the annotation tagset used, the criteria applied and the annotation process carried out, including the inter-annotator agreement tests conducted. A quantitative analysis of the results obtained is also provided. NewsCom-TOX is a linguistic resource that will be valuable for both linguistic and computational research on Spanish, particularly for NLP tasks aimed at detecting toxic information.
dc.format.extent: 41 p.
dc.format.mimetype: application/pdf
dc.identifier.idgrec: 741843
dc.identifier.issn: 1574-020X
dc.identifier.uri: https://hdl.handle.net/2445/220216
dc.language.iso: eng
dc.publisher: Springer Verlag
dc.relation.isformatof: Postprint version of the document published at: https://doi.org/10.1007/s10579-023-09711-x
dc.relation.ispartof: Language Resources and Evaluation, 2023, num. 58, p. 1115-1155
dc.relation.uri: https://doi.org/10.1007/s10579-023-09711-x
dc.rights: (c) Springer Verlag, 2023
dc.rights.accessRights: info:eu-repo/semantics/openAccess
dc.source: Articles published in journals (Catalan Philology and General Linguistics)
dc.subject.classification: Television news
dc.subject.classification: Fake news
dc.subject.classification: Spanish language
dc.subject.classification: Corpora (Linguistics)
dc.subject.other: Television broadcasting of news
dc.subject.other: Fake news
dc.subject.other: Spanish language
dc.subject.other: Corpora (Linguistics)
dc.title: NewsCom-TOX: A corpus of comments on news articles annotated for toxicity in Spanish
dc.type: info:eu-repo/semantics/article
dc.type: info:eu-repo/semantics/acceptedVersion

Files

Original bundle

Name: 840381.pdf
Size: 993.46 KB
Format: Adobe Portable Document Format