Large language models and causal analysis: zero-shot counterfactuals in hate speech perception

dc.contributor.advisorPros Rius, Roger
dc.contributor.advisorVitrià i Marca, Jordi
dc.contributor.authorHernández Jiménez, Sergio
dc.date.accessioned2024-09-19T09:11:48Z
dc.date.available2024-09-19T09:11:48Z
dc.date.issued2024-06-30
dc.descriptionFinal projects of the Màster de Fonaments de Ciència de Dades, Facultat de Matemàtiques, Universitat de Barcelona. Academic year: 2023-2024. Advisors: Roger Pros Rius and Jordi Vitrià i Marca
dc.description.abstract[en] Detecting hate speech is crucial for maintaining the integrity of social media platforms, as it involves identifying content that denigrates individuals or groups based on their characteristics. However, the expression of hate can differ across demographics and platforms, making its detection a complex task. A significant factor in hate speech is the presence of offense, which alters the perception of hate without altering the core meaning of the text. This study examines how offense affects the perception of hate speech in social media comments. To achieve this, we employ two distinct causal inference methods to measure the impact of offensive language on the detection of hate speech. The first method uses the traditional backdoor criterion, modeling the nodes of the causal graph as features in a machine learning model that predicts hate. This method is demanding from a modeling point of view, as it requires training a specific model for each node in the causal graph. The second method leverages the capabilities of Large Language Models (LLMs) to generate textual counterfactuals in a zero-shot manner, i.e., without any training or fine-tuning. These textual counterfactuals are then used to estimate causal effects. Our findings reveal that the estimated causal effect of offense on hate is higher with the LLM-generated counterfactuals than with the backdoor-criterion methodology. Additionally, we train a machine learning model to predict the causal effect directly from a comment.
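
As a rough illustration of the abstract's second method, the sketch below estimates the effect of offense as the average difference in predicted hate between each comment and its LLM-generated counterfactual. This is a minimal Python sketch, not the thesis code: rewrite_without_offense stands in for a zero-shot LLM rewrite and hate_score for a trained hate-speech classifier, and both names, like the toy stand-ins at the bottom, are hypothetical.

from typing import Callable, List

def estimate_effect(
    comments: List[str],
    rewrite_without_offense: Callable[[str], str],  # zero-shot LLM counterfactual
    hate_score: Callable[[str], float],             # hate classifier score in [0, 1]
) -> float:
    """Mean difference in predicted hate between each comment and its counterfactual."""
    effects = []
    for text in comments:
        counterfactual = rewrite_without_offense(text)  # same content, offense removed
        effects.append(hate_score(text) - hate_score(counterfactual))
    return sum(effects) / len(effects)

# Toy stand-ins so the sketch runs end to end; a real pipeline would call an
# LLM API for the rewrite and a trained classifier for the score.
OFFENSIVE_WORDS = {"idiot", "stupid"}

def toy_rewrite(text: str) -> str:
    return " ".join(w for w in text.split() if w.lower() not in OFFENSIVE_WORDS)

def toy_score(text: str) -> float:
    return 0.9 if any(w.lower() in OFFENSIVE_WORDS for w in text.split()) else 0.2

print(estimate_effect(["you are a stupid idiot", "have a nice day"],
                      toy_rewrite, toy_score))  # prints 0.35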
dc.format.extent37 p.
dc.format.mimetypeapplication/pdf
dc.identifier.urihttps://hdl.handle.net/2445/215277
dc.language.isoeng
dc.rightscc-by-nc-nd (c) Sergio Hernández Jiménez, 2024
dc.rightscode: GPL (c) Sergio Hernández Jiménez, 2024
dc.rights.accessRightsinfo:eu-repo/semantics/openAccess
dc.rights.urihttp://creativecommons.org/licenses/by-nc-nd/3.0/es/
dc.rights.urihttp://www.gnu.org/licenses/gpl-3.0.ca.html
dc.sourceMàster Oficial - Fonaments de la Ciència de Dades
dc.subject.classificationHate speech
dc.subject.classificationOnline social networks
dc.subject.classificationMathematical statistics
dc.subject.classificationMaster's theses
dc.subject.classificationNatural language processing (Computer science)
dc.subject.otherHate speech
dc.subject.otherOnline social networks
dc.subject.otherMathematical statistics
dc.subject.otherMaster's thesis
dc.subject.otherNatural language processing (Computer science)
dc.titleLarge language models and causal analysis: zero-shot counterfactuals in hate speech perception
dc.typeinfo:eu-repo/semantics/masterThesis

Files

Original bundle

Name: tfm_hernandez_jimenez_sergio.pdf
Size: 844.47 KB
Format: Adobe Portable Document Format
Description: Thesis report

Name: Causal_NLP_TFM-main.zip
Size: 4.84 MB
Format: ZIP file
Description: Source code