Please use this identifier to cite or link to this item: https://hdl.handle.net/2445/219516
Title: Stereohoax: a multilingual corpus of racial hoaxes and social media reactions annotated for stereotypes
Author: Schmeisser-Nieto, Wolfgang S.
Cignarella, Alessandra Teresa
Bourgeade, Tom
Frenda, Simona
Ariza-Casabona, Alejandro
Laurent, Mario
Cicirelli, Paolo Giovanni
Marra, Andrea
Corbelli, Giuseppe
Benamara, Farah
Bosco, Cristina
Moriceau, Véronique
Paciello, Marinella
Taulé Delor, Mariona
D'Errico, Francesca
Keywords: Migrants
Social psychology
Issue Date: 19-Dec-2024
Publisher: Springer Verlag
Abstract: Stereotypes have been studied extensively in the fields of social psychology and, especially with the recent advances in technology, in computational linguistics. Stereotypes have also gained even more attention nowadays because of a notable rise in their dissemination due to demographic changes and world events. This paper focuses on ethnic stereotypes related to immigration and presents the StereoHoax corpus, a multilingual dataset of 17,814 tweets in French, Italian, and Spanish. The corpus includes conversational threads reporting on and responding to racial hoaxes about immigrants, which we define as false claims of unlawful actions attributed to specific ethnic groups. This work describes the data collection process and the fine-grained annotation scheme we used, which is based mainly on the Stereotype Content Model adapted to the study applied to immigrants of Bosco et al. (2023). Quantitative and qualitative analyses show the distribution and correlation of annotated categories across languages, revealing, for instance, intercultural differences in the expression of stereotypes through forms of discredit. To validate our data, we performed four machine learning experiments using pre-trained BERT-like models in order to lay a foundation for automatic stereotype detection research. Leveraging the StereoHoax corpus, we gained crucial insights into the importance of context, especially in relation to the detection of implicit stereotypes. Overall, we believe that the StereoHoax corpus will prove to be a valuable resource for the automatic detection of stereotypes regarding immigrants and the study of the linguistic and psychological patterns associated with their dissemination.
Note: Postprint version of the document published at: https://doi.org/10.1007/s10579-024-09791-3
It is part of: Language Resources And Evaluation, 2024
URI: https://hdl.handle.net/2445/219516
Related resource: https://doi.org/10.1007/s10579-024-09791-3
ISSN: 1574-020X
Appears in Collections: Articles publicats en revistes (Filologia Catalana i Lingüística General)

Files in This Item:
File: 887989.pdf
Size: 2.14 MB
Format: Adobe PDF


Embargoed: document embargoed until 18-12-2025


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.