Article similarities using transformers

dc.contributor.advisor: Gómez Duran, Paula
dc.contributor.advisor: Vitrià i Marca, Jordi
dc.contributor.author: Beaus Iranzo, Rafael
dc.date.accessioned: 2022-09-06T10:19:09Z
dc.date.available: 2022-09-06T10:19:09Z
dc.date.issued: 2022-06-13
dc.description: Bachelor's Final Projects in Computer Engineering, Facultat de Matemàtiques, Universitat de Barcelona, Year: 2022, Supervisors: Paula Gómez Duran and Jordi Vitrià i Marca
dc.description.abstract: [en] The field of natural language processing is essential in today's data-driven world. In 2017 the Transformer architecture was introduced, building on the attention mechanism proposed in 2014. This new architecture was already changing the paradigm when the language model BERT marked an inflection point in 2018. BERT exploits the Transformer's parallelization to obtain a network that can be pretrained. During pretraining, the model learns how a language works on its own, simply by being fed text. An improved version, RoBERTa, came out shortly after, and most subsequent models were based on it. In this thesis, we focus on studying BERTa (a RoBERTa-based Catalan language model) with a dataset from the Gran Enciclopèdia Catalana. The analysis includes tasks that assess how the model performs on real-world data. The study aims to validate the quality of the embeddings produced by the model so that they can then be used to build an article retrieval platform, where each article query is related to the articles containing similar information. Semantic textual similarity describes how alike a pair of sentences is, and it is a fundamental target of the designed experiments and development. Finally, the results are visualized and interpreted with a simple front-end tool also created in this work.
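The retrieval idea described in the abstract (relating a query article to the articles with similar information by comparing their embeddings) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's code: it assumes a text embedding is obtained by mean-pooling the model's token vectors over the attention mask, and that semantic similarity is scored with cosine similarity. The function names `mean_pool`, `cosine` and `most_similar`, the article ids and the toy vectors are all hypothetical.

```python
from __future__ import annotations

import math
from typing import Sequence


def mean_pool(token_vectors: Sequence[Sequence[float]], mask: Sequence[int]) -> list[float]:
    """Average the token vectors whose mask entry is 1 (padding tokens are skipped).

    Assumption: mean pooling is one common way to turn per-token outputs into
    a single text embedding; the thesis may use a different strategy.
    """
    dim = len(token_vectors[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_vectors, mask):
        if keep:
            for i, x in enumerate(vec):
                total[i] += x
            count += 1
    if count == 0:
        raise ValueError("mask selects no tokens")
    return [x / count for x in total]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity: dot product divided by the product of the vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


def most_similar(query: Sequence[float],
                 articles: dict[str, Sequence[float]],
                 k: int = 3) -> list[tuple[str, float]]:
    """Return the k (article id, score) pairs closest to the query, best first."""
    scored = [(art_id, cosine(query, emb)) for art_id, emb in articles.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy 3-dimensional "embeddings"; real model outputs are far higher-dimensional.
    articles = {
        "barcelona": [0.9, 0.1, 0.0],
        "girona": [0.8, 0.2, 0.1],
        "fotosintesi": [0.0, 0.1, 0.9],
    }
    # Two real tokens plus one padding token that the mask excludes.
    query = mean_pool([[1.0, 0.0, 0.0], [0.8, 0.2, 0.0], [0.0, 0.0, 0.0]], mask=[1, 1, 0])
    print([art_id for art_id, _ in most_similar(query, articles, k=2)])  # ['barcelona', 'girona']
```

Cosine similarity is used here because it compares the direction of two embeddings regardless of their length; a production retrieval platform would typically precompute and index the article embeddings rather than scanning them all for every query.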
dc.format.extent: 50 p.
dc.format.mimetype: application/pdf
dc.identifier.uri: https://hdl.handle.net/2445/188695
dc.language.iso: eng
dc.rights: thesis report: cc-nc-nd (c) Rafael Beaus Iranzo, 2022
dc.rights: code: GPL (c) Rafael Beaus Iranzo, 2022
dc.rights.accessRights: info:eu-repo/semantics/openAccess
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/3.0/es/
dc.rights.uri: http://www.gnu.org/licenses/gpl-3.0.ca.html
dc.source: Bachelor's Final Projects (TFG) - Computer Engineering
dc.subject.classification: Natural language processing (Computer science)
dc.subject.classification: Machine learning
dc.subject.classification: Computer software
dc.subject.classification: Bachelor's theses
dc.subject.classification: Neural networks (Computer science)
dc.subject.other: Natural language processing (Computer science)
dc.subject.other: Machine learning
dc.subject.other: Computer software
dc.subject.other: Neural networks (Computer science)
dc.subject.other: Bachelor's theses
dc.title: Article similarities using transformers
dc.type: info:eu-repo/semantics/bachelorThesis

Files

Original bundle

Name: tfg_beaus_iranzo_rafael.pdf
Size: 1.75 MB
Format: Adobe Portable Document Format
Description: Thesis report

Name: tfg-transformers-main.zip
Size: 6.32 MB
Format: ZIP file
Description: Source code