Please use this identifier to cite or link to this item: https://hdl.handle.net/2445/188695
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | Gómez Duran, Paula | -
dc.contributor.advisor | Vitrià i Marca, Jordi | -
dc.contributor.author | Beaus Iranzo, Rafael | -
dc.date.accessioned | 2022-09-06T10:19:09Z | -
dc.date.available | 2022-09-06T10:19:09Z | -
dc.date.issued | 2022-06-13 | -
dc.identifier.uri | https://hdl.handle.net/2445/188695 | -
dc.description | Bachelor's degree final project in Computer Engineering, Facultat de Matemàtiques, Universitat de Barcelona, Year: 2022, Advisors: Paula Gómez Duran and Jordi Vitrià i Marca | ca
dc.description.abstract | [en] The field of natural language processing is essential in today’s data-driven world. In 2017 the Transformer architecture was introduced, building on the attention mechanism proposed in 2014. The effects of this new structure were already changing the paradigm when the language model BERT marked an inflection point in 2018. BERT exploits the Transformer’s parallelization to obtain a network that can be pretrained; during pretraining, the model learns how a language works on its own, simply by being fed texts. An improved version, RoBERTa, came out shortly after, and most subsequent models were based on it. In this thesis we focus on studying BERTa (a RoBERTa-based Catalan language model) with a dataset from the Gran Enciclopèdia Catalana. The analysis includes tasks that assess how the model performs on real-world data. The study aims to validate the quality of the embeddings produced by the model in order to use them to build an article retrieval platform, where each queried article can be related to those containing similar information. Semantic textual similarity describes how alike a pair of sentences is, and it is a fundamental target of the designed experiments and development (a minimal, illustrative sketch of such a similarity computation appears after this metadata record). Finally, the results are visualized and interpreted using a simple front-end tool also created in this work. | ca
dc.format.extent | 50 p. | -
dc.format.mimetype | application/pdf | -
dc.language.iso | eng | ca
dc.rights | report: cc-by-nc-nd (c) Rafael Beaus Iranzo, 2022 | -
dc.rights | code: GPL (c) Rafael Beaus Iranzo, 2022 | -
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/3.0/es/ | -
dc.rights.uri | http://www.gnu.org/licenses/gpl-3.0.ca.html | *
dc.source | Treballs Finals de Grau (TFG) - Enginyeria Informàtica | -
dc.subject.classification | Tractament del llenguatge natural (Informàtica) | ca
dc.subject.classification | Aprenentatge automàtic | ca
dc.subject.classification | Programari | ca
dc.subject.classification | Treballs de fi de grau | ca
dc.subject.classification | Xarxes neuronals (Informàtica) | ca
dc.subject.other | Natural language processing (Computer science) | en
dc.subject.other | Machine learning | en
dc.subject.other | Computer software | en
dc.subject.other | Neural networks (Computer science) | en
dc.subject.other | Bachelor's theses | en
dc.title | Article similarities using transformers | ca
dc.type | info:eu-repo/semantics/bachelorThesis | ca
dc.rights.accessRights | info:eu-repo/semantics/openAccess | ca
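
As a minimal sketch (not part of the thesis or this record), the semantic-textual-similarity computation described in the abstract might look as follows in Python with Hugging Face Transformers: sentence embeddings are mean-pooled from the model's last hidden states and compared by cosine similarity. The checkpoint id "PlanTL-GOB-ES/roberta-base-ca" is assumed to be the published BERTa model; substitute whichever checkpoint the thesis code actually uses.

    # Hedged sketch: embed two sentences with a RoBERTa-based Catalan model
    # (BERTa is assumed to be published as "PlanTL-GOB-ES/roberta-base-ca")
    # and score their semantic textual similarity by cosine similarity.
    import torch
    from transformers import AutoModel, AutoTokenizer

    MODEL_ID = "PlanTL-GOB-ES/roberta-base-ca"  # assumed checkpoint id
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModel.from_pretrained(MODEL_ID)
    model.eval()

    def embed(sentences):
        # Mean-pool the last hidden states into one vector per sentence,
        # masking out padding tokens so they do not dilute the average.
        batch = tokenizer(sentences, padding=True, truncation=True,
                          return_tensors="pt")
        with torch.no_grad():
            hidden = model(**batch).last_hidden_state   # (batch, seq, dim)
        mask = batch["attention_mask"].unsqueeze(-1)    # (batch, seq, 1)
        return (hidden * mask).sum(dim=1) / mask.sum(dim=1)

    a, b = embed(["El gat dorm al sofà.", "Un felí descansa al sofà."])
    score = torch.nn.functional.cosine_similarity(a, b, dim=0)
    print(f"STS score: {score.item():.3f}")  # closer to 1.0 = more similar

Mean pooling is only one common choice of sentence representation; pooling the start-of-sequence token or using a dedicated sentence-embedding head are alternatives, and the thesis may use a different strategy.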
Appears in Collections:Programari - Treballs de l'alumnat
Treballs Finals de Grau (TFG) - Enginyeria Informàtica
Treballs Finals de Grau (TFG) - Matemàtiques

Files in This Item:
File | Description | Size | Format
tfg_beaus_iranzo_rafael.pdf | Report | 1.8 MB | Adobe PDF
tfg-transformers-main.zip | Source code | 6.47 MB | zip


This item is licensed under a Creative Commons License.