Machine translation evaluation metrics benchmarking: from traditional MT to LLMs

dc.contributor.advisorVitrià i Marca, Jordi
dc.contributor.authorLópez Caro, Álvaro
dc.date.accessioned2024-07-04T07:57:56Z
dc.date.available2024-07-04T07:57:56Z
dc.date.issued2023-06-30
dc.descriptionFinal projects of the Master in Fundamentals of Data Science, Facultat de Matemàtiques, Universitat de Barcelona. Academic year: 2022-2023. Tutor: Jordi Vitrià i Marca
dc.description.abstractThis thesis casts a spotlight on the evolution and applicability of machine translation (MT) evaluation metrics and models, chiefly contrasting statistical methods with more contemporary neural-based ones, and paying special attention to modern Large Language Models (LLMs). MT, a significant area of Natural Language Processing (NLP), has undergone a vast metamorphosis over the years, bringing into focus the critical need for a thorough exploration of these evolving systems. Our research is anchored in the Digital Corpus of the European Parliament (DCEP), a complex, multilingual corpus whose comprehensive and diversified linguistic data make it an ideal testbed for benchmarking MT models. Using this extensive corpus, we present a comprehensive benchmark of selected MT models, capturing not just their evolution but also their performance dynamics across different tasks and contexts. A vital facet of our study is evaluating the relevance and reliability of various MT metrics, from long-established ones such as BLEU, METEOR, and chrF to newer neural-based metrics that promise to capture semantics more effectively. We aim to uncover the inherent strengths and limitations of these metrics, thereby guiding future practitioners and researchers in choosing appropriate metrics for specific MT contexts. In this holistic examination, we also analyze the interplay between model selection, evaluation metric, and translation quality. This thesis thus provides a novel lens through which to understand the idiosyncrasies of popular MT models and evaluation metrics, ultimately contributing to more effective and nuanced applications of MT.
In sum, this exploration furnishes a new perspective on MT evaluation, honing our understanding of the evolutionary paths of both models and metrics and providing insights into their contextual performance on the DCEP corpus, creating a benchmark that can serve the broader MT community. All the code used for text pre- and post-processing and for evaluating the models and metrics at play, along with other intermediate materials, is published publicly in our GitHub repository.
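As an illustration of the kind of surface-overlap metric the abstract contrasts with neural-based ones, here is a minimal pure-Python sketch of sentence-level BLEU (unsmoothed, whitespace tokenization; the example sentences are invented, and a real evaluation such as the one in the thesis would rely on an established implementation like sacrebleu):

```python
import math
from collections import Counter

def ngram_counts(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(hypothesis, reference, max_n=4):
    """Unsmoothed sentence-level BLEU with brevity penalty, in [0.0, 1.0]."""
    hyp, ref = hypothesis.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        hyp_ngrams = ngram_counts(hyp, n)
        ref_ngrams = ngram_counts(ref, n)
        overlap = sum((hyp_ngrams & ref_ngrams).values())  # clipped matches
        total = sum(hyp_ngrams.values())
        if overlap == 0 or total == 0:
            return 0.0  # without smoothing, any empty n-gram level zeroes BLEU
        precisions.append(overlap / total)
    # Brevity penalty punishes hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

print(sentence_bleu("the cat sat on the mat", "the cat sat on the mat"))  # 1.0
print(sentence_bleu("the cat sat on the rug", "the cat sat on the mat"))
```

The geometric mean over n-gram precisions is what makes BLEU purely lexical: a hypothesis that is semantically equivalent but lexically different scores poorly, which is exactly the weakness the newer neural-based metrics discussed in the thesis aim to address.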
dc.format.extent39 p.
dc.format.mimetypeapplication/pdf
dc.identifier.urihttps://hdl.handle.net/2445/214303
dc.language.isoeng
dc.rightscc-by-nc-nd (c) Álvaro López Caro, 2023
dc.rightscode: Apache (c) Álvaro López Caro, 2023
dc.rights.accessRightsinfo:eu-repo/semantics/openAccess
dc.rights.urihttp://creativecommons.org/licenses/by-nc-nd/3.0/es/
dc.rights.urihttps://www.apache.org/licenses/LICENSE-2.0.txt
dc.sourceMàster Oficial - Fonaments de la Ciència de Dades
dc.subject.classificationMachine translation
dc.subject.classificationComputational linguistics
dc.subject.classificationNatural language processing (Computer science)
dc.subject.classificationMaster's theses
dc.subject.otherMachine translating
dc.subject.otherComputational linguistics
dc.subject.otherNatural language processing (Computer science)
dc.subject.otherMaster's thesis
dc.titleMachine translation evaluation metrics benchmarking: from traditional MT to LLMs
dc.typeinfo:eu-repo/semantics/masterThesis

Files

Original bundle

Showing 1 - 2 of 2

Name: tfm_lopez_caro_alvaro.pdf
Size: 1.29 MB
Format: Adobe Portable Document Format
Description: Thesis report

Name: Machine-Translation-evaluation-metrics-benchmarking-main.zip
Size: 2.61 MB
Format: ZIP file
Description: Source code