Please use this identifier to cite or link to this item: http://hdl.handle.net/2445/214303
Title: Machine translation evaluation metrics benchmarking: from traditional MT to LLMs
Author: López Caro, Álvaro
Director/Tutor: Vitrià i Marca, Jordi
Keywords: Traducció automàtica
Lingüística computacional
Tractament del llenguatge natural (Informàtica)
Treballs de fi de màster
Machine translating
Computational linguistics
Natural language processing (Computer science)
Master's thesis
Issue Date: 30-Jun-2023
Abstract: This thesis examines the evolution and applicability of machine translation (MT) evaluation metrics and models, contrasting statistical methods with more contemporary neural-based ones, with special attention to modern Large Language Models (LLMs). MT, a significant area of Natural Language Processing (NLP), has undergone a vast metamorphosis over the years, making a thorough exploration of these evolving systems critical. Our research is anchored in the Digital Corpus of the European Parliament (DCEP), a complex, multilingual corpus whose comprehensive and diversified linguistic data make it an ideal testbed for benchmarking MT models. Using this extensive corpus, we present a comprehensive benchmark of selected MT models, capturing not just their evolution but also their performance across different tasks and contexts. A vital facet of our study is evaluating the relevance and reliability of various MT metrics, from traditional string-based measures such as BLEU, METEOR, and chrF to newer neural-based metrics that promise to capture semantics more effectively. We aim to uncover the inherent strengths and limitations of these metrics, thereby guiding future practitioners and researchers in choosing appropriate metrics for specific MT contexts. As part of this holistic examination, we also analyze the interplay between model selection, evaluation metric, and translation quality. The thesis thus provides a novel lens for understanding the idiosyncrasies of popular MT models and evaluation metrics, ultimately contributing to more effective and nuanced applications of MT. In sum, this exploration offers a new perspective on MT evaluation, honing our understanding of the evolutionary paths of both models and metrics, and providing insights into their contextual performance on the DCEP corpus, creating a benchmark that can serve the broader MT community. All the code used for text pre- and post-processing and for evaluating the models and metrics at play, along with other intermediate material, is publicly available in our GitHub repository.
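For readers who want a concrete sense of how the string-based metrics named in the abstract (BLEU, chrF, METEOR) are typically computed, the following minimal Python sketch scores a toy hypothesis against a reference using the sacrebleu and NLTK libraries. It is an illustrative assumption, not the thesis's own pipeline (that code lives in the author's GitHub repository mentioned above), and the example sentences are invented placeholders rather than DCEP data.

# Minimal sketch of corpus-level MT metric scoring with off-the-shelf libraries
# (sacrebleu for BLEU/chrF, NLTK for METEOR). Illustrative only; the sentences
# below are invented and do not come from the DCEP corpus.
import sacrebleu
import nltk
from nltk.translate.meteor_score import meteor_score

nltk.download("wordnet", quiet=True)  # METEOR uses WordNet for synonym matching

hypotheses = ["The European Parliament adopted the resolution today."]
references = ["The European Parliament adopted the resolution this day."]

# sacrebleu expects a list of hypothesis strings and a list of reference streams.
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
chrf = sacrebleu.corpus_chrf(hypotheses, [references])

# Recent NLTK versions expect pre-tokenized input; average sentence scores
# over the corpus to get a corpus-level METEOR figure.
meteor = sum(
    meteor_score([ref.split()], hyp.split())
    for hyp, ref in zip(hypotheses, references)
) / len(hypotheses)

print(f"BLEU:   {bleu.score:.2f}")
print(f"chrF:   {chrf.score:.2f}")
print(f"METEOR: {meteor:.4f}")

Neural metrics such as COMET or BERTScore follow the same pattern conceptually but require loading a pretrained model before scoring, which is why they are heavier to run and are treated separately in the thesis.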
Note: Final project of the Master's in Fundamentals of Data Science, Facultat de Matemàtiques, Universitat de Barcelona. Academic year: 2022-2023. Tutor: Jordi Vitrià i Marca
URI: http://hdl.handle.net/2445/214303
Appears in Collections: Programari - Treballs de l'alumnat
Màster Oficial - Fonaments de la Ciència de Dades

Files in This Item:
File                                                           Description    Size     Format
tfm_lopez_caro_alvaro.pdf                                      Thesis report  1.32 MB  Adobe PDF
Machine-Translation-evaluation-metrics-benchmarking-main.zip   Source code    2.67 MB  zip


This item is licensed under a Creative Commons License.