Machine translation evaluation metrics benchmarking: from traditional MT to LLMs

dc.contributor.advisorVitrià i Marca, Jordi
dc.contributor.authorLópez Caro, Álvaro
dc.date.accessioned2024-07-04T07:57:56Z
dc.date.available2024-07-04T07:57:56Z
dc.date.issued2023-06-30
dc.descriptionFinal projects of the Master in Fundamentals of Data Science, Facultat de Matemàtiques, Universitat de Barcelona. Academic year: 2022-2023. Tutor: Jordi Vitrià i Marca
dc.description.abstractThis thesis casts a spotlight on the evolution and applicability of machine translation (MT) evaluation metrics and models, chiefly contrasting statistical methods with more contemporary neural-based ones, and paying special attention to modern Large Language Models (LLMs). MT, a significant area of Natural Language Processing (NLP), has undergone a vast metamorphosis over the years, bringing into focus the critical need for a thorough exploration of these evolving systems. Our research is anchored in the Digital Corpus of the European Parliament (DCEP), a complex, multilingual corpus whose comprehensive and diversified linguistic data make it an ideal testbed for benchmarking MT models. Using this extensive corpus, we present a comprehensive benchmark of selected MT models, capturing not just their evolution but also their performance dynamics across different tasks and contexts. A vital facet of our study is evaluating the relevance and reliability of various MT metrics, from long-established ones such as BLEU, METEOR, and chrF to newer neural-based metrics that promise to capture semantics more effectively. We aim to uncover the inherent strengths and limitations of these metrics, thereby guiding future practitioners and researchers in choosing appropriate metrics for specific MT contexts. In this holistic examination, we also analyze the interplay between model selection, evaluation metric, and translation quality. This thesis thus provides a novel lens through which to understand the idiosyncrasies of popular MT models and evaluation metrics, ultimately contributing to more effective and nuanced applications of MT.
In sum, this exploration furnishes a new perspective on MT evaluation, honing our understanding of the evolutionary paths of both models and metrics and providing insights into their contextual performance on the DCEP corpus, creating a benchmark that can serve the broader MT community. All the code used for text pre- and post-processing and for evaluating the models and metrics at play, along with other intermediate materials, is published publicly in our GitHub repository.
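As an illustration of the kind of surface-overlap metric the abstract contrasts with neural-based ones, here is a minimal pure-Python sketch of sentence-level BLEU (unsmoothed, whitespace tokenization; the example sentences are invented, and a real evaluation such as the one in the thesis would rely on an established implementation like sacrebleu):

```python
import math
from collections import Counter

def ngram_counts(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(hypothesis, reference, max_n=4):
    """Unsmoothed sentence-level BLEU with brevity penalty, in [0.0, 1.0]."""
    hyp, ref = hypothesis.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        hyp_ngrams = ngram_counts(hyp, n)
        ref_ngrams = ngram_counts(ref, n)
        overlap = sum((hyp_ngrams & ref_ngrams).values())  # clipped matches
        total = sum(hyp_ngrams.values())
        if overlap == 0 or total == 0:
            return 0.0  # without smoothing, any empty n-gram level zeroes BLEU
        precisions.append(overlap / total)
    # Brevity penalty punishes hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

print(sentence_bleu("the cat sat on the mat", "the cat sat on the mat"))  # 1.0
print(sentence_bleu("the cat sat on the rug", "the cat sat on the mat"))
```

The geometric mean over n-gram precisions is what makes BLEU purely lexical: a hypothesis that is semantically equivalent but lexically different scores poorly, which is exactly the weakness the newer neural-based metrics discussed in the thesis aim to address.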
dc.format.extent39 p.
dc.format.mimetypeapplication/pdf
dc.identifier.urihttps://hdl.handle.net/2445/214303
dc.language.isoeng
dc.rightscc-by-nc-nd (c) Álvaro López Caro, 2023
dc.rightscode: Apache (c) Álvaro López Caro, 2023
dc.rights.accessRightsinfo:eu-repo/semantics/openAccess
dc.rights.urihttp://creativecommons.org/licenses/by-nc-nd/3.0/es/
dc.rights.urihttps://www.apache.org/licenses/LICENSE-2.0.txt
dc.sourceMàster Oficial - Fonaments de la Ciència de Dades
dc.subject.classificationMachine translation
dc.subject.classificationComputational linguistics
dc.subject.classificationNatural language processing (Computer science)
dc.subject.classificationMaster's theses
dc.subject.otherMachine translating
dc.subject.otherComputational linguistics
dc.subject.otherNatural language processing (Computer science)
dc.subject.otherMaster's thesis
dc.titleMachine translation evaluation metrics benchmarking: from traditional MT to LLMs
dc.typeinfo:eu-repo/semantics/masterThesis

Files

Original bundle

Showing 1 - 2 of 2

Name: tfm_lopez_caro_alvaro.pdf
Size: 1.29 MB
Format: Adobe Portable Document Format
Description: Thesis report

Name: Machine-Translation-evaluation-metrics-benchmarking-main.zip
Size: 2.61 MB
Format: ZIP file
Description: Source code