Lexicographic proximal policy optimization for ethical multi-agent learning

dc.contributor.advisor: López Sánchez, Maite
dc.contributor.advisor: Rodríguez Soto, Manel
dc.contributor.author: Torquet Luna, Núria
dc.date.accessioned: 2024-12-02T08:02:00Z
dc.date.available: 2024-12-02T08:02:00Z
dc.date.issued: 2024-06-04
dc.description: Bachelor's Degree Final Projects in Computer Engineering (Treballs Finals de Grau d'Enginyeria Informàtica), Facultat de Matemàtiques, Universitat de Barcelona, Year: 2024, Supervisors: Maite López Sánchez and Manel Rodríguez Soto [lang: ca]
dc.description.abstract: [en] This Bachelor’s Degree Final Project studies the integration of ethical principles into multi-objective multi-agent reinforcement learning (MOMARL) through the implementation and evaluation of the Independent Lexicographic Proximal Policy Optimization (ILPPO) algorithm. In multi-agent reinforcement learning (MARL), the dynamic interactions between agents can make ethical learning particularly complex. ILPPO addresses these challenges by prioritizing ethical decision-making through a lexicographic ordering of multiple objectives, which ensures that the ethical objectives are met before the agents' individual objectives are addressed. We start by presenting the necessary background on multi-objective Markov Decision Processes (MOMDPs), the Proximal Policy Optimization (PPO) algorithm and the lexicographic RL framework, which forms the basis of the Lexicographic PPO (LPPO) algorithm. We then extend the three elements of this framework (MOMDP, PPO and LPPO) to multi-agent Markov games in which each agent learns independently. Having developed Independent LPPO (ILPPO), we evaluate it in the Ethical Gathering Game, an environment where agents learn to behave in alignment with the moral value of beneficence. Our experiments demonstrate that ILPPO can learn optimal ethical policies, aligned with the value of beneficence, that are similar to those obtained with Independent PPO (IPPO). This study concludes that ILPPO provides a robust framework for embedding ethical considerations into MOMARL, offering new insights and paving the way for future research in more complex environments. [lang: ca]
dc.format.extent: 73 p.
dc.format.mimetype: application/pdf
dc.identifier.uri: https://hdl.handle.net/2445/216855
dc.language.iso: eng [lang: ca]
dc.rights: Thesis report: cc-nc-nd (c) Núria Torquet Luna, 2024
dc.rights: Code: MIT (c) Núria Torquet Luna, 2024
dc.rights.accessRights: info:eu-repo/semantics/openAccess [lang: ca]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/3.0/es/
dc.rights.uri: https://opensource.org/license/mit [lang: *]
dc.source: Bachelor's Degree Final Projects (TFG) - Computer Engineering
dc.subject.classification: Machine learning [lang: ca]
dc.subject.classification: Intelligent agents (Computer software) [lang: ca]
dc.subject.classification: Markov processes [lang: ca]
dc.subject.classification: Moral aspects [lang: ca]
dc.subject.classification: Computer software [lang: ca]
dc.subject.classification: Bachelor's theses [lang: ca]
dc.subject.other: Machine learning [lang: en]
dc.subject.other: Intelligent agents (Computer software) [lang: en]
dc.subject.other: Markov processes [lang: en]
dc.subject.other: Moral aspects [lang: en]
dc.subject.other: Computer software [lang: en]
dc.subject.other: Bachelor's theses [lang: en]
dc.title: Lexicographic proximal policy optimization for ethical multi-agent learning [lang: ca]
dc.type: info:eu-repo/semantics/bachelorThesis [lang: ca]
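
The abstract describes ILPPO as ordering objectives lexicographically so that ethical objectives are met before individual ones. As a rough illustration of that ordering only, the Python sketch below applies a thresholded lexicographic action choice: keep every action whose ethical value is within a slack `tau` of the best, then pick the one with the highest individual value. This is not taken from the thesis code (codi.zip) and is not the LPPO update, which, as its name indicates, builds the ordering into PPO's policy optimization; the function `lexicographic_action`, the parameter `tau`, and the example value lists are hypothetical assumptions.

```python
from typing import Sequence


def lexicographic_action(
    q_ethical: Sequence[float],
    q_individual: Sequence[float],
    tau: float = 0.0,
) -> int:
    """Hypothetical helper: choose an action under a two-level lexicographic order.

    Level 1 (higher priority): the ethical objective, with slack `tau`.
    Level 2 (lower priority): the agent's individual objective.
    """
    if len(q_ethical) != len(q_individual) or not q_ethical:
        raise ValueError("need equal-length, non-empty value lists")
    if tau < 0:
        raise ValueError("tau must be non-negative")

    best_ethical = max(q_ethical)
    # Keep only actions that are (near-)optimal for the ethical objective.
    admissible = [a for a, q in enumerate(q_ethical) if q >= best_ethical - tau]
    # Among those, maximise the individual objective (first index wins ties).
    return max(admissible, key=lambda a: q_individual[a])


if __name__ == "__main__":
    # Illustrative values for three actions (not data from the thesis).
    q_eth = [1.0, 0.95, 0.2]
    q_ind = [0.1, 0.8, 5.0]
    print(lexicographic_action(q_eth, q_ind, tau=0.0))  # → 0
    print(lexicographic_action(q_eth, q_ind, tau=0.1))  # → 1
```

With `tau=0` the ethical objective is strictly prioritized, so action 2 is never chosen despite its large individual value; a positive `tau` allows a small ethical loss in exchange for individual gain. Whether the thesis uses such a slack is not stated in this record.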

Files

Original bundle

Name: tfg_torquet_luna_nuria.pdf
Size: 3.73 MB
Format: Adobe Portable Document Format
Description: Thesis report

Name: codi.zip
Size: 170.73 MB
Format: ZIP file
Description: Source code