Lexicographic proximal policy optimization for ethical multi-agent learning

Torquet Luna, Núria

Please use this identifier to cite or link to this item: https://hdl.handle.net/2445/216855

Title:	Lexicographic proximal policy optimization for ethical multi-agent learning
Author:	Torquet Luna, Núria
Director/Tutor:	López Sánchez, Maite; Rodríguez Soto, Manel
Keywords:	Machine learning; Intelligent agents (Computer software); Markov processes; Moral aspects; Computer software; Bachelor's theses
Issue Date:	4-Jun-2024
Abstract:	[en] This Bachelor’s Degree Final Project studies the integration of ethical principles into multi-objective multi-agent reinforcement learning (MOMARL) through the implementation and evaluation of the Independent Lexicographic Proximal Policy Optimization (ILPPO) algorithm. In multi-agent reinforcement learning (MARL), the dynamic interactions between agents can make ethical learning particularly complex. ILPPO addresses this challenge by prioritizing ethical decision-making via a lexicographic ordering of multiple objectives, so that the ethical objectives are met before the other individual objectives are addressed. We start by presenting the necessary background on multi-objective Markov Decision Processes (MOMDPs), the Proximal Policy Optimization (PPO) algorithm, and the lexicographic RL framework, which forms the basis for the Lexicographic PPO (LPPO) algorithm. Then, we extend these three elements (MOMDP, PPO and LPPO) to the setting of multi-agent Markov games, where each agent learns independently. Having developed Independent LPPO (ILPPO), we evaluate it in the Ethical Gathering Game, an environment where agents learn to behave in alignment with the moral value of beneficence. Our experiments show that ILPPO can learn optimal policies aligned with ethical values, similar to the ones obtained with Independent PPO (IPPO). This study concludes that ILPPO provides a robust framework for embedding ethical considerations into MOMARL, offering new insights and paving the way for future research in more complex environments.
Note:	Bachelor's Degree Final Projects in Computer Engineering, Faculty of Mathematics, Universitat de Barcelona, Year: 2024, Directors: Maite López Sánchez and Manel Rodríguez Soto
URI:	https://hdl.handle.net/2445/216855
Appears in Collections:	Treballs Finals de Grau (TFG) - Enginyeria Informàtica; Programari - Treballs de l'alumnat; Treballs Finals de Grau (TFG) - Matemàtiques

Files in This Item:

File	Description	Size	Format
tfg_torquet_luna_nuria.pdf	Thesis report	3.82 MB	Adobe PDF
codi.zip	Source code	174.82 MB	zip

This item is licensed under a Creative Commons License