Lexicographic proximal policy optimization for ethical multi-agent learning

Torquet Luna, Núria

dc.contributor.advisor	López Sánchez, Maite
dc.contributor.advisor	Rodríguez Soto, Manel
dc.contributor.author	Torquet Luna, Núria
dc.date.accessioned	2024-12-02T08:02:00Z
dc.date.available	2024-12-02T08:02:00Z
dc.date.issued	2024-06-04
dc.description	Bachelor's Degree Final Projects in Computer Engineering, Faculty of Mathematics, Universitat de Barcelona, Year: 2024, Advisors: Maite López Sánchez and Manel Rodríguez Soto	ca
dc.description.abstract	[en] This Bachelor’s Degree Final Project studies the integration of ethical principles into multi-objective, multi-agent reinforcement learning (MOMARL) through the implementation and evaluation of the Independent Lexicographic Proximal Policy Optimization (ILPPO) algorithm. In multi-agent reinforcement learning (MARL), the dynamic interactions between agents can make ethical learning particularly complex. ILPPO addresses this challenge by prioritizing ethical decision-making via a lexicographic ordering of multiple objectives, so that the ethical objectives are met before other individual objectives are addressed. We start by presenting the necessary background on multi-objective Markov Decision Processes (MOMDPs), the Proximal Policy Optimization (PPO) algorithm and the lexicographic RL framework, which forms the basis for the Lexicographic PPO (LPPO) algorithm. Then, we extend these three elements (MOMDP, PPO and LPPO) to the context of multi-agent Markov games, where each agent learns independently. Having developed Independent LPPO (ILPPO), we evaluate it in the Ethical Gathering Game, an environment where agents learn to behave in alignment with the moral value of beneficence. Our experiments demonstrate that ILPPO can learn optimal policies aligned with ethical values, similar to the ones obtained with Independent PPO (IPPO). This study concludes that ILPPO provides a robust framework for embedding ethical considerations into MOMARL, offering new insights and paving the way for future research in more complex environments.	ca
dc.format.extent	73 p.
dc.format.mimetype	application/pdf
dc.identifier.uri	https://hdl.handle.net/2445/216855
dc.language.iso	eng	ca
dc.rights	thesis report: cc-nc-nd (c) Núria Torquet Luna, 2024
dc.rights	code: MIT (c) Núria Torquet Luna, 2024
dc.rights.accessRights	info:eu-repo/semantics/openAccess	ca
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/es/
dc.rights.uri	https://opensource.org/license/mit	*
dc.source	Bachelor's Degree Final Projects (TFG) - Computer Engineering
dc.subject.classification	Aprenentatge automàtic	ca
dc.subject.classification	Agents intel·ligents (Programes d'ordinador)	ca
dc.subject.classification	Processos de Markov	ca
dc.subject.classification	Aspectes morals	ca
dc.subject.classification	Programari	ca
dc.subject.classification	Treballs de fi de grau	ca
dc.subject.other	Machine learning	en
dc.subject.other	Intelligent agents (Computer software)	en
dc.subject.other	Markov processes	en
dc.subject.other	Moral aspects	en
dc.subject.other	Computer software	en
dc.subject.other	Bachelor's theses	en
dc.title	Lexicographic proximal policy optimization for ethical multi-agent learning	ca
dc.type	info:eu-repo/semantics/bachelorThesis	ca

Files

Original bundle

Name: tfg_torquet_luna_nuria.pdf
Size: 3.73 MB
Format: Adobe Portable Document Format
Description: Thesis report

Name: codi.zip
Size: 170.73 MB
Format: ZIP file
Description: Source code

Collections

Bachelor's Degree Final Projects (TFG) - Computer Engineering
Software - Student Works
Bachelor's Degree Final Projects (TFG) - Mathematics