Instilling moral value alignment by means of multi-objective reinforcement learning

dc.contributor.author: Rodriguez Soto, Manel
dc.contributor.author: Serramia, Marc
dc.contributor.author: López Sánchez, Maite
dc.contributor.author: Rodríguez-Aguilar, Juan A. (Juan Antonio)
dc.date.accessioned: 2023-02-01T09:10:35Z
dc.date.available: 2023-02-01T09:10:35Z
dc.date.issued: 2022-01-24
dc.date.updated: 2023-02-01T09:10:35Z
dc.description.abstract: AI research is being challenged with ensuring that autonomous agents learn to behave ethically, namely in alignment with moral values. Here, we propose a novel way of tackling the value alignment problem as a two-step process. The first step consists in formalising moral values and value-aligned behaviour based on philosophical foundations. Our formalisation is compatible with the framework of (Multi-Objective) Reinforcement Learning, which eases the handling of an agent's individual and ethical objectives. The second step consists in designing an environment wherein an agent learns to behave ethically while pursuing its individual objective. We leverage our theoretical results to introduce an algorithm that automates our two-step approach. In the cases where value-aligned behaviour is possible, our algorithm produces a learning environment in which the agent will learn a value-aligned behaviour.
dc.format.extent: 17 p.
dc.format.mimetype: application/pdf
dc.identifier.idgrec: 715848
dc.identifier.issn: 1388-1957
dc.identifier.uri: https://hdl.handle.net/2445/192920
dc.language.iso: eng
dc.publisher: Springer
dc.relation.isformatof: Reproduction of the document published at: https://doi.org/10.1007/s10676-022-09635-0
dc.relation.ispartof: Ethics And Information Technology, 2022, vol. 24
dc.relation.uri: https://doi.org/10.1007/s10676-022-09635-0
dc.rights: cc by (c) Manel Rodríguez Soto et al., 2022
dc.rights.accessRights: info:eu-repo/semantics/openAccess
dc.rights.uri: http://creativecommons.org/licenses/by/3.0/es/*
dc.source: Articles published in journals (Mathematics and Computer Science)
dc.subject.classification: Intel·ligència artificial
dc.subject.classification: Aprenentatge per reforç (Intel·ligència artificial)
dc.subject.classification: Ètica
dc.subject.classification: Aspectes morals
dc.subject.other: Artificial intelligence
dc.subject.other: Reinforcement learning
dc.subject.other: Ethics
dc.subject.other: Moral aspects
dc.title: Instilling moral value alignment by means of multi-objective reinforcement learning
dc.type: info:eu-repo/semantics/article
dc.type: info:eu-repo/semantics/publishedVersion

Files

Original bundle

Name: 715848.pdf
Size: 1.82 MB
Format: Adobe Portable Document Format