Q-learning in collaborative multiagent systems

González Trastoy, Alfred

Please use this identifier to cite or link to this item: http://hdl.handle.net/2445/124087

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	López Sánchez, Maite	-
dc.contributor.author	González Trastoy, Alfred	-
dc.date.accessioned	2018-08-02T08:53:56Z	-
dc.date.available	2018-08-02T08:53:56Z	-
dc.date.issued	2018-02	-
dc.identifier.uri	http://hdl.handle.net/2445/124087	-
dc.description	Bachelor's thesis (Treballs Finals de Grau) in Computer Engineering, Facultat de Matemàtiques, Universitat de Barcelona, Year: 2018, Advisor: Maite López Sánchez	ca
dc.description.abstract	Q-learning is one of the most widely used reinforcement learning techniques. It is very effective for learning an optimal policy in any finite Markov decision process (MDP). Collaborative multiagent systems, however, are a challenge for implementing self-interested agents, because higher utility can be achieved through collaboration. To evaluate the efficiency of Q-learning in collaborative multiagent systems, we use a simplified version of the Malmo Collaborative AI Challenge (MCAC). The challenge was designed by Microsoft and consists of a game in which two players can collaborate to catch the pig (high reward) or leave the game (low reward). Each action costs 1, so knowing when to leave and when to chase the pig is key to achieving high scores. The challenge poses two main problems: uncertainty about the other agent's behaviour and limited learning time. We propose solutions to both problems using a simplified MCAC environment, a state-action abstraction, and agent-type modelling. We have implemented an agent that can identify the other player's behaviour (whether it is collaborating or not) and learn an optimal policy against each type of player. Results show that Q-learning is an efficient and effective technique for solving collaborative multiagent systems.	ca
dc.format.extent	26 p.	-
dc.format.mimetype	application/pdf	-
dc.language.iso	eng	ca
dc.rights	thesis report: cc-by-nc-sa (c) Alfred González Trastoy, 2018	-
dc.rights	code: GPL (c) Alfred González Trastoy, 2018	-
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/es/	*
dc.rights.uri	http://www.gnu.org/licenses/gpl-3.0.ca.html	*
dc.source	Treballs Finals de Grau (TFG) - Enginyeria Informàtica	-
dc.subject.classification	Aprenentatge automàtic	ca
dc.subject.classification	Intel·ligència artificial	ca
dc.subject.classification	Programari	ca
dc.subject.classification	Treballs de fi de grau	ca
dc.subject.classification	Aprenentatge per reforç (Intel·ligència artificial)	ca
dc.subject.classification	Processos de Markov	ca
dc.subject.other	Machine learning	en
dc.subject.other	Artificial intelligence	en
dc.subject.other	Computer software	en
dc.subject.other	Bachelor's theses	en
dc.subject.other	Reinforcement learning	en
dc.subject.other	Markov processes	en
dc.title	Q-learning in collaborative multiagent systems	ca
dc.type	info:eu-repo/semantics/bachelorThesis	ca
dc.rights.accessRights	info:eu-repo/semantics/openAccess	ca
Appears in Collections:	Programari - Treballs de l'alumnat; Treballs Finals de Grau (TFG) - Enginyeria Informàtica

Files in This Item:

File	Description	Size	Format
codi_font.zip	Source code	656.72 kB	zip
memoria.pdf	Thesis report	1.33 MB	Adobe PDF

This item is licensed under a Creative Commons License