Please use this identifier to cite or link to this item:
http://hdl.handle.net/2445/182251
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Puertas i Prats, Eloi | - |
dc.contributor.advisor | Sans Gispert, Eloi | - |
dc.contributor.author | de Juan Pulido, Xavier | - |
dc.date.accessioned | 2022-01-12T07:39:47Z | - |
dc.date.available | 2022-01-12T07:39:47Z | - |
dc.date.issued | 2021-06-20 | - |
dc.identifier.uri | http://hdl.handle.net/2445/182251 | - |
dc.description | Bachelor's Degree Final Projects in Computer Engineering, Faculty of Mathematics, Universitat de Barcelona, Year: 2021, Advisors: Eloi Puertas i Prats and Eloi Sans Gispert | en |
dc.description.abstract | [en] This report studies the Reinforcement Learning (RL) problem based on the Sonic the Hedgehog™ video game franchise. Fundamental RL concepts are introduced to formalize the problem and to present several state-of-the-art Deep Reinforcement Learning (DRL) algorithms that solve it. The core optimization problem of deep learning is also studied, together with the mathematical background behind it and behind the solutions based on the Gradient Descent optimization algorithm. The aim is to empirically show that Adam is the best choice of optimizer for DRL methods on the Sonic benchmark. To this end, the main elements of reinforcement learning are first explained, along with how such problems can be formulated as Markov Decision Processes. Next, the combination of RL with deep learning, known as Deep Reinforcement Learning, is examined, and several algorithms are presented to illustrate how the two are combined. The second part of this work studies the mathematical background of optimization in deep learning. It considers the optimization problem that arises when updating the parameters of a neural network, together with a basic solution, the Gradient Descent optimization algorithm. Several variants of Gradient Descent are examined, along with an analysis of their challenges and of the most common optimization algorithms that improve on basic Gradient Descent. In the experimentation phase, a set of Sonic levels, split into training and test levels, is used to define each environment, and common metrics are used to evaluate the score and time of each experiment. Each experiment pairs a DRL algorithm (Joint PPO or Rainbow) with an optimization method (Adam, Adadelta, AdaGrad, Nesterov Accelerated Gradient or Momentum). Results show that Adam is the best choice of optimizer for both DRL algorithms: it lets the agents achieve higher scores and learn to generalize across different levels. This provides empirical evidence that Adam is the best optimizer for Joint PPO and Rainbow on the Sonic benchmark. | ca |
dc.format.extent | 71 p. | - |
dc.format.mimetype | application/pdf | - |
dc.language.iso | cat | ca |
dc.rights | report: cc-nc-nd (c) Xavier de Juan Pulido, 2021 | - |
dc.rights | code: GPL (c) Xavier de Juan Pulido, 2021 | - |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/3.0/es/ | - |
dc.rights.uri | http://www.gnu.org/licenses/gpl-3.0.ca.html | * |
dc.source | Bachelor's Degree Final Projects (TFG) - Computer Engineering | - |
dc.subject.classification | Aprenentatge per reforç (Intel·ligència artificial) | ca |
dc.subject.classification | Optimització matemàtica | ca |
dc.subject.classification | Programari | ca |
dc.subject.classification | Treballs de fi de grau | ca |
dc.subject.classification | Algorismes computacionals | ca |
dc.subject.classification | Videojocs | ca |
dc.subject.other | Reinforcement learning | en |
dc.subject.other | Mathematical optimization | en |
dc.subject.other | Computer software | en |
dc.subject.other | Computer algorithms | en |
dc.subject.other | Video games | en |
dc.subject.other | Bachelor's theses | en |
dc.title | Estudi dels mètodes d'optimització al deep reinforcement learning aplicat als videojocs [Study of optimization methods in deep reinforcement learning applied to video games] | ca |
dc.type | info:eu-repo/semantics/bachelorThesis | ca |
dc.rights.accessRights | info:eu-repo/semantics/openAccess | ca |
Appears in Collections: | Software - Student Works; Bachelor's Degree Final Projects (TFG) - Mathematics; Bachelor's Degree Final Projects (TFG) - Computer Engineering |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
codi.zip | Source code | 3.7 MB | zip |
tfg_xavier_de_juan_pulido.pdf | Report | 1.33 MB | Adobe PDF |
This item is licensed under a Creative Commons License