Archived

Further explorations into Reinforcement learning with Deep Learning Networks

This repository has been archived on 2023-11-10. You can view files and clone it, but you cannot make any changes to it's state, such as pushing and creating new issues, pull requests or comments.

Find a file

Brandon Rozek cc71007efc Added initial explorations		2019-01-22 10:28:46 -05:00
DuelingPoleBalance.ipynb	Added initial explorations	2019-01-22 10:28:46 -05:00
PoleBalanceKeras.ipynb	Added initial explorations	2019-01-22 10:28:46 -05:00
PrioReplayPoleBalanceKeras.ipynb	Added initial explorations	2019-01-22 10:28:46 -05:00
README.md	Added initial explorations	2019-01-22 10:28:46 -05:00

README.md

Deep Reinforcement Learning

This is a walk through my journey of deep reinforcement learning. This will highlight the papers I've read and implemented.

Deep Q-Networks (DQN)

Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, Martin Riedmiller. Playing Atari with Deep Reinforcement Learning.

https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf

The authors expanded upon the concept of a Q-network in reinforcement learning, by introducing a non-linear appromixation with neural networks. They were able to apply a convolution neural network to parse the raw pixels of gameplay in order to outperform human experts.

My implementation:

Ziyu Wang, Tom Schaul, Matteo Hessel, Hado van Hasselt, Marc Lanctot, Nando de Freitas. Dueling Network Architectures for Deep Reinforcement Learning.

https://arxiv.org/abs/1511.06581

The authors introduced a way to generalize learning across actions in a DQN by modifying the network architecture. The dueling network represents two separate estimators, one for the state function and the other one for the state-dependent action advantage function.

My implementation:

Tom Schaul, John Quan, Ioannis Antonoglou, David Silver. Prioritized Experience Replay.

https://arxiv.org/abs/1511.05952

Typically in the training of DQN networks, there exists an experience replay buffer where experiences are sampled after a certain time period to train the neural network. Before this paper, the sampling was done uniformly across all experiences. The authors expand upon that, allowing for more priority or increasing the likeliness of an experience being sampled if the TD error is high.

My implementation: