
Added links to the implementations

Brandon Rozek 2019-01-22 10:30:19 -05:00
parent cc71007efc
commit 7937defd9e


@ -10,7 +10,7 @@ https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf
*The authors expanded upon the concept of a Q-network in reinforcement learning by introducing a non-linear approximation with neural networks. They applied a convolutional neural network to the raw pixels of gameplay and were able to outperform human experts on some games.*
-My implementation:
+My implementation: https://github.com/Brandon-Rozek/DeepRL/blob/master/PoleBalanceKeras.ipynb
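
For readers unfamiliar with how a Q-network is trained, the regression target is usually the one-step temporal-difference (TD) target. The following is a minimal sketch in plain Python, not code from the linked notebook; the function name `td_target` and the default `gamma` are assumptions made for illustration.

```python
def td_target(reward, next_q_values, done, gamma=0.99):
    """One-step TD target that a Q-network is regressed towards.

    next_q_values: the network's Q-value estimates for every action
    in the next state (hypothetical input for this sketch).
    """
    if done:
        # A terminal transition has no future return to bootstrap from.
        return reward
    # Bootstrap from the best action available in the next state.
    return reward + gamma * max(next_q_values)
```

In a full agent, `next_q_values` would come from a forward pass of the network on the next state.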
@ -20,7 +20,7 @@ https://arxiv.org/abs/1511.06581
*The authors introduced a way to generalize learning across actions in a DQN by modifying the network architecture. The dueling network represents two separate estimators, one for the state value function and the other for the state-dependent action advantage function.*
-My implementation:
+My implementation: https://github.com/Brandon-Rozek/DeepRL/blob/master/DuelingPoleBalance.ipynb
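
The two estimators described above have to be merged back into Q-values. A common choice is the mean-subtracted aggregation Q(s, a) = V(s) + A(s, a) - mean(A(s, ·)). The sketch below shows that step in plain Python; the function name `dueling_q_values` is hypothetical, and the linked notebook may combine the streams differently.

```python
def dueling_q_values(state_value, advantages):
    """Combine V(s) and A(s, a) into Q(s, a) for every action.

    state_value: scalar output of the state value stream.
    advantages: list with one advantage estimate per action.
    """
    # Subtracting the mean advantage keeps V and A identifiable:
    # without it, a constant could shift between the two streams.
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + (a - mean_advantage) for a in advantages]
```

With this aggregation the greedy action is still the one with the largest advantage, since the same offset is added to every action.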
@ -30,4 +30,4 @@ https://arxiv.org/abs/1511.05952
*Training a DQN typically relies on an experience replay buffer, from which past experiences are periodically sampled to train the neural network. Before this paper, experiences were sampled uniformly. The authors instead give higher priority to experiences with a high TD error, making them more likely to be sampled.*
-My implementation:
+My implementation: https://github.com/Brandon-Rozek/DeepRL/blob/master/PrioReplayPoleBalanceKeras.ipynb
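
To illustrate sampling by TD error, the sketch below uses proportional prioritization: each experience gets priority (|TD error| + epsilon)^alpha and is sampled with probability proportional to it. It is a simplified example in plain Python, not the linked notebook's code. The function names and the defaults for `alpha` and `epsilon` are assumptions, and the importance-sampling correction is omitted.

```python
import random

def sampling_probabilities(td_errors, alpha=0.6, epsilon=1e-6):
    """Turn absolute TD errors into a sampling distribution."""
    # epsilon keeps zero-error experiences from never being sampled;
    # alpha = 0 recovers uniform sampling.
    priorities = [(abs(e) + epsilon) ** alpha for e in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]

def sample_indices(td_errors, batch_size, alpha=0.6, rng=random):
    """Draw buffer indices, with replacement, according to priority."""
    probs = sampling_probabilities(td_errors, alpha)
    return rng.choices(range(len(td_errors)), weights=probs, k=batch_size)
```

A practical buffer would typically store priorities in a structure such as a sum tree, so that sampling does not need a pass over the whole buffer.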