Added links to the implementations
parent cc71007efc
commit 7937defd9e
1 changed file with 3 additions and 3 deletions
@@ -10,7 +10,7 @@ https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf
*The authors expanded upon the concept of a Q-network in reinforcement learning by introducing a non-linear approximation with neural networks. They applied a convolutional neural network to parse the raw pixels of gameplay and were able to outperform human experts.*
-My implementation:
+My implementation: https://github.com/Brandon-Rozek/DeepRL/blob/master/PoleBalanceKeras.ipynb
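
As a rough Keras sketch of the non-linear approximation described above (illustrative only, not the linked notebook): a convolutional Q-network that maps stacked game frames to one Q-value per action. The 84x84x4 input shape, the action count, and the layer sizes are assumptions.

```python
# Illustrative convolutional Q-network in the spirit of the DQN paper.
# Input: four stacked 84x84 grayscale frames (assumption); output: one Q-value per action.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

num_actions = 4  # hypothetical action count for the game being played

def build_q_network() -> keras.Model:
    frames = keras.Input(shape=(84, 84, 4))
    x = layers.Conv2D(32, 8, strides=4, activation="relu")(frames)
    x = layers.Conv2D(64, 4, strides=2, activation="relu")(x)
    x = layers.Conv2D(64, 3, strides=1, activation="relu")(x)
    x = layers.Flatten()(x)
    x = layers.Dense(512, activation="relu")(x)
    q_values = layers.Dense(num_actions)(x)  # Q(s, a) for every action a
    return keras.Model(frames, q_values)

q_net = build_q_network()
q_net.compile(optimizer=keras.optimizers.Adam(1e-4), loss=keras.losses.Huber())

# Greedy action selection straight from raw pixels.
state = np.random.rand(1, 84, 84, 4).astype("float32")
action = int(np.argmax(q_net.predict(state, verbose=0)[0]))
print(action)
```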
@@ -20,7 +20,7 @@ https://arxiv.org/abs/1511.06581
*The authors introduced a way to generalize learning across actions in a DQN by modifying the network architecture. The dueling network represents two separate estimators: one for the state value function and one for the state-dependent action advantage function.*
-My implementation:
+My implementation: https://github.com/Brandon-Rozek/DeepRL/blob/master/DuelingPoleBalance.ipynb
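
A minimal Keras sketch of the dueling head described above (illustrative only, not the linked notebook): a shared body feeds a scalar state-value stream V(s) and a per-action advantage stream A(s, a), recombined as Q(s, a) = V(s) + A(s, a) - mean_a A(s, a). The state size, action count, and layer widths are assumptions.

```python
# Illustrative dueling Q-network head: separate value and advantage streams.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

state_dim = 4    # e.g. a CartPole-style observation vector (assumption)
num_actions = 2  # assumption

state_in = keras.Input(shape=(state_dim,))
shared = layers.Dense(64, activation="relu")(state_in)

# Value stream: a single scalar V(s).
value = layers.Dense(1)(layers.Dense(32, activation="relu")(shared))
# Advantage stream: one A(s, a) per action.
advantage = layers.Dense(num_actions)(layers.Dense(32, activation="relu")(shared))

def dueling_combine(streams):
    # Q(s, a) = V(s) + (A(s, a) - mean_a A(s, a)); subtracting the mean keeps V and A identifiable.
    v, a = streams
    return v + (a - tf.reduce_mean(a, axis=1, keepdims=True))

q_values = layers.Lambda(dueling_combine)([value, advantage])

dueling_q_net = keras.Model(state_in, q_values)
dueling_q_net.compile(optimizer="adam", loss=keras.losses.Huber())
dueling_q_net.summary()
```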
@@ -30,4 +30,4 @@ https://arxiv.org/abs/1511.05952
*Typically, the training of a DQN relies on an experience replay buffer from which stored experiences are periodically sampled to train the neural network. Before this paper, sampling was uniform across all experiences. The authors expand upon that by prioritizing experiences with a high TD error, increasing the likelihood that they are sampled.*
-My implementation:
+My implementation: https://github.com/Brandon-Rozek/DeepRL/blob/master/PrioReplayPoleBalanceKeras.ipynb
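
A minimal NumPy sketch of proportional prioritization as described above (illustrative only, not the linked notebook): each experience gets priority |TD error| + eps, a sampling probability proportional to priority^alpha, and an importance-sampling weight to correct the bias of non-uniform sampling. The buffer contents, alpha, and beta values are assumptions.

```python
# Illustrative proportional prioritized sampling: p_i = |TD error| + eps,
# P(i) = p_i**alpha / sum_j p_j**alpha, importance weight w_i = (N * P(i))**(-beta).
import numpy as np

rng = np.random.default_rng(0)
alpha, beta, eps = 0.6, 0.4, 1e-6  # hyperparameter values are assumptions

# Toy buffer: each entry stores a transition plus the last |TD error| seen for it.
buffer = [{"transition": i, "td_error": abs(rng.normal())} for i in range(100)]

priorities = np.array([entry["td_error"] + eps for entry in buffer]) ** alpha
probs = priorities / priorities.sum()

# High-TD-error transitions are drawn more often than under uniform sampling.
batch_idx = rng.choice(len(buffer), size=8, replace=False, p=probs)

# Normalize the importance-sampling weights by their maximum for stability.
weights = (len(buffer) * probs[batch_idx]) ** (-beta)
weights /= weights.max()
print(batch_idx, weights)
```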