From 7937defd9e03125ccac6dfe8145a96b3b0be4d14 Mon Sep 17 00:00:00 2001
From: Brandon Rozek
Date: Tue, 22 Jan 2019 10:30:19 -0500
Subject: [PATCH] Added links to the implementations

---
 README.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index 1e3f8ce..755caf6 100644
--- a/README.md
+++ b/README.md
@@ -10,7 +10,7 @@ https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf
 
 *The authors expanded upon the concept of a Q-network in reinforcement learning, by introducing a non-linear appromixation with neural networks. They were able to apply a convolution neural network to parse the raw pixels of gameplay in order to outperform human experts.*
 
-My implementation:
+My implementation: https://github.com/Brandon-Rozek/DeepRL/blob/master/PoleBalanceKeras.ipynb
 
 
 
@@ -20,7 +20,7 @@ https://arxiv.org/abs/1511.06581
 
 *The authors introduced a way to generalize learning across actions in a DQN by modifying the network architecture. The dueling network represents two separate estimators, one for the state function and the other one for the state-dependent action advantage function.*
 
-My implementation:
+My implementation: https://github.com/Brandon-Rozek/DeepRL/blob/master/DuelingPoleBalance.ipynb
 
 
 
@@ -30,4 +30,4 @@ https://arxiv.org/abs/1511.05952
 
 *Typically in the training of DQN networks, there exists an experience replay buffer where experiences are sampled after a certain time period to train the neural network. Before this paper, the sampling was done uniformly across all experiences. The authors expand upon that, allowing for more priority or increasing the likeliness of an experience being sampled if the TD error is high.*
 
-My implementation:
\ No newline at end of file
+My implementation: https://github.com/Brandon-Rozek/DeepRL/blob/master/PrioReplayPoleBalanceKeras.ipynb
\ No newline at end of file