Commit graph

50 commits

Author SHA1 Message Date
dependabot[bot]
35f57a3f22
Bump tensorflow from 1.12.0 to 1.15.0
Bumps [tensorflow](https://github.com/tensorflow/tensorflow) from 1.12.0 to 1.15.0.
- [Release notes](https://github.com/tensorflow/tensorflow/releases)
- [Changelog](https://github.com/tensorflow/tensorflow/blob/master/RELEASE.md)
- [Commits](https://github.com/tensorflow/tensorflow/compare/v1.12.0...v1.15.0)

Signed-off-by: dependabot[bot] <support@github.com>
2019-12-16 21:27:03 +00:00
Brandon Rozek
3217c76a79 DQfD memory was adjusted to actually update the weights in the priority trees, fixing a bug in the sampling 2019-11-17 19:50:49 -05:00
Brandon Rozek
23532fc372 Added a way to cap the number of demonstrations that are kept in the buffer 2019-11-17 18:29:12 -05:00
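
For context, a minimal sketch of one way a demonstration cap like this can work (illustrative names, not rltorch's actual DQfD memory):

```python
from collections import deque

class CappedDemoMemory:
    """Illustrative only: demonstrations live in a bounded deque so the buffer never
    holds more than demo_cap of them; agent transitions are stored separately."""
    def __init__(self, capacity, demo_cap):
        self.demos = deque(maxlen=demo_cap)        # oldest demo dropped once the cap is hit
        self.transitions = deque(maxlen=capacity)

    def append_demo(self, transition):
        self.demos.append(transition)

    def append(self, transition):
        self.transitions.append(transition)
```
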
Brandon Rozek
038d406d0f Fixed errors with n-step returns 2019-11-13 22:56:27 -05:00
Brandon Rozek
ed62e148d5 Initial implementation of n-step loss 2019-11-11 10:24:40 -05:00
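
For reference, the usual n-step return target that an implementation like this computes, as a small sketch (hypothetical helper, not the library's code):

```python
def n_step_target(rewards, bootstrap_value, gamma=0.99):
    """Compute r_t + gamma*r_{t+1} + ... + gamma^(n-1)*r_{t+n-1} + gamma^n * bootstrap_value,
    where bootstrap_value is typically max_a Q_target(s_{t+n}, a)."""
    target = bootstrap_value
    for r in reversed(rewards):          # fold the n rewards in from the back
        target = r + gamma * target
    return target
```
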
07c90a09f9 Fixed scoping error with Transitions 2019-11-04 12:09:09 -05:00
ad75539776 Implemented components necessary for Deep Q Learning from Demonstrations 2019-11-04 07:44:39 -05:00
Brandon Rozek
17391c7467 First draft of Deep Q Learning From Demonstrations 2019-10-31 20:54:52 -04:00
559efa38b0 Corrected for numba deprecation
Enable the ability to render out scenes to play back data
2019-09-19 07:57:39 -04:00
a99ca66b4f Fixed multiprocessing with CUDA. Added entropy importance as a config option. 2019-09-18 07:26:32 -04:00
9d32a9edd1 Merge branch 'master' of https://github.com/Brandon-Rozek/rltorch
# Conflicts:
#	rltorch/agents/QEPAgent.py
2019-09-13 20:00:13 -04:00
da83f1470c Some work on multiprocessing evolutionary strategies from last semester 2019-09-13 19:53:19 -04:00
7aa698c349 Added save and load functionality 2019-09-13 19:49:04 -04:00
dcf7cce30d Flush out print text so I can keep track of the rewards by SSHing in 2019-09-13 19:48:51 -04:00
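
The change itself is just Python's built-in flush option; a sketch with an illustrative variable:

```python
episode_reward = 42.0  # illustrative value
# flush=True pushes the line past stdout buffering so it shows up immediately over SSH
print(f"episode reward: {episode_reward}", flush=True)
```
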
912e3d42cb Added new OpenAI Baseline Wrappers 2019-09-13 19:48:24 -04:00
6d3a78cd20 Added parallel version of ES 2019-03-30 16:33:40 -04:00
9ad63a6921 Added license 2019-03-30 16:32:57 -04:00
b2f5220585 Made sure the reward_batch is float across different agents 2019-03-14 10:43:14 -04:00
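
A minimal sketch of the kind of cast this refers to (assuming PyTorch tensors; the helper name is illustrative):

```python
import torch

def as_float_rewards(reward_batch, device="cpu"):
    # Environments can hand back ints or bools; cast to float32 so the TD-target
    # arithmetic matches the network's dtype regardless of which agent produced the batch.
    return torch.as_tensor(reward_batch, dtype=torch.float32, device=device)
```
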
cdfd3ab6b9 Playing around with QEP 2019-03-14 00:53:51 -04:00
8683b75ad9 Corrected gamma multiplication 2019-03-04 22:04:13 -05:00
190eb1f0c4 Correct discount_rewards function to only multiply with gamma throughout 2019-03-04 21:59:02 -05:00
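
For reference, the standard backwards pass this fix describes, where gamma is the only factor applied at each step (a sketch, not necessarily the exact rltorch function):

```python
def discount_rewards(rewards, gamma=0.99):
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running   # only gamma multiplies the running return
        returns[t] = running
    return returns
```
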
11d99df977 Added improvements to the REINFORCE algorithm 2019-03-04 17:10:24 -05:00
a59f84b446 Cleaned up scripts, added more comments 2019-03-04 17:09:46 -05:00
e42f5bba1b Corrected A2C and PPO to train at the end of an episode 2019-03-01 21:04:13 -05:00
1958fc7c7e Corrected device when constructing fitness tensor 2019-02-28 14:41:34 -05:00
9740c40527 d.sample returns a tensor, so we stack them to not lose the device 2019-02-28 14:30:49 -05:00
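
A small illustration of the point being made: since each sample is already a tensor, torch.stack keeps its device and dtype, whereas rebuilding a tensor from Python scalars would not:

```python
import torch

d = torch.distributions.Normal(torch.zeros(3), torch.ones(3))
samples = [d.sample() for _ in range(5)]   # each element is already a tensor
batch = torch.stack(samples)               # shape (5, 3); device/dtype carried over from the samples
```
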
714443192d Added entropy into QEP (1% of loss)
Made random numbers generated in ESNetwork happen on the same device
2019-02-28 12:17:35 -05:00
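
A hedged sketch of what a 1% entropy term typically looks like in a policy loss; QEPAgent's actual objective may differ in detail and these names are illustrative:

```python
def policy_loss_with_entropy(log_probs, advantages, dist, entropy_coef=0.01):
    pg_loss = -(log_probs * advantages).mean()             # standard policy-gradient term
    entropy_bonus = entropy_coef * dist.entropy().mean()   # 1% weight on the entropy
    return pg_loss - entropy_bonus                         # higher entropy lowers the loss
```
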
76a044ace9 Added Evolutionary Strategies Network and added more example scripts 2019-02-27 09:52:28 -05:00
26084d4c7c Added PPOAgent and A2CAgent to the agents submodule.
Also made some small changes to how memories are queried
2019-02-19 20:54:30 -05:00
21b820b401 Implemented REINFORCE into the library 2019-02-16 20:30:27 -05:00
14ba64d525 Added a single process environment runner. Also added an example for using such a class. 2019-02-16 18:15:45 -05:00
736e73a1f7 Took away explicit deleting since the next_state variable gets used in another slot 2019-02-14 22:01:13 -05:00
2caf869fd6 Added numba as a dependency and decorated the Prioritized Replay function 2019-02-14 21:42:31 -05:00
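
For context, the kind of hot loop that benefits from a numba decoration; this sampling helper is illustrative rather than the library's actual prioritized-replay code:

```python
import numpy as np
from numba import jit

@jit(nopython=True)
def sample_proportional(priorities, batch_size):
    # Draw indices with probability proportional to priority; numba compiles the
    # Python-level loop so per-sample work stays cheap.
    cumulative = np.cumsum(priorities)
    total = cumulative[-1]
    indices = np.empty(batch_size, dtype=np.int64)
    for i in range(batch_size):
        r = np.random.random() * total
        indices[i] = np.searchsorted(cumulative, r)
    return indices
```
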
19a859a4f6 If memory or logger does not exist, then don't create those shared memory structures 2019-02-14 21:06:44 -05:00
460d4c05c1 Fixed EnvironmentRun to be properly multiprocess.
Fixed the priority of bad states to be the smallest
[TODO] Make EnvironmentEpisode properly multiprocess
2019-02-13 23:47:37 -05:00
115543d201 Fixed parallel implementation of getting experiences by using a queue 2019-02-13 00:36:23 -05:00
5094ed53af Updated examples to have new features 2019-02-11 10:23:11 -05:00
fe97a9b78d Corrected typo 2019-02-11 00:00:34 -05:00
be637664e7 Added collections import 2019-02-10 23:59:29 -05:00
b2ab2ee132 Implemented epsilon as a scheduler 2019-02-10 23:56:21 -05:00
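
A minimal sketch of "epsilon as a scheduler": an object advanced once per step instead of a fixed exploration constant (hypothetical class, not necessarily rltorch's Scheduler API):

```python
class ExponentialDecay:
    def __init__(self, start=1.0, end=0.02, decay=0.995):
        self.value, self.end, self.decay = start, end, decay

    def __next__(self):
        current = self.value
        self.value = max(self.end, self.value * self.decay)  # decay toward the floor
        return current

# per training step: epsilon = next(epsilon_schedule)
```
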
04e54cddc2 Sends importance weights into correct device for prioritized replay 2019-02-10 23:16:44 -05:00
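
A sketch of the device fix described here: move the importance-sampling weights onto the same device as the Q-values before weighting the per-sample TD errors (illustrative names):

```python
import torch

def weighted_td_loss(q_values, targets, importance_weights):
    weights = torch.as_tensor(importance_weights, dtype=torch.float32,
                              device=q_values.device)   # match the Q-network's device
    td_errors = (q_values - targets) ** 2
    return (weights * td_errors).mean()
```
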
013d40a4f9 Implemented Schedulers and Prioritized Replay 2019-02-10 23:11:53 -05:00
8c78f47c0c Added missing parameter 2019-02-03 21:57:27 -05:00
f1a52cf77b Added missing dependency 2019-02-03 21:55:08 -05:00
0a6f1e73f3 Fixed another mismatched device error 2019-02-03 00:53:13 -05:00
e62385b574 Fixed a missing self. reference 2019-02-03 00:51:24 -05:00
03455accc8 Attempting to see if this fixes the mismatched devices error 2019-02-03 00:49:47 -05:00
9cd3625fd3 Made sure everything went to their appropriate devices 2019-02-03 00:45:14 -05:00
39643f04e1 Simplified in-place Gaussian code 2019-02-03 00:31:35 -05:00
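
For reference, in-place Gaussian noise in PyTorch writes into an existing tensor rather than allocating a new one (a small illustration, not the exact code that was simplified):

```python
import torch

noise = torch.empty(10)
noise.normal_(mean=0.0, std=0.05)   # fills `noise` in place with N(0, 0.05^2) samples
```
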
a03abe2bb1 Initial Commit 2019-01-31 23:34:32 -05:00