Commit graph

73 commits

Author SHA1 Message Date
1f7c6f10ab Used standard library definition 2020-04-14 14:21:31 -04:00
8fa4691511 PEP8 Conformance 2020-04-14 14:16:14 -04:00
Brandon Rozek 9b81188a77 Update LICENSE 2020-03-29 14:10:07 -04:00
Brandon Rozek b12f54e3cd Delete license.md 2020-03-29 14:07:29 -04:00
Brandon Rozek 112acf9085 Create LICENSE 2020-03-29 14:07:06 -04:00
Brandon Rozek a667b3734b Merge branch 'master' of github.com:Brandon-Rozek/rltorch 2020-03-23 19:57:21 -04:00
Brandon Rozek 838062813a Better handling of demonstration data 2020-03-23 19:57:05 -04:00
4c6dc0a2ea Added network documentation 2020-03-20 20:16:29 -04:00
5e7de5bed7 Documentation for logging utility 2020-03-20 19:38:45 -04:00
a47f3f6037 Seed documentation 2020-03-20 19:38:35 -04:00
1cad98fcf9 Created documentation for memory module 2020-03-20 19:31:09 -04:00
711c2e8dd1 Cleaned up scheduler docs 2020-03-20 19:30:56 -04:00
720bb1b051 Documented scheduler module 2020-03-20 17:59:56 -04:00
ea62ccf389 Added templates for unit testing and sphinx documentation 2020-03-15 14:27:56 -04:00
Brandon Rozek 2e01bc16ea Merge pull request #5 from Brandon-Rozek/dependabot/pip/tensorflow-1.15.2
Bump tensorflow from 1.15.0 to 1.15.2
2020-03-15 13:16:03 -04:00
dependabot[bot] c6ca2a4cfe Bump tensorflow from 1.15.0 to 1.15.2
Bumps [tensorflow](https://github.com/tensorflow/tensorflow) from 1.15.0 to 1.15.2.
- [Release notes](https://github.com/tensorflow/tensorflow/releases)
- [Changelog](https://github.com/tensorflow/tensorflow/blob/master/RELEASE.md)
- [Commits](https://github.com/tensorflow/tensorflow/compare/v1.15.0...v1.15.2)

Signed-off-by: dependabot[bot] <support@github.com>
2020-01-28 22:34:13 +00:00
Brandon Rozek 557c0f689a Merge pull request #4 from Brandon-Rozek/dependabot/pip/urllib3-1.24.2
Bump urllib3 from 1.24.1 to 1.24.2
2020-01-02 23:50:58 -05:00
dependabot[bot] 8323b1b073 Bump urllib3 from 1.24.1 to 1.24.2
Bumps [urllib3](https://github.com/urllib3/urllib3) from 1.24.1 to 1.24.2.
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/master/CHANGES.rst)
- [Commits](https://github.com/urllib3/urllib3/compare/1.24.1...1.24.2)

Signed-off-by: dependabot[bot] <support@github.com>
2020-01-03 04:48:59 +00:00
Brandon Rozek 0c7640fea1 Merge pull request #1 from Brandon-Rozek/dependabot/pip/werkzeug-0.15.3
Bump werkzeug from 0.14.1 to 0.15.3
2020-01-02 23:48:21 -05:00
Brandon Rozek d8d1a0a5e9 Merge pull request #2 from Brandon-Rozek/dependabot/pip/pillow-6.2.0
Bump pillow from 5.4.1 to 6.2.0
2020-01-02 23:48:09 -05:00
Brandon Rozek 3c09867fb2 Merge pull request #3 from Brandon-Rozek/dependabot/pip/tensorflow-1.15.0
Bump tensorflow from 1.12.0 to 1.15.0
2020-01-02 23:47:54 -05:00
dependabot[bot] 35f57a3f22 Bump tensorflow from 1.12.0 to 1.15.0
Bumps [tensorflow](https://github.com/tensorflow/tensorflow) from 1.12.0 to 1.15.0.
- [Release notes](https://github.com/tensorflow/tensorflow/releases)
- [Changelog](https://github.com/tensorflow/tensorflow/blob/master/RELEASE.md)
- [Commits](https://github.com/tensorflow/tensorflow/compare/v1.12.0...v1.15.0)

Signed-off-by: dependabot[bot] <support@github.com>
2019-12-16 21:27:03 +00:00
Brandon Rozek 3217c76a79 DQfD memory was adjusted to actually update the weights in the priority trees, fixing a bug in the sampling 2019-11-17 19:50:49 -05:00
Brandon Rozek 23532fc372 Added a way to cap the number of demonstrations that are kept in the buffer 2019-11-17 18:29:12 -05:00
Brandon Rozek 038d406d0f Fixed errors with n-step returns 2019-11-13 22:56:27 -05:00
Brandon Rozek ed62e148d5 Initial implementation of n-step loss 2019-11-11 10:24:40 -05:00
07c90a09f9 Fixed scoping error with Transitions 2019-11-04 12:09:09 -05:00
ad75539776 Implemented components necessary for Deep Q Learning from Demonstrations 2019-11-04 07:44:39 -05:00
Brandon Rozek 17391c7467 First draft of Deep Q Learning From Demonstrations 2019-10-31 20:54:52 -04:00
dependabot[bot] 6b2b6da9e6 Bump pillow from 5.4.1 to 6.2.0
Bumps [pillow](https://github.com/python-pillow/Pillow) from 5.4.1 to 6.2.0.
- [Release notes](https://github.com/python-pillow/Pillow/releases)
- [Changelog](https://github.com/python-pillow/Pillow/blob/master/CHANGES.rst)
- [Commits](https://github.com/python-pillow/Pillow/compare/5.4.1...6.2.0)

Signed-off-by: dependabot[bot] <support@github.com>
2019-10-22 22:14:02 +00:00
dependabot[bot] 324fa5c667 Bump werkzeug from 0.14.1 to 0.15.3
Bumps [werkzeug](https://github.com/pallets/werkzeug) from 0.14.1 to 0.15.3.
- [Release notes](https://github.com/pallets/werkzeug/releases)
- [Changelog](https://github.com/pallets/werkzeug/blob/master/CHANGES.rst)
- [Commits](https://github.com/pallets/werkzeug/compare/0.14.1...0.15.3)

Signed-off-by: dependabot[bot] <support@github.com>
2019-10-21 18:14:40 +00:00
559efa38b0 Corrected for numba deprecation
Enable the ability to render out scenes to play back data
2019-09-19 07:57:39 -04:00
a99ca66b4f Fixed multiprocessing with CUDA. Added entropy importance as a config option. 2019-09-18 07:26:32 -04:00
9d32a9edd1 Merge branch 'master' of https://github.com/Brandon-Rozek/rltorch
# Conflicts:
#	rltorch/agents/QEPAgent.py
2019-09-13 20:00:13 -04:00
da83f1470c Some work on multiprocessing evolutionary strategies from last semester 2019-09-13 19:53:19 -04:00
7aa698c349 Added save and load functionality 2019-09-13 19:49:04 -04:00
dcf7cce30d Flush out print text so I can keep track of the rewards by SSHing in 2019-09-13 19:48:51 -04:00
912e3d42cb Added new OpenAI Baseline Wrappers 2019-09-13 19:48:24 -04:00
6d3a78cd20 Added parallel version of ES 2019-03-30 16:33:40 -04:00
9ad63a6921 Added license 2019-03-30 16:32:57 -04:00
b2f5220585 Made sure the reward_batch is float across different agents 2019-03-14 10:43:14 -04:00
cdfd3ab6b9 Playing around with QEP 2019-03-14 00:53:51 -04:00
8683b75ad9 Corrected gamma multiplication 2019-03-04 22:04:13 -05:00
190eb1f0c4 Correct discount_rewards function to only multiply with gamma throughout 2019-03-04 21:59:02 -05:00
11d99df977 Added improvements to the REINFORCE algorithm 2019-03-04 17:10:24 -05:00
a59f84b446 Cleaned up scripts, added more comments 2019-03-04 17:09:46 -05:00
e42f5bba1b Corrected A2C and PPO to train at the end of an episode 2019-03-01 21:04:13 -05:00
1958fc7c7e Corrected device when constructing fitness tensor 2019-02-28 14:41:34 -05:00
9740c40527 d.sample returns a tensor, so we stack them to not lose the device 2019-02-28 14:30:49 -05:00
714443192d Added entropy into QEP (1% of loss)
Made random numbers generated in ESNetwork happen in the same device
2019-02-28 12:17:35 -05:00