| Author | Commit | Message | Date |
| --- | --- | --- | --- |
| | 1f7c6f10ab | Used standard library definition | 2020-04-14 14:21:31 -04:00 |
| | 8fa4691511 | PEP8 Conformance | 2020-04-14 14:16:14 -04:00 |
| Brandon Rozek | 9b81188a77 | Update LICENSE | 2020-03-29 14:10:07 -04:00 |
| Brandon Rozek | b12f54e3cd | Delete license.md | 2020-03-29 14:07:29 -04:00 |
| Brandon Rozek | 112acf9085 | Create LICENSE | 2020-03-29 14:07:06 -04:00 |
| Brandon Rozek | a667b3734b | Merge branch 'master' of github.com:Brandon-Rozek/rltorch | 2020-03-23 19:57:21 -04:00 |
| Brandon Rozek | 838062813a | Better handling of demonstration data | 2020-03-23 19:57:05 -04:00 |
| | 4c6dc0a2ea | Added network documentation | 2020-03-20 20:16:29 -04:00 |
| | 5e7de5bed7 | Documentation for logging utility | 2020-03-20 19:38:45 -04:00 |
| | a47f3f6037 | Seed documentation | 2020-03-20 19:38:35 -04:00 |
| | 1cad98fcf9 | Created documentation for memory module | 2020-03-20 19:31:09 -04:00 |
| | 711c2e8dd1 | Cleaned up scheduler docs | 2020-03-20 19:30:56 -04:00 |
| | 720bb1b051 | Documented scheduler module | 2020-03-20 17:59:56 -04:00 |
| | ea62ccf389 | Added templates for unit testing and sphinx documentation | 2020-03-15 14:27:56 -04:00 |
| Brandon Rozek | 2e01bc16ea | Merge pull request #5 from Brandon-Rozek/dependabot/pip/tensorflow-1.15.2: Bump tensorflow from 1.15.0 to 1.15.2 | 2020-03-15 13:16:03 -04:00 |
| dependabot[bot] | c6ca2a4cfe | Bump [tensorflow](https://github.com/tensorflow/tensorflow) from 1.15.0 to 1.15.2 ([release notes](https://github.com/tensorflow/tensorflow/releases), [changelog](https://github.com/tensorflow/tensorflow/blob/master/RELEASE.md), [commits](https://github.com/tensorflow/tensorflow/compare/v1.15.0...v1.15.2)). Signed-off-by: dependabot[bot] <support@github.com> | 2020-01-28 22:34:13 +00:00 |
| Brandon Rozek | 557c0f689a | Merge pull request #4 from Brandon-Rozek/dependabot/pip/urllib3-1.24.2: Bump urllib3 from 1.24.1 to 1.24.2 | 2020-01-02 23:50:58 -05:00 |
| dependabot[bot] | 8323b1b073 | Bump [urllib3](https://github.com/urllib3/urllib3) from 1.24.1 to 1.24.2 ([release notes](https://github.com/urllib3/urllib3/releases), [changelog](https://github.com/urllib3/urllib3/blob/master/CHANGES.rst), [commits](https://github.com/urllib3/urllib3/compare/1.24.1...1.24.2)). Signed-off-by: dependabot[bot] <support@github.com> | 2020-01-03 04:48:59 +00:00 |
| Brandon Rozek | 0c7640fea1 | Merge pull request #1 from Brandon-Rozek/dependabot/pip/werkzeug-0.15.3: Bump werkzeug from 0.14.1 to 0.15.3 | 2020-01-02 23:48:21 -05:00 |
| Brandon Rozek | d8d1a0a5e9 | Merge pull request #2 from Brandon-Rozek/dependabot/pip/pillow-6.2.0: Bump pillow from 5.4.1 to 6.2.0 | 2020-01-02 23:48:09 -05:00 |
| Brandon Rozek | 3c09867fb2 | Merge pull request #3 from Brandon-Rozek/dependabot/pip/tensorflow-1.15.0: Bump tensorflow from 1.12.0 to 1.15.0 | 2020-01-02 23:47:54 -05:00 |
| dependabot[bot] | 35f57a3f22 | Bump [tensorflow](https://github.com/tensorflow/tensorflow) from 1.12.0 to 1.15.0 ([release notes](https://github.com/tensorflow/tensorflow/releases), [changelog](https://github.com/tensorflow/tensorflow/blob/master/RELEASE.md), [commits](https://github.com/tensorflow/tensorflow/compare/v1.12.0...v1.15.0)). Signed-off-by: dependabot[bot] <support@github.com> | 2019-12-16 21:27:03 +00:00 |
| Brandon Rozek | 3217c76a79 | DQfD memory was adjusted to actually update the weights in the priority trees, fixing a bug in the sampling | 2019-11-17 19:50:49 -05:00 |
| Brandon Rozek | 23532fc372 | Added a way to cap the number of demonstrations that are kept in the buffer | 2019-11-17 18:29:12 -05:00 |
| Brandon Rozek | 038d406d0f | Fixed errors with n-step returns | 2019-11-13 22:56:27 -05:00 |
| Brandon Rozek | ed62e148d5 | Initial implementation of n-step loss | 2019-11-11 10:24:40 -05:00 |
| | 07c90a09f9 | Fixed scoping error with Transitions | 2019-11-04 12:09:09 -05:00 |
| | ad75539776 | Implemented components necessary for Deep Q Learning from Demonstrations | 2019-11-04 07:44:39 -05:00 |
| Brandon Rozek | 17391c7467 | First draft of Deep Q Learning From Demonstrations | 2019-10-31 20:54:52 -04:00 |
| dependabot[bot] | 6b2b6da9e6 | Bump [pillow](https://github.com/python-pillow/Pillow) from 5.4.1 to 6.2.0 ([release notes](https://github.com/python-pillow/Pillow/releases), [changelog](https://github.com/python-pillow/Pillow/blob/master/CHANGES.rst), [commits](https://github.com/python-pillow/Pillow/compare/5.4.1...6.2.0)). Signed-off-by: dependabot[bot] <support@github.com> | 2019-10-22 22:14:02 +00:00 |
| dependabot[bot] | 324fa5c667 | Bump [werkzeug](https://github.com/pallets/werkzeug) from 0.14.1 to 0.15.3 ([release notes](https://github.com/pallets/werkzeug/releases), [changelog](https://github.com/pallets/werkzeug/blob/master/CHANGES.rst), [commits](https://github.com/pallets/werkzeug/compare/0.14.1...0.15.3)). Signed-off-by: dependabot[bot] <support@github.com> | 2019-10-21 18:14:40 +00:00 |
| | 559efa38b0 | Corrected for numba deprecation. Enable the ability to render out scenes to play back data | 2019-09-19 07:57:39 -04:00 |
| | a99ca66b4f | Fixed multiprocessing with CUDA. Added entropy importance as a config option. | 2019-09-18 07:26:32 -04:00 |
| | 9d32a9edd1 | Merge branch 'master' of https://github.com/Brandon-Rozek/rltorch (conflicts: rltorch/agents/QEPAgent.py) | 2019-09-13 20:00:13 -04:00 |
| | da83f1470c | Some work on multiprocessing evolutionary strategies from last semester | 2019-09-13 19:53:19 -04:00 |
| | 7aa698c349 | Added save and load functionality | 2019-09-13 19:49:04 -04:00 |
| | dcf7cce30d | Flush out print text so I can keep track of the rewards by SSHing in | 2019-09-13 19:48:51 -04:00 |
| | 912e3d42cb | Added new OpenAI Baseline Wrappers | 2019-09-13 19:48:24 -04:00 |
| | 6d3a78cd20 | Added parallel version of ES | 2019-03-30 16:33:40 -04:00 |
| | 9ad63a6921 | Added license | 2019-03-30 16:32:57 -04:00 |
| | b2f5220585 | Made sure the reward_batch is float across different agents | 2019-03-14 10:43:14 -04:00 |
| | cdfd3ab6b9 | Playing around with QEP | 2019-03-14 00:53:51 -04:00 |
| | 8683b75ad9 | Corrected gamma multiplication | 2019-03-04 22:04:13 -05:00 |
| | 190eb1f0c4 | Correct discount_rewards function to only multiply with gamma throughout | 2019-03-04 21:59:02 -05:00 |
| | 11d99df977 | Added improvements to the REINFORCE algorithm | 2019-03-04 17:10:24 -05:00 |
| | a59f84b446 | Cleaned up scripts, added more comments | 2019-03-04 17:09:46 -05:00 |
| | e42f5bba1b | Corrected A2C and PPO to train at the end of an episode | 2019-03-01 21:04:13 -05:00 |
| | 1958fc7c7e | Corrected device when constructing fitness tensor | 2019-02-28 14:41:34 -05:00 |
| | 9740c40527 | d.sample returns a tensor, so we stack them to not lose the device | 2019-02-28 14:30:49 -05:00 |
| | 714443192d | Added entropy into QEP (1% of loss). Made random numbers generated in ESNetwork happen in the same device | 2019-02-28 12:17:35 -05:00 |
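
Two commits above (8683b75ad9 and 190eb1f0c4) correct gamma multiplication in the `discount_rewards` function so that returns are accumulated by multiplying with gamma alone throughout. As a rough illustration of that idea, here is a minimal, hypothetical sketch; it is not the repository's actual implementation.

```python
import torch

# Hypothetical sketch of gamma-only discounting, as described in commit 190eb1f0c4;
# not rltorch's actual discount_rewards code.
def discount_rewards(rewards, gamma=0.99):
    """Compute returns G_t = r_t + gamma * G_{t+1}, multiplying only by gamma at each step."""
    returns = torch.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# Example: rewards [1, 1, 1] with gamma = 0.5 give returns [1.75, 1.5, 1.0].
print(discount_rewards([1.0, 1.0, 1.0], gamma=0.5))
```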