Brandon Rozek | d8d1a0a5e9 | Merge pull request #2 from Brandon-Rozek/dependabot/pip/pillow-6.2.0 | 2020-01-02 23:48:09 -05:00
    Bump pillow from 5.4.1 to 6.2.0
Brandon Rozek | 3c09867fb2 | Merge pull request #3 from Brandon-Rozek/dependabot/pip/tensorflow-1.15.0 | 2020-01-02 23:47:54 -05:00
    Bump tensorflow from 1.12.0 to 1.15.0
dependabot[bot] | 35f57a3f22 | Bump tensorflow from 1.12.0 to 1.15.0 | 2019-12-16 21:27:03 +00:00
    Bumps [tensorflow](https://github.com/tensorflow/tensorflow) from 1.12.0 to 1.15.0.
    - [Release notes](https://github.com/tensorflow/tensorflow/releases)
    - [Changelog](https://github.com/tensorflow/tensorflow/blob/master/RELEASE.md)
    - [Commits](https://github.com/tensorflow/tensorflow/compare/v1.12.0...v1.15.0)
    Signed-off-by: dependabot[bot] <support@github.com>
Brandon Rozek | 3217c76a79 | DQfD memory was adjusted to actually update the weights in the priority trees, fixing a bug in the sampling | 2019-11-17 19:50:49 -05:00
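The priority-tree fix above is a classic prioritized-replay pitfall: writing a new priority to a leaf without propagating the change up the sum tree leaves every ancestor total, and therefore the sampling distribution, stale. A minimal sketch of the invariant, with illustrative names rather than rltorch's actual classes:

```python
class SumTree:
    """Binary sum tree over leaf priorities. Every leaf update must be
    propagated to the root, or sampling probabilities go stale."""

    def __init__(self, capacity):
        self.capacity = capacity
        # Nodes 1..capacity-1 hold internal sums; leaves live at
        # capacity..2*capacity-1. Index 0 is unused.
        self.tree = [0.0] * (2 * capacity)

    def update(self, index, priority):
        i = index + self.capacity
        self.tree[i] = priority
        i //= 2
        while i >= 1:  # walk up, refreshing each parent's sum
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]
            i //= 2

    def total(self):
        return self.tree[1]  # root holds the sum of all priorities
```

Sampling draws a uniform number in [0, total()) and descends the tree, so a leaf whose ancestors were never refreshed keeps being picked with its old probability, which matches the bug the commit describes.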
Brandon Rozek | 23532fc372 | Added a way to cap the number of demonstrations that are kept in the buffer | 2019-11-17 18:29:12 -05:00
Brandon Rozek | 038d406d0f | Fixed errors with n-step returns | 2019-11-13 22:56:27 -05:00
Brandon Rozek | ed62e148d5 | Initial implementation of n-step loss | 2019-11-11 10:24:40 -05:00
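The two n-step commits above concern the n-step return: sum the next n discounted rewards, then bootstrap the remainder from a value estimate of the state reached. A sketch of the quantity being computed (the function name is hypothetical, not rltorch's API):

```python
def n_step_return(rewards, bootstrap_value, gamma):
    """Compute G = r_0 + gamma*r_1 + ... + gamma^(n-1)*r_{n-1}
    + gamma^n * V(s_n), where n = len(rewards) and bootstrap_value
    estimates V(s_n)."""
    g = 0.0
    for i, r in enumerate(rewards):
        g += (gamma ** i) * r
    # Bootstrap the tail of the return from the value estimate.
    return g + (gamma ** len(rewards)) * bootstrap_value
```

A common source of the "errors with n-step returns" class of bug is using the wrong exponent on gamma for the bootstrap term, or forgetting to truncate the bootstrap at episode boundaries.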
07c90a09f9 | Fixed scoping error with Transitions | 2019-11-04 12:09:09 -05:00
ad75539776 | Implemented components necessary for Deep Q Learning from Demonstrations | 2019-11-04 07:44:39 -05:00
Brandon Rozek | 17391c7467 | First draft of Deep Q Learning From Demonstrations | 2019-10-31 20:54:52 -04:00
dependabot[bot] | 6b2b6da9e6 | Bump pillow from 5.4.1 to 6.2.0 | 2019-10-22 22:14:02 +00:00
    Bumps [pillow](https://github.com/python-pillow/Pillow) from 5.4.1 to 6.2.0.
    - [Release notes](https://github.com/python-pillow/Pillow/releases)
    - [Changelog](https://github.com/python-pillow/Pillow/blob/master/CHANGES.rst)
    - [Commits](https://github.com/python-pillow/Pillow/compare/5.4.1...6.2.0)
    Signed-off-by: dependabot[bot] <support@github.com>
559efa38b0 | Corrected for numba deprecation | 2019-09-19 07:57:39 -04:00
    Enable the ability to render out scenes to play back data
a99ca66b4f | Fixed multiprocessing with CUDA. Added entropy importance as a config option. | 2019-09-18 07:26:32 -04:00
9d32a9edd1 | Merge branch 'master' of https://github.com/Brandon-Rozek/rltorch | 2019-09-13 20:00:13 -04:00
    # Conflicts:
    #   rltorch/agents/QEPAgent.py
da83f1470c | Some work on multiprocessing evolutionary strategies from last semester | 2019-09-13 19:53:19 -04:00
7aa698c349 | Added save and load functionality | 2019-09-13 19:49:04 -04:00
dcf7cce30d | Flush print output so I can keep track of the rewards while SSHed in | 2019-09-13 19:48:51 -04:00
912e3d42cb | Added new OpenAI Baselines wrappers | 2019-09-13 19:48:24 -04:00
6d3a78cd20 | Added parallel version of ES | 2019-03-30 16:33:40 -04:00
9ad63a6921 | Added license | 2019-03-30 16:32:57 -04:00
b2f5220585 | Made sure the reward_batch is float across different agents | 2019-03-14 10:43:14 -04:00
cdfd3ab6b9 | Playing around with QEP | 2019-03-14 00:53:51 -04:00
8683b75ad9 | Corrected gamma multiplication | 2019-03-04 22:04:13 -05:00
190eb1f0c4 | Correct discount_rewards function to only multiply with gamma throughout | 2019-03-04 21:59:02 -05:00
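The discount_rewards fixes above concern REINFORCE's return-to-go computation: each step's return is its reward plus gamma times the following step's return, accumulated in a single backward pass, with no other scaling applied. A sketch of that recurrence (illustrative, not the repository's exact function):

```python
def discount_rewards(rewards, gamma):
    """Backward pass computing G_t = r_t + gamma * G_{t+1} for every t."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns
```

The bug class the commit hints at is multiplying by anything other than gamma inside the loop (or re-discounting already discounted values), which skews every policy-gradient update.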
11d99df977 | Added improvements to the REINFORCE algorithm | 2019-03-04 17:10:24 -05:00
a59f84b446 | Cleaned up scripts, added more comments | 2019-03-04 17:09:46 -05:00
e42f5bba1b | Corrected A2C and PPO to train at the end of an episode | 2019-03-01 21:04:13 -05:00
1958fc7c7e | Corrected device when constructing fitness tensor | 2019-02-28 14:41:34 -05:00
9740c40527 | d.sample returns a tensor, so we stack them to avoid losing the device | 2019-02-28 14:30:49 -05:00
714443192d | Added entropy into QEP (1% of loss) | 2019-02-28 12:17:35 -05:00
    Made random numbers generated in ESNetwork happen on the same device
76a044ace9 | Added Evolutionary Strategies Network and added more example scripts | 2019-02-27 09:52:28 -05:00
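The Evolutionary Strategies network mentioned above estimates gradients without backpropagation: perturb the parameters with Gaussian noise, score each perturbation with a fitness function, and average the noise weighted by standardized fitness. A NumPy sketch of the standard OpenAI-style estimator, assumed here for illustration rather than taken from rltorch's ESNetwork:

```python
import numpy as np

def es_gradient(theta, fitness_fn, sigma=0.1, population=50, rng=None):
    """Estimate the gradient of expected fitness w.r.t. theta via
    evolution strategies: fitness-weighted average of Gaussian noise."""
    if rng is None:
        rng = np.random.default_rng(0)
    noise = rng.standard_normal((population, theta.size))
    fitness = np.array([fitness_fn(theta + sigma * n) for n in noise])
    # Standardize fitness to reduce the variance of the estimate.
    fitness = (fitness - fitness.mean()) / (fitness.std() + 1e-8)
    return noise.T @ fitness / (population * sigma)
```

This is the piece that benefits from the device fixes in the nearby commits: when the perturbation noise and the fitness tensor live on different devices, the weighted average fails at the matrix multiply.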
26084d4c7c | Added PPOAgent and A2CAgent to the agents submodule | 2019-02-19 20:54:30 -05:00
    Also made some small changes to how memories are queried
21b820b401 | Implemented REINFORCE in the library | 2019-02-16 20:30:27 -05:00
14ba64d525 | Added a single-process environment runner, along with an example of using the class | 2019-02-16 18:15:45 -05:00
736e73a1f7 | Took away explicit deleting since the next_state variable gets used in another slot | 2019-02-14 22:01:13 -05:00
2caf869fd6 | Added numba as a dependency and decorated the Prioritized Replay function | 2019-02-14 21:42:31 -05:00
19a859a4f6 | If memory or logger does not exist, then don't create those shared memory structures | 2019-02-14 21:06:44 -05:00
460d4c05c1 | Fixed EnvironmentRun to be properly multiprocess | 2019-02-13 23:47:37 -05:00
    Fixed the priority of bad states to be the smallest
    [TODO] Make EnvironmentEpisode properly multiprocess
115543d201 | Fixed parallel implementation of getting experiences by using a queue | 2019-02-13 00:36:23 -05:00
5094ed53af | Updated examples to use the new features | 2019-02-11 10:23:11 -05:00
fe97a9b78d | Corrected typo | 2019-02-11 00:00:34 -05:00
be637664e7 | Added collections import | 2019-02-10 23:59:29 -05:00
b2ab2ee132 | Implemented epsilon as a scheduler | 2019-02-10 23:56:21 -05:00
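"Implemented epsilon as a scheduler" refers to annealing the exploration rate over training rather than holding it fixed. A linear-annealing sketch (the class name is hypothetical; rltorch's actual Scheduler interface may differ):

```python
class LinearScheduler:
    """Anneal a value (e.g. epsilon-greedy's epsilon) linearly from
    `start` to `end` over `steps` calls, then hold at `end`."""

    def __init__(self, start, end, steps):
        self.start, self.end, self.steps = start, end, steps
        self.t = 0

    def __iter__(self):
        return self

    def __next__(self):
        # Clamp the interpolation fraction so the value plateaus at `end`.
        frac = min(self.t / self.steps, 1.0)
        self.t += 1
        return self.start + frac * (self.end - self.start)
```

Making the scheduler an iterator lets the training loop pull the current epsilon with `next(...)` each step, which fits the usage pattern the commit implies.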
04e54cddc2 | Send importance weights to the correct device for prioritized replay | 2019-02-10 23:16:44 -05:00
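The importance weights in the commit above are prioritized replay's bias correction (Schaul et al.): samples drawn with probability P(i) get weight (N * P(i))^(-beta), normalized by the maximum weight for stability. A device-agnostic NumPy sketch of the computation (the function name is illustrative):

```python
import numpy as np

def importance_weights(priorities, beta):
    """w_i = (N * P(i)) ** (-beta), normalized by max(w) so all
    weights are <= 1. beta anneals from ~0.4 toward 1 in practice."""
    p = np.asarray(priorities, dtype=np.float64)
    probs = p / p.sum()              # sampling distribution P(i)
    w = (len(p) * probs) ** (-beta)  # correct the sampling bias
    return w / w.max()
```

The commit's concern is then simply moving the resulting weight tensor onto the same device as the TD errors before multiplying them into the loss.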
013d40a4f9 | Implemented Schedulers and Prioritized Replay | 2019-02-10 23:11:53 -05:00
8c78f47c0c | Added missing parameter | 2019-02-03 21:57:27 -05:00
f1a52cf77b | Added missing dependency | 2019-02-03 21:55:08 -05:00
0a6f1e73f3 | Fixed another mismatched device error | 2019-02-03 00:53:13 -05:00
e62385b574 | Fixed a missing self. prefix | 2019-02-03 00:51:24 -05:00
03455accc8 | Attempting to see if this fixes the mismatched devices error | 2019-02-03 00:49:47 -05:00