3217c76a79 | Brandon Rozek | 2019-11-17 19:50:49 -05:00
  DQfD memory was adjusted to actually update the weights in the priority trees, fixing a bug in the sampling
23532fc372 | Brandon Rozek | 2019-11-17 18:29:12 -05:00
  Added a way to cap the number of demonstrations that are kept in the buffer
038d406d0f | Brandon Rozek | 2019-11-13 22:56:27 -05:00
  Fixed errors with n-step returns
ed62e148d5 | Brandon Rozek | 2019-11-11 10:24:40 -05:00
  Initial implementation of n-step loss
07c90a09f9 | 2019-11-04 12:09:09 -05:00
  Fixed scoping error with Transitions
ad75539776 | 2019-11-04 07:44:39 -05:00
  Implemented components necessary for Deep Q Learning from Demonstrations
17391c7467 | Brandon Rozek | 2019-10-31 20:54:52 -04:00
  First draft of Deep Q Learning From Demonstrations
559efa38b0 | 2019-09-19 07:57:39 -04:00
  Corrected for numba deprecation
  Enable the ability to render out scenes to play back data
a99ca66b4f | 2019-09-18 07:26:32 -04:00
  Fixed multiprocessing with CUDA. Added entropy importance as a config option.
9d32a9edd1 | 2019-09-13 20:00:13 -04:00
  Merge branch 'master' of https://github.com/Brandon-Rozek/rltorch
  # Conflicts:
  #   rltorch/agents/QEPAgent.py
da83f1470c | 2019-09-13 19:53:19 -04:00
  Some work on multiprocessing evolutionary strategies from last semester
7aa698c349 | 2019-09-13 19:49:04 -04:00
  Added save and load functionality
dcf7cce30d | 2019-09-13 19:48:51 -04:00
  Flush out print text so I can keep track of the rewards by SSHing in
912e3d42cb | 2019-09-13 19:48:24 -04:00
  Added new OpenAI Baseline Wrappers
6d3a78cd20 | 2019-03-30 16:33:40 -04:00
  Added parallel version of ES
9ad63a6921 | 2019-03-30 16:32:57 -04:00
  Added license
b2f5220585 | 2019-03-14 10:43:14 -04:00
  Made sure the reward_batch is float across different agents
cdfd3ab6b9 | 2019-03-14 00:53:51 -04:00
  Playing around with QEP
8683b75ad9 | 2019-03-04 22:04:13 -05:00
  Corrected gamma multiplication
190eb1f0c4 | 2019-03-04 21:59:02 -05:00
  Correct discount_rewards function to only multiply with gamma throughout
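The two commits above both concern the discounted-return computation used by policy-gradient updates. For reference, a minimal sketch of a discount_rewards-style function, assuming the usual backwards recursion G_t = r_t + gamma * G_{t+1}; the function name and gamma parameter come from the commit message, while the body is illustrative rather than rltorch's actual implementation:

```python
import torch

def discount_rewards(rewards, gamma=0.99):
    """Hypothetical sketch: discounted returns, applying gamma once per remaining step."""
    returns = torch.zeros(len(rewards))
    running = 0.0
    # Walk backwards so each reward is discounted exactly once per step to go
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# Example: rewards [1, 1, 1] with gamma = 0.9 -> [2.71, 1.9, 1.0]
print(discount_rewards([1.0, 1.0, 1.0], gamma=0.9))
```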
11d99df977 | 2019-03-04 17:10:24 -05:00
  Added improvements to the REINFORCE algorithm
a59f84b446 | 2019-03-04 17:09:46 -05:00
  Cleaned up scripts, added more comments
e42f5bba1b | 2019-03-01 21:04:13 -05:00
  Corrected A2C and PPO to train at the end of an episode
1958fc7c7e | 2019-02-28 14:41:34 -05:00
  Corrected device when constructing fitness tensor
9740c40527 | 2019-02-28 14:30:49 -05:00
  d.sample returns a tensor, so we stack them to not lose the device
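The d.sample note above reflects general PyTorch behavior: samples drawn from a torch.distributions object live on whatever device its parameters are on, and torch.stack keeps them there, whereas rebuilding the batch from Python numbers would default back to the CPU. A small illustration; the Normal distribution and shapes here are assumptions for the example, not the repository's code:

```python
import torch
from torch.distributions import Normal

device = "cuda" if torch.cuda.is_available() else "cpu"

# Parameters live on `device`, so each d.sample() is already a tensor on that device
d = Normal(torch.zeros(4, device=device), torch.ones(4, device=device))
samples = [d.sample() for _ in range(8)]

# Stacking keeps the samples' device and dtype intact
batch = torch.stack(samples)
print(batch.shape, batch.device)  # torch.Size([8, 4]) on the chosen device
```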
714443192d | 2019-02-28 12:17:35 -05:00
  Added entropy into QEP (1% of loss)
  Made random numbers generated in ESNetwork happen on the same device
76a044ace9 | 2019-02-27 09:52:28 -05:00
  Added Evolutionary Strategies Network and added more example scripts
26084d4c7c | 2019-02-19 20:54:30 -05:00
  Added PPOAgent and A2CAgent to the agents submodule.
  Also made some small changes to how memories are queried
21b820b401 | 2019-02-16 20:30:27 -05:00
  Implemented REINFORCE into the library
14ba64d525 | 2019-02-16 18:15:45 -05:00
  Added a single process environment runner. Also added an example for using such a class.
736e73a1f7 | 2019-02-14 22:01:13 -05:00
  Took away explicit deleting since the next_state variable gets used in another slot
2caf869fd6 | 2019-02-14 21:42:31 -05:00
  Added numba as a dependency and decorated the Prioritized Replay function
19a859a4f6 | 2019-02-14 21:06:44 -05:00
  If memory or logger does not exist, then don't create those shared memory structures
460d4c05c1 | 2019-02-13 23:47:37 -05:00
  Fixed EnvironmentRun to be properly multiprocess.
  Fixed the priority of bad states to be the smallest
  [TODO] Make EnvironmentEpisode properly multiprocess
115543d201 | 2019-02-13 00:36:23 -05:00
  Fixed parallel implementation of getting experiences by using a queue
5094ed53af | 2019-02-11 10:23:11 -05:00
  Updated examples to have new features
fe97a9b78d | 2019-02-11 00:00:34 -05:00
  Corrected typo
be637664e7 | 2019-02-10 23:59:29 -05:00
  Added collections import
b2ab2ee132 | 2019-02-10 23:56:21 -05:00
  Implemented epsilon as a scheduler
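"Implemented epsilon as a scheduler" above refers to annealing the exploration rate over time instead of keeping it fixed. A minimal sketch of one way to do that, assuming a simple linear decay; the class name and interface are illustrative, not rltorch's actual Scheduler API:

```python
class LinearEpsilonScheduler:
    """Hypothetical sketch: linearly anneal epsilon from `start` to `end` over `duration` calls."""
    def __init__(self, start=1.0, end=0.1, duration=10_000):
        self.start, self.end, self.duration = start, end, duration
        self.steps = 0

    def __call__(self):
        frac = min(self.steps / self.duration, 1.0)  # fraction of the schedule completed, clamped
        self.steps += 1
        return self.start + frac * (self.end - self.start)

epsilon = LinearEpsilonScheduler(start=1.0, end=0.05, duration=5)
print([round(epsilon(), 2) for _ in range(7)])  # [1.0, 0.81, 0.62, 0.43, 0.24, 0.05, 0.05]
```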
04e54cddc2 | 2019-02-10 23:16:44 -05:00
  Sends importance weights to the correct device for prioritized replay
013d40a4f9 | 2019-02-10 23:11:53 -05:00
  Implemented Schedulers and Prioritized Replay
8c78f47c0c | 2019-02-03 21:57:27 -05:00
  Added missing parameter
f1a52cf77b | 2019-02-03 21:55:08 -05:00
  Added missing dependency
0a6f1e73f3 | 2019-02-03 00:53:13 -05:00
  Fixed another mismatched device error
e62385b574 | 2019-02-03 00:51:24 -05:00
  Fixed not having self.
03455accc8 | 2019-02-03 00:49:47 -05:00
  Attempting to see if this fixes the mismatched devices error
9cd3625fd3 | 2019-02-03 00:45:14 -05:00
  Made sure everything went to their appropriate devices
39643f04e1 | 2019-02-03 00:31:35 -05:00
  Simplified in-place gaussian code
a03abe2bb1 | 2019-01-31 23:34:32 -05:00
  Initial Commit