|
190eb1f0c4
|
Correct discount_rewards function to only multiply with gamma throughout
|
2019-03-04 21:59:02 -05:00 |
|
|
11d99df977
|
Added improvements to the REINFORCE algorithm
|
2019-03-04 17:10:24 -05:00 |
|
|
a59f84b446
|
Cleaned up scripts, added more comments
|
2019-03-04 17:09:46 -05:00 |
|
|
e42f5bba1b
|
Corrected A2C and PPO to train at the end of an episode
|
2019-03-01 21:04:13 -05:00 |
|
|
1958fc7c7e
|
Corrected device when constructing fitness tensor
|
2019-02-28 14:41:34 -05:00 |
|
|
9740c40527
|
d.sample returns a tensor, so we stack them to not lose the device
|
2019-02-28 14:30:49 -05:00 |
|
|
714443192d
|
Added entropy into QEP (1% of loss)
Made random numbers in ESNetwork get generated on the same device
|
2019-02-28 12:17:35 -05:00 |
|
|
76a044ace9
|
Added Evolutionary Strategies Network and added more example scripts
|
2019-02-27 09:52:28 -05:00 |
|
|
26084d4c7c
|
Added PPOAgent and A2CAgent to the agents submodule.
Also made some small changes to how memories are queried
|
2019-02-19 20:54:30 -05:00 |
|
|
21b820b401
|
Implemented REINFORCE into the library
|
2019-02-16 20:30:27 -05:00 |
|
|
14ba64d525
|
Added a single-process environment runner. Also added an example of using this class.
|
2019-02-16 18:15:45 -05:00 |
|
|
736e73a1f7
|
Took away explicit deleting since the next_state variable gets used in another slot
|
2019-02-14 22:01:13 -05:00 |
|
|
2caf869fd6
|
Added numba as a dependency and decorated the Prioritized Replay function
|
2019-02-14 21:42:31 -05:00 |
|
|
19a859a4f6
|
If memory or logger does not exist, then don't create those shared memory structures
|
2019-02-14 21:06:44 -05:00 |
|
|
460d4c05c1
|
Fixed EnvironmentRun to be properly multiprocess.
Fixed the priority of bad states to be the smallest
[TODO] Make EnvironmentEpisode properly multiprocess
|
2019-02-13 23:47:37 -05:00 |
|
|
115543d201
|
Fixed parallel implementation of getting experiences by using a queue
|
2019-02-13 00:36:23 -05:00 |
|
|
5094ed53af
|
Updated examples to have new features
|
2019-02-11 10:23:11 -05:00 |
|
|
fe97a9b78d
|
Corrected typo
|
2019-02-11 00:00:34 -05:00 |
|
|
be637664e7
|
Added collections import
|
2019-02-10 23:59:29 -05:00 |
|
|
b2ab2ee132
|
Implemented epsilon as a scheduler
|
2019-02-10 23:56:21 -05:00 |
|
|
04e54cddc2
|
Sends importance weights to the correct device for prioritized replay
|
2019-02-10 23:16:44 -05:00 |
|
|
013d40a4f9
|
Implemented Schedulers and Prioritized Replay
|
2019-02-10 23:11:53 -05:00 |
|
|
8c78f47c0c
|
Added missing parameter
|
2019-02-03 21:57:27 -05:00 |
|
|
f1a52cf77b
|
Added missing dependency
|
2019-02-03 21:55:08 -05:00 |
|
|
0a6f1e73f3
|
Fixed another mismatched device error
|
2019-02-03 00:53:13 -05:00 |
|
|
e62385b574
|
Fixed not having self.
|
2019-02-03 00:51:24 -05:00 |
|
|
03455accc8
|
Attempting to see if this fixes the mismatched devices error
|
2019-02-03 00:49:47 -05:00 |
|
|
9cd3625fd3
|
Made sure everything went to their appropriate devices
|
2019-02-03 00:45:14 -05:00 |
|
|
39643f04e1
|
Simplified in-place Gaussian code
|
2019-02-03 00:31:35 -05:00 |
|
|
a03abe2bb1
|
Initial Commit
|
2019-01-31 23:34:32 -05:00 |
|