3217c76a79 | Brandon Rozek | 2019-11-17 19:50:49 -05:00
  DQfD memory was adjusted to actually update the weights in the priority trees, fixing a bug in the sampling
23532fc372 | Brandon Rozek | 2019-11-17 18:29:12 -05:00
  Added a way to cap the number of demonstrations that are kept in the buffer
038d406d0f | Brandon Rozek | 2019-11-13 22:56:27 -05:00
  Fixed errors with n-step returns
ed62e148d5 | Brandon Rozek | 2019-11-11 10:24:40 -05:00
  Initial implementation of n-step loss
07c90a09f9 | 2019-11-04 12:09:09 -05:00
  Fixed scoping error with Transitions
ad75539776 | 2019-11-04 07:44:39 -05:00
  Implemented components necessary for Deep Q Learning from Demonstrations
17391c7467 | Brandon Rozek | 2019-10-31 20:54:52 -04:00
  First draft of Deep Q Learning From Demonstrations
559efa38b0 | 2019-09-19 07:57:39 -04:00
  Corrected for numba deprecation
  Enable the ability to render out scenes to play back data
a99ca66b4f | 2019-09-18 07:26:32 -04:00
  Fixed multiprocessing with CUDA. Added entropy importance as a config option.
9d32a9edd1 | 2019-09-13 20:00:13 -04:00
  Merge branch 'master' of https://github.com/Brandon-Rozek/rltorch
  # Conflicts:
  #   rltorch/agents/QEPAgent.py
da83f1470c | 2019-09-13 19:53:19 -04:00
  Some work on multiprocessing evolutionary strategies from last semester
7aa698c349 | 2019-09-13 19:49:04 -04:00
  Added save and load functionality
dcf7cce30d | 2019-09-13 19:48:51 -04:00
  Flush out print text so I can keep track of the rewards by SSHing in
912e3d42cb | 2019-09-13 19:48:24 -04:00
  Added new OpenAI Baseline Wrappers
6d3a78cd20 | 2019-03-30 16:33:40 -04:00
  Added parallel version of ES
9ad63a6921 | 2019-03-30 16:32:57 -04:00
  Added license
b2f5220585 | 2019-03-14 10:43:14 -04:00
  Made sure the reward_batch is float across different agents
cdfd3ab6b9 | 2019-03-14 00:53:51 -04:00
  Playing around with QEP
8683b75ad9 | 2019-03-04 22:04:13 -05:00
  Corrected gamma multiplication
190eb1f0c4 | 2019-03-04 21:59:02 -05:00
  Correct discount_rewards function to only multiply with gamma throughout
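The two commits above both concern the discounted-return computation used by policy-gradient updates. For reference, a minimal sketch of a discount_rewards-style function, assuming the usual backwards recursion G_t = r_t + gamma * G_{t+1}; the function name and gamma parameter come from the commit message, while the body is illustrative rather than rltorch's actual implementation:

```python
import torch

def discount_rewards(rewards, gamma=0.99):
    """Hypothetical sketch: discounted returns, applying gamma once per remaining step."""
    returns = torch.zeros(len(rewards))
    running = 0.0
    # Walk backwards so each reward is discounted exactly once per step to go
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# Example: rewards [1, 1, 1] with gamma = 0.9 -> [2.71, 1.9, 1.0]
print(discount_rewards([1.0, 1.0, 1.0], gamma=0.9))
```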
11d99df977 | 2019-03-04 17:10:24 -05:00
  Added improvements to the REINFORCE algorithm
a59f84b446 | 2019-03-04 17:09:46 -05:00
  Cleaned up scripts, added more comments
e42f5bba1b | 2019-03-01 21:04:13 -05:00
  Corrected A2C and PPO to train at the end of an episode
1958fc7c7e | 2019-02-28 14:41:34 -05:00
  Corrected device when constructing fitness tensor
9740c40527 | 2019-02-28 14:30:49 -05:00
  d.sample returns a tensor, so we stack them to not lose the device
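The d.sample note above reflects general PyTorch behavior: samples drawn from a torch.distributions object live on whatever device its parameters are on, and torch.stack keeps them there, whereas rebuilding the batch from Python numbers would default back to the CPU. A small illustration; the Normal distribution and shapes here are assumptions for the example, not the repository's code:

```python
import torch
from torch.distributions import Normal

device = "cuda" if torch.cuda.is_available() else "cpu"

# Parameters live on `device`, so each d.sample() is already a tensor on that device
d = Normal(torch.zeros(4, device=device), torch.ones(4, device=device))
samples = [d.sample() for _ in range(8)]

# Stacking keeps the samples' device and dtype intact
batch = torch.stack(samples)
print(batch.shape, batch.device)  # torch.Size([8, 4]) on the chosen device
```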
714443192d | 2019-02-28 12:17:35 -05:00
  Added entropy into QEP (1% of loss)
  Made random numbers generated in ESNetwork happen on the same device
76a044ace9 | 2019-02-27 09:52:28 -05:00
  Added Evolutionary Strategies Network and added more example scripts
26084d4c7c | 2019-02-19 20:54:30 -05:00
  Added PPOAgent and A2CAgent to the agents submodule.
  Also made some small changes to how memories are queried
21b820b401 | 2019-02-16 20:30:27 -05:00
  Implemented REINFORCE into the library
14ba64d525 | 2019-02-16 18:15:45 -05:00
  Added a single process environment runner. Also added an example for using such a class.
736e73a1f7 | 2019-02-14 22:01:13 -05:00
  Took away explicit deleting since the next_state variable gets used in another slot
2caf869fd6 | 2019-02-14 21:42:31 -05:00
  Added numba as a dependency and decorated the Prioritized Replay function
19a859a4f6 | 2019-02-14 21:06:44 -05:00
  If memory or logger does not exist, then don't create those shared memory structures
460d4c05c1 | 2019-02-13 23:47:37 -05:00
  Fixed EnvironmentRun to be properly multiprocess.
  Fixed the priority of bad states to be the smallest
  [TODO] Make EnvironmentEpisode properly multiprocess
115543d201 | 2019-02-13 00:36:23 -05:00
  Fixed parallel implementation of getting experiences by using a queue
5094ed53af | 2019-02-11 10:23:11 -05:00
  Updated examples to have new features
fe97a9b78d | 2019-02-11 00:00:34 -05:00
  Corrected typo
be637664e7 | 2019-02-10 23:59:29 -05:00
  Added collections import
b2ab2ee132 | 2019-02-10 23:56:21 -05:00
  Implemented epsilon as a scheduler
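"Implemented epsilon as a scheduler" above refers to annealing the exploration rate over time instead of keeping it fixed. A minimal sketch of one way to do that, assuming a simple linear decay; the class name and interface are illustrative, not rltorch's actual Scheduler API:

```python
class LinearEpsilonScheduler:
    """Hypothetical sketch: linearly anneal epsilon from `start` to `end` over `duration` calls."""
    def __init__(self, start=1.0, end=0.1, duration=10_000):
        self.start, self.end, self.duration = start, end, duration
        self.steps = 0

    def __call__(self):
        frac = min(self.steps / self.duration, 1.0)  # fraction of the schedule completed, clamped
        self.steps += 1
        return self.start + frac * (self.end - self.start)

epsilon = LinearEpsilonScheduler(start=1.0, end=0.05, duration=5)
print([round(epsilon(), 2) for _ in range(7)])  # [1.0, 0.81, 0.62, 0.43, 0.24, 0.05, 0.05]
```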
04e54cddc2 | 2019-02-10 23:16:44 -05:00
  Sends importance weights to the correct device for prioritized replay
013d40a4f9 | 2019-02-10 23:11:53 -05:00
  Implemented Schedulers and Prioritized Replay
8c78f47c0c | 2019-02-03 21:57:27 -05:00
  Added missing parameter
f1a52cf77b | 2019-02-03 21:55:08 -05:00
  Added missing dependency
0a6f1e73f3 | 2019-02-03 00:53:13 -05:00
  Fixed another mismatched device error
e62385b574 | 2019-02-03 00:51:24 -05:00
  Fixed not having self.
03455accc8 | 2019-02-03 00:49:47 -05:00
  Attempting to see if this fixes the mismatched devices error
9cd3625fd3 | 2019-02-03 00:45:14 -05:00
  Made sure everything went to their appropriate devices
39643f04e1 | 2019-02-03 00:31:35 -05:00
  Simplified in-place gaussian code
a03abe2bb1 | 2019-01-31 23:34:32 -05:00
  Initial Commit