rltorch

Author	SHA1	Message	Date
Brandon Rozek	cb87105305	Work towards simplifying ReplayMemory	2020-04-14 16:02:59 -04:00
Brandon Rozek	c6172f309d	Made Logger global	2020-04-14 15:24:48 -04:00
Brandon Rozek	1f7c6f10ab	Used standard library definition	2020-04-14 14:21:31 -04:00
Brandon Rozek	8fa4691511	PEP8 Conformance	2020-04-14 14:16:14 -04:00
Brandon Rozek	9b81188a77	Update LICENSE	2020-03-29 14:10:07 -04:00
Brandon Rozek	b12f54e3cd	Delete license.md	2020-03-29 14:07:29 -04:00
Brandon Rozek	112acf9085	Create LICENSE	2020-03-29 14:07:06 -04:00
Brandon Rozek	a667b3734b	Merge branch 'master' of github.com:Brandon-Rozek/rltorch	2020-03-23 19:57:21 -04:00
Brandon Rozek	838062813a	Better handling of demonstration data	2020-03-23 19:57:05 -04:00
Brandon Rozek	4c6dc0a2ea	Added network documentation	2020-03-20 20:16:29 -04:00
Brandon Rozek	5e7de5bed7	Documentation for logging utility	2020-03-20 19:38:45 -04:00
Brandon Rozek	a47f3f6037	Seed documentation	2020-03-20 19:38:35 -04:00
Brandon Rozek	1cad98fcf9	Created documentation for memory module	2020-03-20 19:31:09 -04:00
Brandon Rozek	711c2e8dd1	Cleaned up scheduler docs	2020-03-20 19:30:56 -04:00
Brandon Rozek	720bb1b051	Documented scheduler module	2020-03-20 17:59:56 -04:00
Brandon Rozek	ea62ccf389	Added templates for unit testing and sphinx documentation	2020-03-15 14:27:56 -04:00
Brandon Rozek	2e01bc16ea	Merge pull request #5 from Brandon-Rozek/dependabot/pip/tensorflow-1.15.2 Bump tensorflow from 1.15.0 to 1.15.2	2020-03-15 13:16:03 -04:00
dependabot[bot]	c6ca2a4cfe	Bump tensorflow from 1.15.0 to 1.15.2 Bumps [tensorflow](https://github.com/tensorflow/tensorflow) from 1.15.0 to 1.15.2. - [Release notes](https://github.com/tensorflow/tensorflow/releases) - [Changelog](https://github.com/tensorflow/tensorflow/blob/master/RELEASE.md) - [Commits](https://github.com/tensorflow/tensorflow/compare/v1.15.0...v1.15.2) Signed-off-by: dependabot[bot] <support@github.com>	2020-01-28 22:34:13 +00:00
Brandon Rozek	557c0f689a	Merge pull request #4 from Brandon-Rozek/dependabot/pip/urllib3-1.24.2 Bump urllib3 from 1.24.1 to 1.24.2	2020-01-02 23:50:58 -05:00
dependabot[bot]	8323b1b073	Bump urllib3 from 1.24.1 to 1.24.2 Bumps [urllib3](https://github.com/urllib3/urllib3) from 1.24.1 to 1.24.2. - [Release notes](https://github.com/urllib3/urllib3/releases) - [Changelog](https://github.com/urllib3/urllib3/blob/master/CHANGES.rst) - [Commits](https://github.com/urllib3/urllib3/compare/1.24.1...1.24.2) Signed-off-by: dependabot[bot] <support@github.com>	2020-01-03 04:48:59 +00:00
Brandon Rozek	0c7640fea1	Merge pull request #1 from Brandon-Rozek/dependabot/pip/werkzeug-0.15.3 Bump werkzeug from 0.14.1 to 0.15.3	2020-01-02 23:48:21 -05:00
Brandon Rozek	d8d1a0a5e9	Merge pull request #2 from Brandon-Rozek/dependabot/pip/pillow-6.2.0 Bump pillow from 5.4.1 to 6.2.0	2020-01-02 23:48:09 -05:00
Brandon Rozek	3c09867fb2	Merge pull request #3 from Brandon-Rozek/dependabot/pip/tensorflow-1.15.0 Bump tensorflow from 1.12.0 to 1.15.0	2020-01-02 23:47:54 -05:00
dependabot[bot]	35f57a3f22	Bump tensorflow from 1.12.0 to 1.15.0 Bumps [tensorflow](https://github.com/tensorflow/tensorflow) from 1.12.0 to 1.15.0. - [Release notes](https://github.com/tensorflow/tensorflow/releases) - [Changelog](https://github.com/tensorflow/tensorflow/blob/master/RELEASE.md) - [Commits](https://github.com/tensorflow/tensorflow/compare/v1.12.0...v1.15.0) Signed-off-by: dependabot[bot] <support@github.com>	2019-12-16 21:27:03 +00:00
Brandon Rozek	3217c76a79	DQfD memory was adjusted to actually update the weights in the priority trees, fixing a bug in the sampling	2019-11-17 19:50:49 -05:00
Brandon Rozek	23532fc372	Added a way to cap the number of demonstrations that are kept in the buffer	2019-11-17 18:29:12 -05:00
Brandon Rozek	038d406d0f	Fixed errors with n-step returns	2019-11-13 22:56:27 -05:00
Brandon Rozek	ed62e148d5	Initial implementation of n-step loss	2019-11-11 10:24:40 -05:00
Brandon Rozek	07c90a09f9	Fixed scoping error with Transitions	2019-11-04 12:09:09 -05:00
Brandon Rozek	ad75539776	Implemented components necessary for Deep Q Learning from Demonstrations	2019-11-04 07:44:39 -05:00
Brandon Rozek	17391c7467	First draft of Deep Q Learning From Demonstrations	2019-10-31 20:54:52 -04:00
dependabot[bot]	6b2b6da9e6	Bump pillow from 5.4.1 to 6.2.0 Bumps [pillow](https://github.com/python-pillow/Pillow) from 5.4.1 to 6.2.0. - [Release notes](https://github.com/python-pillow/Pillow/releases) - [Changelog](https://github.com/python-pillow/Pillow/blob/master/CHANGES.rst) - [Commits](https://github.com/python-pillow/Pillow/compare/5.4.1...6.2.0) Signed-off-by: dependabot[bot] <support@github.com>	2019-10-22 22:14:02 +00:00
dependabot[bot]	324fa5c667	Bump werkzeug from 0.14.1 to 0.15.3 Bumps [werkzeug](https://github.com/pallets/werkzeug) from 0.14.1 to 0.15.3. - [Release notes](https://github.com/pallets/werkzeug/releases) - [Changelog](https://github.com/pallets/werkzeug/blob/master/CHANGES.rst) - [Commits](https://github.com/pallets/werkzeug/compare/0.14.1...0.15.3) Signed-off-by: dependabot[bot] <support@github.com>	2019-10-21 18:14:40 +00:00
Brandon Rozek	559efa38b0	Corrected for numba deprecation Enable the ability to render out scenes to play back data	2019-09-19 07:57:39 -04:00
Brandon Rozek	a99ca66b4f	Fixed multiprocessing with CUDA. Added entropy importance as a config option.	2019-09-18 07:26:32 -04:00
Brandon Rozek	9d32a9edd1	Merge branch 'master' of https://github.com/Brandon-Rozek/rltorch # Conflicts: # rltorch/agents/QEPAgent.py	2019-09-13 20:00:13 -04:00
Brandon Rozek	da83f1470c	Some work on multiprocessing evolutionary strategies from last semester	2019-09-13 19:53:19 -04:00
Brandon Rozek	7aa698c349	Added save and load functionality	2019-09-13 19:49:04 -04:00
Brandon Rozek	dcf7cce30d	Flush out print text so I can keep track of the rewards by SSHing in	2019-09-13 19:48:51 -04:00
Brandon Rozek	912e3d42cb	Added new OpenAI Baseline Wrappers	2019-09-13 19:48:24 -04:00
Brandon Rozek	6d3a78cd20	Added parallel version of ES	2019-03-30 16:33:40 -04:00
Brandon Rozek	9ad63a6921	Added license	2019-03-30 16:32:57 -04:00
Brandon Rozek	b2f5220585	Made sure the reward_batch is float across different agents	2019-03-14 10:43:14 -04:00
Brandon Rozek	cdfd3ab6b9	Playing around with QEP	2019-03-14 00:53:51 -04:00
Brandon Rozek	8683b75ad9	Corrected gamma multiplication	2019-03-04 22:04:13 -05:00
Brandon Rozek	190eb1f0c4	Correct discount_rewards function to only multiply with gamma throughout	2019-03-04 21:59:02 -05:00
Brandon Rozek	11d99df977	Added improvements to the REINFORCE algorithm	2019-03-04 17:10:24 -05:00
Brandon Rozek	a59f84b446	Cleaned up scripts, added more comments	2019-03-04 17:09:46 -05:00
Brandon Rozek	e42f5bba1b	Corrected A2C and PPO to train at the end of an episode	2019-03-01 21:04:13 -05:00
Brandon Rozek	1958fc7c7e	Corrected device when constructing fitness tensor	2019-02-28 14:41:34 -05:00

75 commits