---
Title: Reinforcement Learning
Description: The study of optimally mapping situations to actions
---

# Reinforcement Learning
Reinforcement learning is the art of analyzing situations and mapping them to actions in order to maximize a numerical reward signal.

In this independent study, Dr. Stephen Davies and I will explore the Reinforcement Learning problem and its subproblems. We will go over the bandit problem and Markov decision processes, and discover how best to frame a problem in order to **make decisions**.

I have provided a list of topics that I wish to explore in a [syllabus](syllabus).

## Readings

In order to spend more time learning, I decided to follow a textbook this time.

Reinforcement Learning: An Introduction

By Richard S. Sutton and Andrew G. Barto

[Reading Schedule](readings)

## Notes

The notes for this course are going to be an extremely summarized version of the textbook. There will also be notes on whatever tangents Dr. Davies and I explore.

[Notes page](notes)

I also wrote a short, lighthearted report describing the bandit problem. It is a good way to learn about the common considerations in Reinforcement Learning problems.

[The Bandit Report](/files/research/TheBanditReport.pdf)

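To give a feel for the bandit problem mentioned above, here is a minimal sketch of an epsilon-greedy agent. It is not taken from the report or the repository; the function name `run_bandit`, its parameters, and the choice of Gaussian rewards are all assumptions made for illustration. It shows one of the common considerations: balancing exploration of unknown arms against exploitation of the arm that currently looks best.

```python
import random


def run_bandit(true_means, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a Gaussian multi-armed bandit (illustrative).

    Returns the estimated value of each arm and the average reward earned.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [0.0] * n_arms  # running estimate of each arm's mean reward
    counts = [0] * n_arms       # number of times each arm has been pulled
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random arm.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest current estimate.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Assumed reward model: Gaussian noise around the arm's true mean.
        reward = rng.gauss(true_means[arm], 1.0)

        # Incremental sample-average update of the chosen arm's estimate.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return estimates, total_reward / steps


if __name__ == "__main__":
    estimates, average_reward = run_bandit([0.2, 0.5, 1.0])
    print("estimated arm values:", [round(v, 2) for v in estimates])
    print("average reward per step:", round(average_reward, 2))
```

Setting `epsilon` to 0 makes the agent purely greedy, which can leave it stuck on the first arm that happens to look good; raising it spends more pulls on exploration at the cost of immediate reward.
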
## Code

I will occasionally write code to solidify the learning material and to serve as an aid for further exploration.

[GitHub Link](https://github.com/brandon-rozek/ReinforcementLearning)

Specifically, if you want to see agents I've created to solve some OpenAI environments, take a look at this folder in the GitHub repository:

[GitHub Link](https://github.com/Brandon-Rozek/ReinforcementLearning/tree/master/agents)