Mirror of https://github.com/Brandon-Rozek/website.git (synced 2025-10-09 06:21:13 +00:00)
Fixed titles, math rendering, and links on some pages
Commit 330ace0de9 (parent 9f096a8720)
61 changed files with 303 additions and 115 deletions
@@ -1,4 +1,8 @@
-# Chapter 2: Multi-armed Bandits
+---
+title: Chapter 2 - Multi-armed Bandits
+showthedate: false
+math: true
+---
 
 Reinforcement learning *evaluates* the actions taken rather than accepting *instructions* of the correct actions. This creates the need for active exploration.
 
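A minimal sketch of what evaluative feedback plus exploration looks like in practice: an ε-greedy agent on a 10-armed testbed with Gaussian rewards. This is an illustration only, not taken from these notes; the arm count, ε value, and variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

k = 10                                   # number of arms (assumed testbed size)
true_values = rng.normal(0.0, 1.0, k)    # hidden q*(a) for each arm
Q = np.zeros(k)                          # sample-average estimates of q*(a)
N = np.zeros(k)                          # pull counts per arm
eps = 0.1                                # probability of exploring

for t in range(1000):
    # Explore with probability eps; otherwise exploit the current estimates.
    if rng.random() < eps:
        a = int(rng.integers(k))
    else:
        a = int(np.argmax(Q))
    reward = rng.normal(true_values[a], 1.0)   # noisy evaluative feedback
    N[a] += 1
    Q[a] += (reward - Q[a]) / N[a]             # incremental sample-average update

print("estimated action values:", np.round(Q, 2))
```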
@@ -141,4 +145,4 @@ This is an example of an *associative search* task, so called because it involve
 
 If actions are allowed to affect the next situation as well as the reward, then we have the full reinforcement learning problem. This will be presented in the next chapter of the book with its ramifications appearing throughout the rest of the book.
 
-
+
@@ -1,4 +1,8 @@
-# Chapter 4: Dynamic Programming
+---
+title: Chapter 4 - Dynamic Programming
+showthedate: false
+math: true
+---
 
 Dynamic programming refers to a collection of algorithms that can be used to compute optimal policies given a perfect model of the environment as a Markov decision process (MDP).
 
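A minimal iterative policy evaluation sketch, assuming the "perfect model" is supplied as transition and expected-reward arrays for a small, randomly generated toy MDP. The MDP itself and all names here are illustrative assumptions, not from the notes.

```python
import numpy as np

n_states, n_actions = 3, 2
gamma, theta = 0.9, 1e-8

rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s'] transition probs
R = rng.normal(size=(n_states, n_actions))                        # expected reward R[s, a]
pi = np.full((n_states, n_actions), 1.0 / n_actions)              # uniform random policy

V = np.zeros(n_states)
while True:
    # One sweep of the Bellman expectation backup:
    #   V(s) = sum_a pi(a|s) * ( R(s,a) + gamma * sum_s' P(s'|s,a) * V(s') )
    V_new = np.einsum("sa,sa->s", pi, R + gamma * (P @ V))
    delta = np.max(np.abs(V_new - V))
    V = V_new
    if delta < theta:
        break

print("V under the random policy:", np.round(V, 3))
```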
@@ -1,4 +1,7 @@
-# Introduction to Reinforcement Learning Day 1
+---
+title: Introduction to Reinforcement Learning Day 1
+showthedate: false
+---
 
 Recall that this course is based on the book --
 
@@ -63,4 +66,4 @@ Most of the time in a reinforcement learning algorithm, we move greedily, select
 
 
 
-Summary: Reinforcement learning is learning by an agent from direct interaction with its environment, without relying on exemplary supervision or complete models of the environment.
+Summary: Reinforcement learning is learning by an agent from direct interaction with its environment, without relying on exemplary supervision or complete models of the environment.
@@ -1,4 +1,8 @@
-# Chapter 5: Monte Carlo Methods
+---
+title: Chapter 5 - Monte Carlo Methods
+showthedate: false
+math: true
+---
 
 Monte Carlo methods do not assume complete knowledge of the environment. They require only *experience*, which is a sample sequence of states, actions, and rewards from actual or simulated interaction with an environment.
 
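A minimal first-visit Monte Carlo prediction sketch: it estimates $V(s)$ purely from sampled episodes, with no model of the environment. The toy random-walk episode generator and all names are illustrative assumptions, not from the notes.

```python
import random
from collections import defaultdict

def sample_episode():
    """Return a list of (state, reward) pairs from a toy +/-1 random walk."""
    state, episode = 0, []
    while abs(state) < 3:                     # terminate on reaching -3 or +3
        step = random.choice([-1, 1])
        reward = 1.0 if state + step == 3 else 0.0
        episode.append((state, reward))
        state += step
    return episode

gamma = 1.0
returns_sum = defaultdict(float)
returns_cnt = defaultdict(int)

for _ in range(5000):
    episode = sample_episode()
    first_visit = {}
    for t, (s, _) in enumerate(episode):      # index of first visit to each state
        first_visit.setdefault(s, t)
    G = 0.0
    for t in reversed(range(len(episode))):   # accumulate returns backwards
        s, r = episode[t]
        G = gamma * G + r
        if first_visit[s] == t:               # update only on the first visit
            returns_sum[s] += G
            returns_cnt[s] += 1

V = {s: returns_sum[s] / returns_cnt[s] for s in returns_sum}
print({s: round(v, 2) for s, v in sorted(V.items())})
```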
@@ -1,4 +1,8 @@
-# Chapter 3: Finite Markov Decision Processes
+---
+title: Chapter 3 - Finite Markov Decision Processes
+showthedate: false
+math: true
+---
 
 Markov decision processes are a classical formalization of sequential decision making, where actions influence not just immediate rewards, but also subsequent situations, or states, and through those future rewards. Thus MDPs involve delayed reward and the need to trade off immediate and delayed reward. Whereas in bandit problems we estimated the value $q_*(a)$ of each action $a$, in MDPs we estimate the value $q_*(s, a)$ of each action $a$ in state $s$.
 
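For reference, the quantity $q_*(s, a)$ mentioned above is characterized by the Bellman optimality equation; this is the standard form from the book's notation and is added here only as a reminder:

$$
q_*(s, a) = \mathbb{E}\!\left[R_{t+1} + \gamma \max_{a'} q_*(S_{t+1}, a') \,\middle|\, S_t = s, A_t = a\right]
          = \sum_{s', r} p(s', r \mid s, a)\left[r + \gamma \max_{a'} q_*(s', a')\right]
$$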