Fixed titles, math rendering, and links on some pages

This commit is contained in:
Brandon Rozek 2021-07-26 09:13:20 -04:00
parent 9f096a8720
commit 330ace0de9
61 changed files with 303 additions and 115 deletions

View file

@@ -1,4 +1,8 @@
# Chapter 2: Multi-armed Bandits
---
title: Chapter 2 - Multi-armed Bandits
showthedate: false
math: true
---
Reinforcement learning *evaluates* the actions taken rather than accepting *instructions* of the correct actions. This creates the need for active exploration.
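The evaluative-feedback idea above can be sketched as a small $\epsilon$-greedy bandit loop; the function name, arm means, and parameter values below are illustrative, not taken from the notes:

```python
import random

def epsilon_greedy_bandit(true_means, steps=1000, epsilon=0.1, seed=0):
    """Sample-average action-value estimation with epsilon-greedy exploration."""
    rng = random.Random(seed)
    k = len(true_means)
    q = [0.0] * k       # estimated value of each arm
    n = [0] * k         # pull counts
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(k)                   # explore: random arm
        else:
            a = max(range(k), key=lambda i: q[i])  # exploit: greedy arm
        reward = rng.gauss(true_means[a], 1.0)
        n[a] += 1
        q[a] += (reward - q[a]) / n[a]             # incremental sample average
        total += reward
    return q, total / steps

q, avg_reward = epsilon_greedy_bandit([0.2, 0.8, 0.5])
```

The agent only ever sees the sampled reward of the arm it pulled, never which arm was "correct", which is exactly the evaluative (rather than instructive) feedback the chapter describes.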
@@ -141,4 +145,4 @@ This is an example of an *associative search* task, so called because it involve
If actions are allowed to affect the next situation as well as the reward, then we have the full reinforcement learning problem. This will be presented in the next chapter of the book with its ramifications appearing throughout the rest of the book.
![1536321791927](/home/rozek/Pictures/1536321791927.png)

View file

@@ -1,4 +1,8 @@
# Chapter 4: Dynamic Programming
---
title: Chapter 4 - Dynamic Programming
showthedate: false
math: true
---
Dynamic programming refers to a collection of algorithms that can be used to compute optimal policies given a perfect model of the environment as a Markov decision process (MDP).
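One such algorithm is iterative policy evaluation. A minimal sketch, assuming a tabular model where `P[s][a]` lists `(probability, next_state)` pairs and `R[s][a]` is the expected reward (these names and the two-state example are illustrative):

```python
def policy_evaluation(P, R, policy, gamma=0.9, theta=1e-8):
    """Iterative policy evaluation given a full MDP model.

    P[s][a] -> list of (probability, next_state); R[s][a] -> expected reward;
    policy[s][a] -> probability of taking action a in state s.
    """
    V = [0.0] * len(P)
    while True:
        delta = 0.0
        for s in range(len(P)):
            # Expected one-step backup under the policy
            v = sum(
                policy[s][a] * (R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a]))
                for a in range(len(P[s]))
            )
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < theta:  # stop once the sweep changes no state by more than theta
            return V

# Two-state chain: state 0 steps to state 1 (reward 1); state 1 is absorbing (reward 0).
P = [[[(1.0, 1)]], [[(1.0, 1)]]]
R = [[1.0], [0.0]]
policy = [[1.0], [1.0]]
V = policy_evaluation(P, R, policy)
```

Note that the sweep needs the full transition model `P`, which is exactly the "perfect model of the environment" assumption that distinguishes dynamic programming from the sample-based methods of later chapters.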

View file

@@ -1,4 +1,7 @@
# Introduction to Reinforcement Learning Day 1
---
title: Introduction to Reinforcement Learning Day 1
showthedate: false
---
Recall that this course is based on the book --
@@ -63,4 +66,4 @@ Most of the time in a reinforcement learning algorithm, we move greedily, select
Summary: Reinforcement learning is learning by an agent from direct interaction with its environment, without relying on exemplary supervision or complete models of the environment.

View file

@@ -1,4 +1,8 @@
# Chapter 5: Monte Carlo Methods
---
title: Chapter 5 - Monte Carlo Methods
showthedate: false
math: true
---
Monte Carlo methods do not assume complete knowledge of the environment. They require only *experience* which is a sample sequence of states, actions, and rewards from actual or simulated interaction with an environment.
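Averaging returns over such sampled experience is the core of Monte Carlo prediction. A minimal first-visit sketch, with a hand-made episode format (`(state, reward)` pairs) chosen for illustration:

```python
from collections import defaultdict

def first_visit_mc(episodes, gamma=1.0):
    """First-visit Monte Carlo prediction of state values from sample episodes.

    Each episode is a list of (state, reward) pairs, the reward being the
    one received on leaving that state.
    """
    returns = defaultdict(list)
    for episode in episodes:
        G = 0.0
        tail = []
        for state, reward in reversed(episode):  # accumulate returns backwards
            G = gamma * G + reward
            tail.append((state, G))
        tail.reverse()
        seen = set()
        for state, G in tail:                    # record only the first visit
            if state not in seen:
                seen.add(state)
                returns[state].append(G)
    return {s: sum(g) / len(g) for s, g in returns.items()}

# Two hand-made episodes over states "A" and "B".
V = first_visit_mc([[("A", 1.0), ("B", 0.0)], [("A", 0.0), ("B", 1.0)]])
```

Nothing here needs transition probabilities: the environment's dynamics enter only through the sampled episodes, which is the sense in which Monte Carlo methods require only experience.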

View file

@@ -1,4 +1,8 @@
# Chapter 3: Finite Markov Decision Processes
---
title: Chapter 3 - Finite Markov Decision Processes
showthedate: false
math: true
---
Markov decision processes are a classical formalization of sequential decision making, where actions influence not just immediate rewards but also subsequent situations, or states, and through those, future rewards. Thus MDPs involve delayed reward and the need to trade off immediate and delayed reward. Whereas in bandit problems we estimated the value $q_*(a)$ of each action $a$, in MDPs we estimate the value $q_*(s, a)$ of each action $a$ in state $s$.
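The state–action value referred to here satisfies the Bellman optimality equation (the standard form, stated for reference rather than quoted from the notes):

$$
q_*(s, a) = \mathbb{E}\!\left[ R_{t+1} + \gamma \max_{a'} q_*(S_{t+1}, a') \,\middle|\, S_t = s,\ A_t = a \right]
          = \sum_{s', r} p(s', r \mid s, a) \left[ r + \gamma \max_{a'} q_*(s', a') \right]
$$

Setting $\gamma = 0$ and collapsing to a single state recovers the bandit quantity $q_*(a)$, which is the connection the paragraph draws.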