mirror of
https://github.com/Brandon-Rozek/website.git
synced 2025-12-07 19:10:23 +00:00
Fixed titles, math rendering, and links on some pages
This commit is contained in:
parent
9f096a8720
commit
330ace0de9
61 changed files with 303 additions and 115 deletions
@@ -1,4 +1,8 @@
# Chapter 3: Finite Markov Decision Processes
---
title: Chapter 3 - Finite Markov Decision Processes
showthedate: false
math: true
---
Markov decision processes (MDPs) are a classical formalization of sequential decision making, where actions influence not just immediate rewards but also subsequent situations, or states, and through those, future rewards. Thus MDPs involve delayed reward and the need to trade off immediate and delayed reward. Whereas in bandit problems we estimated the value $q_*(a)$ of each action $a$, in MDPs we estimate the value $q_*(s, a)$ of each action $a$ in each state $s$.
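
For reference, the two quantities contrasted above can be written out explicitly. This is a sketch in the standard textbook notation for reinforcement learning; the symbols $R_t$ (reward at step $t$), $A_t$ (action), $S_t$ (state), $G_t$ (return from step $t$ onward), and $\pi$ (policy) are assumptions brought in from that notation, since they are not defined in the text above.

$$
q_*(a) \doteq \mathbb{E}[R_t \mid A_t = a]
$$

$$
q_*(s, a) \doteq \max_\pi \mathbb{E}_\pi[G_t \mid S_t = s, A_t = a]
$$

The bandit value depends only on the action and the immediate reward, while the MDP value is also conditioned on the state and, through the return $G_t$, accounts for delayed reward.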