mirror of https://github.com/Brandon-Rozek/website.git
synced 2025-10-10 15:01:15 +00:00

Removing raw HTML

This commit is contained in:
parent e06d45e053
commit 572d587b8e

33 changed files with 373 additions and 386 deletions
@@ -24,7 +24,7 @@ This algorithm is called *iterative policy evaluation*.
 
 To produce each successive approximation, $v_{k + 1}$ from $v_k$, iterative policy evaluation applies the same operation to each state $s$: it replaces the old value of $s$ with a new value obtained from the old values of the successor states of $s$, and the expected immediate rewards, along all the one-step transitions possible under the policy being evaluated.
 
-<u>**Iterative Policy Evaluation**</u>
+**Iterative Policy Evaluation**
 
 ```
 Input π, the policy to be evaluated
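The sweep described in the context paragraph above can be sketched as a short, self-contained example. This is a minimal illustration, not the post's own code: the dictionary-based MDP layout (`P[s][a]` as a list of `(prob, next_state, reward)` triples) and all names are assumptions made for the sketch.

```python
def iterative_policy_evaluation(states, policy, P, gamma=0.9, theta=1e-8):
    """Evaluate `policy` by repeated full sweeps until values stop changing.

    Hypothetical data layout (an assumption of this sketch):
      policy[s] maps action -> probability of taking it in state s
      P[s][a]   is a list of (prob, next_state, reward) transitions
    """
    V = {s: 0.0 for s in states}  # initialize V(s) = 0 for all states
    while True:
        delta = 0.0
        for s in states:
            v_old = V[s]
            # Replace the old value of s with a new value built from the old
            # values of its successor states and the expected immediate
            # rewards, over all one-step transitions under the policy.
            V[s] = sum(pi_a * sum(p * (r + gamma * V[s2])
                                  for p, s2, r in P[s][a])
                       for a, pi_a in policy[s].items())
            delta = max(delta, abs(v_old - V[s]))
        if delta < theta:  # stop when the largest per-sweep change is tiny
            return V
```

The `theta` threshold plays the role of the stopping condition in the post's pseudocode block: the loop halts once no state's value moves by more than `theta` in a full sweep.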
@@ -69,7 +69,7 @@ Each policy is guaranteed to be a strict improvement over the previous one (unle
 
 This way of finding an optimal policy is called *policy iteration*.
 
-<u>Algorithm</u>
+**Algorithm**
 
 ```
 1. Initialization
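The policy-iteration loop that the hunk above introduces (initialization, evaluation, improvement) can be sketched as follows. Again this is an illustrative assumption, not the post's listing: it uses a deterministic policy and the same hypothetical `P[s][a]` transition layout as before.

```python
def policy_iteration(states, actions, P, gamma=0.9, theta=1e-8):
    """Alternate policy evaluation and greedy improvement until stable.

    Hypothetical layout (an assumption of this sketch):
      actions[s] is the list of actions available in state s
      P[s][a]    is a list of (prob, next_state, reward) transitions
    """
    # 1. Initialization: arbitrary values and an arbitrary deterministic policy
    V = {s: 0.0 for s in states}
    pi = {s: actions[s][0] for s in states}
    while True:
        # 2. Policy evaluation: sweep until V converges under the current policy
        while True:
            delta = 0.0
            for s in states:
                v_old = V[s]
                V[s] = sum(p * (r + gamma * V[s2])
                           for p, s2, r in P[s][pi[s]])
                delta = max(delta, abs(v_old - V[s]))
            if delta < theta:
                break
        # 3. Policy improvement: act greedily with respect to V
        stable = True
        for s in states:
            old_a = pi[s]
            pi[s] = max(actions[s],
                        key=lambda a: sum(p * (r + gamma * V[s2])
                                          for p, s2, r in P[s][a]))
            if pi[s] != old_a:
                stable = False
        if stable:  # each pass strictly improves the policy until it is optimal
            return pi, V
```

Each improvement step either changes the policy (a strict improvement, per the hunk's context line) or leaves it unchanged, in which case the loop terminates with an optimal policy.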