mirror of
https://github.com/Brandon-Rozek/website.git
synced 2024-11-09 10:40:34 -05:00
Fixed titles, math rendering, and links on some pages
parent
9f096a8720
commit
330ace0de9
61 changed files with 303 additions and 115 deletions
|
@ -1,4 +1,10 @@
|
|||
# Abstract Algebra 2 Definitions
|
||||
---
|
||||
title: Abstract Algebra Notes
|
||||
showthedate: false
|
||||
math: true
|
||||
---
|
||||
|
||||
Chapter markings are based off the book "A Book of Abstract Algebra" by Charles C. Pinter.
|
||||
|
||||
## Chapter 17
|
||||
|
||||
|
|
|
@ -2,7 +2,6 @@
|
|||
title: Algorithms Book Study
|
||||
---
|
||||
|
||||
# Algorithms Book Study
|
||||
A couple of my friends and I decided to start a book club following "Algorithms" by Jeff Erickson. One bonus is that he gives it away for free on [his website](http://jeffe.cs.illinois.edu/teaching/algorithms/)!
|
||||
|
||||
Of course you should totally check his book out rather than reading my notes. There are tons of witty and fun things in his textbook, not a dry reading I promise. These notes are here mostly for archival purposes.
|
||||
|
|
|
@ -1,4 +1,8 @@
|
|||
# Backtracking
|
||||
---
|
||||
title: Backtracking
|
||||
showthedate: false
|
||||
---
|
||||
|
||||
This algorithm tries to construct a solution to a problem one piece at a time. Whenever the algorithm needs to decide between multiple alternatives to the part of the solution it *recursively* evaluates every option and chooses the best one.
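As a minimal sketch of the idea (not taken from the notes themselves), the subset-sum problem can be solved by deciding the fate of one element at a time and recursing on both alternatives. The function name `subset_sum` and the restriction to positive integers are assumptions made for illustration.

```python
def subset_sum(items, target):
    """Return True if some subset of the positive integers in `items` sums to `target`."""
    if target == 0:
        return True   # the empty subset works
    if target < 0 or not items:
        return False  # overshot the target, or nothing left to choose from
    first, rest = items[0], items[1:]
    # Build the solution one piece at a time: either include `first` or skip it,
    # and recursively evaluate both alternatives.
    return subset_sum(rest, target - first) or subset_sum(rest, target)


print(subset_sum([3, 5, 9], 8))  # 3 + 5 = 8
print(subset_sum([3, 5, 9], 7))  # no subset sums to 7
```

Each call commits to one decision and lets the recursion explore the remaining ones, which is the pattern the paragraph above describes.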
|
||||
|
||||
|
||||
|
|
|
@ -1,4 +1,6 @@
|
|||
# Dynamic Programming
|
||||
---
|
||||
title: Dynamic Programming
|
||||
---
|
||||
|
||||
The book first goes into talking about the complexity of the Fibonacci algorithm
|
||||
|
||||
|
|
|
@ -1,4 +1,7 @@
|
|||
# Greedy Algorithms
|
||||
---
|
||||
title: Greedy Algorithms
|
||||
showthedate: false
|
||||
---
|
||||
|
||||
Greedy Algorithms are about making the best local choice and then blindly plowing ahead.
|
||||
|
||||
|
|
|
@ -1,4 +1,8 @@
|
|||
# Recursion
|
||||
---
|
||||
title: Recursion
|
||||
showthedate: false
|
||||
math: true
|
||||
---
|
||||
|
||||
## Reductions
|
||||
|
||||
|
|
|
@ -1,12 +1,8 @@
|
|||
---
|
||||
title: Bayesian Statistics
|
||||
title: Bayesian Statistics - From Concept to Data Analysis
|
||||
showthedate: false
|
||||
---
|
||||
|
||||
|
||||
|
||||
# Bayesian Statistics: From Concept to Data Analysis
|
||||
|
||||
In the Winter of 2017, I took a course on Bayesian Statistics on Coursera offered by Dr. Herbert Lee.
|
||||
|
||||
Below are the notes for each of the four weeks.
|
||||
|
|
|
@ -1,4 +1,8 @@
|
|||
# Bayesian Statistics
|
||||
---
|
||||
title: Week 1
|
||||
showthedate: false
|
||||
math: true
|
||||
---
|
||||
|
||||
## Rules of Probability
|
||||
|
||||
|
|
|
@ -1,3 +1,9 @@
|
|||
---
|
||||
title: Week 2
|
||||
showthedate: false
|
||||
math: true
|
||||
---
|
||||
|
||||
Under the frequentist paradigm, you view the data as a random sample from some larger, potentially hypothetical population. We can then make probability statements, i.e., long-run frequency statements, based on this larger population.
|
||||
|
||||
## Coin Flip Example (Central Limit Theorem)
|
||||
|
|
|
@ -1,3 +1,9 @@
|
|||
---
|
||||
title: Week 3
|
||||
showthedate: false
|
||||
math: true
|
||||
---
|
||||
|
||||
How do we choose a prior?
|
||||
|
||||
Our prior needs to represent our personal perspective, beliefs, and our uncertainties.
|
||||
|
|
|
@ -1,3 +1,9 @@
|
|||
---
|
||||
title: Week 4
|
||||
showthedate: false
|
||||
math: true
|
||||
---
|
||||
|
||||
## Exponential Data
|
||||
|
||||
Suppose you're waiting for a bus that you think comes on average once every 10 minutes, but you're not sure exactly how often it comes.
|
||||
|
|
|
@ -1,4 +1,8 @@
|
|||
# Handy Quadratic Congruences Facts
|
||||
---
|
||||
title: Handy Facts about Quadratic Congruences
|
||||
showthedate: false
|
||||
math: true
|
||||
---
|
||||
|
||||
## Number of Solutions
|
||||
|
||||
|
|
|
@ -1,4 +1,8 @@
|
|||
# Real Analysis Sheet
|
||||
---
|
||||
title: Real Analysis Quick Sheet
|
||||
showthedate: false
|
||||
math: true
|
||||
---
|
||||
|
||||
**Fact:** $\forall a, b \in \mathbb{R}$ with $a, b \ge 0$, $\sqrt{ab} \le \frac{1}{2}(a + b)$.
|
||||
|
||||
|
|
|
@ -1,4 +1,7 @@
|
|||
# Reproducible Research Week 1
|
||||
---
|
||||
title: Week 1
|
||||
showthedate: false
|
||||
---
|
||||
|
||||
## Replication
|
||||
|
||||
|
|
|
@ -1,3 +1,8 @@
|
|||
---
|
||||
title: Week 2
|
||||
showthedate: false
|
||||
---
|
||||
|
||||
## Coding Standards for R
|
||||
|
||||
1. Always use text files/text editor
|
||||
|
|
|
@ -1,3 +1,8 @@
|
|||
---
|
||||
title: Week 3
|
||||
showthedate: false
|
||||
---
|
||||
|
||||
## tl;dr
|
||||
|
||||
People are busy, especially managers and leaders. Results of data analyses are sometimes presented in oral form, but often the first cut is presented via email.
|
||||
|
|
|
@ -1,3 +1,8 @@
|
|||
---
|
||||
title: Week 4
|
||||
showthedate: false
|
||||
---
|
||||
|
||||
## The `cacher` Package for R
|
||||
|
||||
- Add-on package for R
|
||||
|
|
|
@ -3,10 +3,6 @@ title: Probability and Statistical Inference
|
|||
showthedate: false
|
||||
---
|
||||
|
||||
|
||||
|
||||
# Probability and Statistical Inference
|
||||
|
||||
In the Fall of 2017, I took the course STAT 381 with Dr. Debra Hydorn. Below I included the interesting labs we worked on in the class.
|
||||
|
||||
*Please note that these reports were not formatted for this site. So equations and images may not show up.*
|
||||
|
|
|
@ -1,6 +1,8 @@
|
|||
# Central Limit Theorem Lab
|
||||
|
||||
**Brandon Rozek**
|
||||
---
|
||||
title: Central Limit Theorem
|
||||
showthedate: false
|
||||
math: true
|
||||
---
|
||||
|
||||
## Introduction
|
||||
|
||||
|
|
|
@ -1,6 +1,7 @@
|
|||
# Confidence Interval Lab
|
||||
|
||||
**Written by Brandon Rozek**
|
||||
---
|
||||
title: Confidence Interval
|
||||
showthedate: false
|
||||
---
|
||||
|
||||
## Introduction
|
||||
|
||||
|
|
|
@ -1,4 +1,8 @@
|
|||
# Random Number Generation
|
||||
---
|
||||
title: Random Number Generation
|
||||
showthedate: false
|
||||
math: true
|
||||
---
|
||||
|
||||
## Introduction
|
||||
|
||||
|
|
|
@ -1,4 +1,8 @@
|
|||
# Random Walk
|
||||
---
|
||||
title: Random Walk
|
||||
showthedate: false
|
||||
math: true
|
||||
---
|
||||
|
||||
## Introduction
|
||||
|
||||
|
|
|
@ -3,7 +3,6 @@ Title: Cluster Analysis
|
|||
Description: A study of grouping observations
|
||||
---
|
||||
|
||||
# Cluster Analysis
|
||||
Cluster Analysis is the art of finding inherent structures in data to form groups of similar observations. This has a myriad of applications from recommendation engines to social network analysis.
|
||||
|
||||
This is an independent study, meaning that I will be studying this topic under the direction of a professor, in this case being Dr. Denhere.
|
||||
|
|
|
@ -1,4 +1,7 @@
|
|||
# Lecture Notes for Cluster Analysis
|
||||
---
|
||||
title: Lecture Notes for Cluster Analysis
|
||||
showthedate: false
|
||||
---
|
||||
|
||||
[Lecture 1: Measures of Similarity](lec1)
|
||||
|
||||
|
|
|
@ -1,4 +1,8 @@
|
|||
# Measures of similarity
|
||||
---
|
||||
title: Measures of similarity
|
||||
showthedate: false
|
||||
math: true
|
||||
---
|
||||
|
||||
To identify clusters of observations we need to know how **close individuals are to each other** or **how far apart they are**.
|
||||
|
||||
|
|
|
@ -1,4 +1,8 @@
|
|||
# Silhouette
|
||||
---
|
||||
title: Silhouette
|
||||
showthedate: false
|
||||
math: true
|
||||
---
|
||||
|
||||
This technique validates the consistency within clusters of data. It provides a succinct graphical representation of how well each object lies in its cluster.
|
||||
|
||||
|
|
|
@ -1,4 +1,8 @@
|
|||
# Centroid-based Clustering
|
||||
---
|
||||
title: Centroid-based Clustering
|
||||
showthedate: false
|
||||
math: true
|
||||
---
|
||||
|
||||
In centroid-based clustering, clusters are represented by some central vector which may or may not be a member of the dataset. In practice, the number of clusters is fixed to $k$ and the goal is to solve some sort of optimization problem.
|
||||
|
||||
|
|
|
@ -1,4 +1,7 @@
|
|||
# Voronoi Diagram
|
||||
---
|
||||
title: Voronoi Diagram
|
||||
showthedate: false
|
||||
---
|
||||
|
||||
A Voronoi diagram is a partitioning of a plane into regions based on distance to points in a specific subset of the plane.
|
||||
|
||||
|
|
|
@ -1,4 +1,8 @@
|
|||
# K-means++
|
||||
---
|
||||
title: K-means++
|
||||
showthedate: false
|
||||
math: true
|
||||
---
|
||||
|
||||
K-means++ is an algorithm for choosing the initial values or seeds for the k-means clustering algorithm. This was proposed as a way of avoiding the sometimes poor clustering found by a standard k-means algorithm.
|
||||
|
||||
|
|
|
@ -1,4 +1,8 @@
|
|||
# K-Medoids
|
||||
---
|
||||
title: K-Medoids
|
||||
showthedate: false
|
||||
math: true
|
||||
---
|
||||
|
||||
A medoid can be defined as the object of a cluster whose average dissimilarity to all the objects in the cluster is minimal.
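The definition above can be sketched directly in code. This is an illustrative example only: the function name `medoid` and the use of Euclidean distance as the dissimilarity are assumptions, and any other dissimilarity measure could be substituted.

```python
import math


def medoid(points):
    """Return the member of `points` with the smallest average distance to all members."""
    def avg_dissimilarity(p):
        # Euclidean distance is an assumed choice of dissimilarity here.
        return sum(math.dist(p, q) for q in points) / len(points)

    return min(points, key=avg_dissimilarity)


cluster = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (4.0, 4.0), (20.0, 20.0)]
print(medoid(cluster))
```

Unlike a mean, the result is always an actual observation from the cluster, so the outlier at `(20.0, 20.0)` cannot pull the representative to a point that is not in the data.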
|
||||
|
||||
|
|
|
@ -1,4 +1,8 @@
|
|||
# K-Medians
|
||||
---
|
||||
title: K-Medians
|
||||
showthedate: false
|
||||
math: true
|
||||
---
|
||||
|
||||
This is a variation of k-means clustering where instead of calculating the mean for each cluster to determine its centroid we are going to calculate the median instead.
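A minimal sketch of the centroid update, assuming the median is taken separately along each coordinate (the usual convention for k-medians). The helper names `median_center` and `mean_center` are hypothetical.

```python
from statistics import mean, median


def median_center(points):
    """Component-wise median of a cluster of equal-length tuples."""
    return tuple(median(coord) for coord in zip(*points))


def mean_center(points):
    """Component-wise mean, shown only for comparison with k-means."""
    return tuple(mean(coord) for coord in zip(*points))


cluster = [(1.0, 5.0), (2.0, 1.0), (100.0, 3.0)]
print(median_center(cluster))  # barely affected by the outlying first coordinate
print(mean_center(cluster))    # pulled toward the outlier
```

The comparison shows why the median can be preferable: a single extreme value shifts the mean substantially but leaves the median close to the bulk of the cluster.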
|
||||
|
||||
|
|
|
@ -1,4 +1,8 @@
|
|||
# Introduction to Density Based Clustering
|
||||
---
|
||||
title: Introduction to Density Based Clustering
|
||||
showthedate: false
|
||||
math: true
|
||||
---
|
||||
|
||||
In density-based clustering, clusters are defined as areas of higher density than the remainder of the data set. Objects in sparser areas are considered to be outliers or border points. This helps discover clusters of arbitrary shape.
|
||||
|
||||
|
@ -31,7 +35,7 @@ A cluster then satisfies two properties:
|
|||
2. Find the connected components of *core* points on the neighborhood graph, ignoring all non-core points.
|
||||
3. Assign each non-core point to a nearby cluster if the cluster is an $\epsilon$ (eps) neighbor, otherwise assign it to noise.
|
||||
|
||||
###Advantages
|
||||
### Advantages
|
||||
|
||||
- Does not require one to specify the number of clusters in the data
|
||||
- Can find arbitrarily shaped clusters
|
||||
|
|
|
@ -1,4 +1,7 @@
|
|||
# Why use different distance measures?
|
||||
---
|
||||
title: Why use different distance measures?
|
||||
showthedate: false
|
||||
---
|
||||
|
||||
I made an attempt to find out in what situations people use different distance measures. Searching the Internet usually turns up answers like "It depends on the problem" or "I typically just always use Euclidean".
|
||||
|
||||
|
|
|
@ -1,4 +1,8 @@
|
|||
# Principal Component Analysis Pt. 1
|
||||
---
|
||||
title: Principal Component Analysis Pt. 1
|
||||
showthedate: false
|
||||
math: true
|
||||
---
|
||||
|
||||
## What is PCA?
|
||||
|
||||
|
|
|
@ -1,4 +1,8 @@
|
|||
# Revisiting Similarity Measures
|
||||
---
|
||||
title: Revisiting Similarity Measures
|
||||
showthedate: false
|
||||
math: true
|
||||
---
|
||||
|
||||
## Manhattan Distance
|
||||
|
||||
|
|
|
@ -1,4 +1,8 @@
|
|||
# Cluster Tendency
|
||||
---
|
||||
title: Cluster Tendency
|
||||
showthedate: false
|
||||
math: true
|
||||
---
|
||||
|
||||
This is the assessment of the suitability of clustering. Cluster Tendency determines whether the data has any inherent grouping structure.
|
||||
|
||||
|
|
|
@ -1,4 +1,8 @@
|
|||
# Principal Component Analysis Part 2: Formal Theory
|
||||
---
|
||||
title: Principal Component Analysis Part 2 - Formal Theory
|
||||
showthedate: false
|
||||
math: true
|
||||
---
|
||||
|
||||
##Properties of PCA
|
||||
|
||||
|
|
|
@ -1,4 +1,7 @@
|
|||
# Introduction to Connectivity Based Models
|
||||
---
|
||||
title: Introduction to Connectivity Based Models
|
||||
showthedate: false
|
||||
---
|
||||
|
||||
Hierarchical algorithms combine observations to form clusters based on their distance.
|
||||
|
||||
|
|
|
@ -1,4 +1,8 @@
|
|||
# Agglomerative Methods
|
||||
---
|
||||
title: Agglomerative Methods
|
||||
showthedate: false
|
||||
math: true
|
||||
---
|
||||
|
||||
## Single Linkage
|
||||
|
||||
|
|
|
@ -1,4 +1,8 @@
|
|||
# Divisive Methods Pt.1
|
||||
---
|
||||
title: Divisive Methods Pt.1
|
||||
showthedate: false
|
||||
math: true
|
||||
---
|
||||
|
||||
Divisive methods work in the opposite direction of agglomerative methods. They take one large cluster and successively split it.
|
||||
|
||||
|
@ -39,7 +43,7 @@ This is sometimes termed *association analysis*.
|
|||
| 1 | a | b |
|
||||
| 0 | c | d |
|
||||
|
||||
####Common measures of association
|
||||
#### Common measures of association
|
||||
|
||||
$$
|
||||
|ad-bc| \tag{4.6}
|
||||
|
|
|
@ -1,4 +1,7 @@
|
|||
# Divisive Methods Pt 2.
|
||||
---
|
||||
title: Divisive Methods Pt 2.
|
||||
showthedate: false
|
||||
---
|
||||
|
||||
Recall that in the previous section we spoke about monothetic and polythetic methods. Monothetic methods look at only a single variable at a time, while polythetic methods look at multiple variables simultaneously. In this section, we will speak more about polythetic divisive methods.
|
||||
|
||||
|
|
|
@ -1,4 +1,8 @@
|
|||
# CURE and TSNE
|
||||
---
|
||||
title: CURE and TSNE
|
||||
showthedate: false
|
||||
math: true
|
||||
---
|
||||
|
||||
##Clustering Using Representatives (CURE)
|
||||
|
||||
|
|
|
@ -1,4 +1,8 @@
|
|||
# Cluster Validation
|
||||
---
|
||||
title: Cluster Validation
|
||||
showthedate: false
|
||||
math: true
|
||||
---
|
||||
|
||||
There are multiple approaches to validating your cluster models
|
||||
|
||||
|
|
|
@ -1,4 +1,7 @@
|
|||
# Readings for Lectures of Cluster Analysis
|
||||
---
|
||||
title: Readings for Independent Study of Cluster Analysis
|
||||
showthedate: false
|
||||
---
|
||||
|
||||
## Lecture 1
|
||||
Garson Textbook Chapter 3
|
||||
|
|
|
@ -1,4 +1,7 @@
|
|||
# Cluster Analysis Spring 2018
|
||||
---
|
||||
title: Cluster Analysis
|
||||
showthedate: false
|
||||
---
|
||||
|
||||
### Distance, Dimensionality Reduction, and Tendency
|
||||
|
||||
|
|
|
@ -1,13 +0,0 @@
|
|||
## Weekly Progress
|
||||
|
||||
I didn't do the greatest job at writing a progress report every week but here on the page are the ones I did write.
|
||||
|
||||
[January 29 2019](Jan29)
|
||||
|
||||
[February 12 2019](Feb12)
|
||||
|
||||
[February 25 2019](Feb25)
|
||||
|
||||
[March 26 2019](Mar26)
|
||||
|
||||
[April 2 2019](Apr2)
|
|
@ -1,4 +1,7 @@
|
|||
# Progress Report for Week of April 2nd
|
||||
---
|
||||
title: Progress Report for Week of April 2nd
|
||||
showthedate: false
|
||||
---
|
||||
|
||||
## Added Video Recording Capability to MinAtar environment
|
||||
|
||||
|
@ -20,6 +23,3 @@ Playing around with the ideas of autoencoders, I wanted a way to discretized my
|
|||
|
||||
I'll probably look towards implementing what the paper did for my library and move on to a different piece.
|
||||
|
||||
|
||||
|
||||
Guru Indian: 3140 Cowan Blvd, Fredericksburg, VA 22401
|
|
@ -1,4 +1,8 @@
|
|||
# Weekly Progress Feb 12
|
||||
---
|
||||
title: Weekly Progress Feb 12
|
||||
showthedate: false
|
||||
math: true
|
||||
---
|
||||
|
||||
## Finished writing scripts for data collection
|
||||
|
||||
|
|
|
@ -1,4 +1,8 @@
|
|||
# Weekly Progress for February 25th
|
||||
---
|
||||
title: Weekly Progress for February 25th
|
||||
showthedate: false
|
||||
math: true
|
||||
---
|
||||
|
||||
## Evolutionary Algorithms
|
||||
|
||||
|
|
|
@ -1,4 +1,8 @@
|
|||
# Weekly Progress Jan 29
|
||||
---
|
||||
title: Weekly Progress Jan 29
|
||||
showthedate: false
|
||||
math: true
|
||||
---
|
||||
|
||||
## 1. Training From Demonstrations
|
||||
|
||||
|
@ -6,13 +10,13 @@ Training from demonstrations is the act of using previous data to help speed up
|
|||
|
||||
I read two papers on the topic:
|
||||
|
||||
[1] Gabriel V. de la Cruz Jr., Yunshu Du, Matthew E. Taylor. **Pre-training Neural Networks with Human Demonstrations for Deep Reinforcement Learning**.
|
||||
(1) Gabriel V. de la Cruz Jr., Yunshu Du, Matthew E. Taylor. **Pre-training Neural Networks with Human Demonstrations for Deep Reinforcement Learning**.
|
||||
|
||||
https://arxiv.org/abs/1709.04083
|
||||
|
||||
The authors showed how you can speed up the training of a DQN network, especially under problems involving computer vision, if you first train the convolution layers by using a supervised loss between the actions the network would choose and the actions from the demonstration data given a state.
|
||||
|
||||
[2] Todd Hester, Matej Vecerik, Olivier Pietquin, Marc Lanctot, Tom Schaul, Bilal Piot, Dan Horgan, John Quan, Andrew Sendonaris, Gabriel Dulac-Arnold, Ian Osband, John Agapiou, Joel Z. Leibo, Audrunas Gruslys. **Deep Q-learning from Demonstrations.**
|
||||
(2) Todd Hester, Matej Vecerik, Olivier Pietquin, Marc Lanctot, Tom Schaul, Bilal Piot, Dan Horgan, John Quan, Andrew Sendonaris, Gabriel Dulac-Arnold, Ian Osband, John Agapiou, Joel Z. Leibo, Audrunas Gruslys. **Deep Q-learning from Demonstrations.**
|
||||
|
||||
https://arxiv.org/abs/1704.03732
|
||||
|
||||
|
@ -32,9 +36,9 @@ The intuition behind this is that for the loss to be zero, the network would've
|
|||
|
||||
The main environment I chose to test these algorithms is Acrobot. It is a control theory problem and it has several physics related numbers as input. (Not image based)
|
||||
|
||||
I noticed when implementing [1] at least for the non-convolution case, there's no point in trying to train earlier layers. Perhaps I'll try again when I move onto the atari gameplays...
|
||||
I noticed when implementing (1) at least for the non-convolution case, there's no point in trying to train earlier layers. Perhaps I'll try again when I move onto the atari gameplays...
|
||||
|
||||
I decided against following [2] exactly. It's not that I disagree with the approach, but I don't like the need for "expert" data. If you decide to proceed anyways with non-expert data, you need to remember that it is incorporated into the loss function. Which means that you fall risk into learning sub-optimal policies.
|
||||
I decided against following (2) exactly. It's not that I disagree with the approach, but I don't like the need for "expert" data. If you decide to proceed anyways with non-expert data, you need to remember that it is incorporated into the loss function. Which means that you fall risk into learning sub-optimal policies.
|
||||
|
||||
In the end, what I decided to do was the following
|
||||
|
||||
|
|
|
@ -1,4 +1,8 @@
|
|||
# Progress for Week of March 26
|
||||
---
|
||||
title: Progress for Week of March 26
|
||||
showthedate: false
|
||||
math: true
|
||||
---
|
||||
|
||||
## Parallelized Evolutionary Strategies
|
||||
|
||||
|
|
|
@ -0,0 +1,16 @@
|
|||
---
|
||||
title: Weekly Progress
|
||||
showthedate: false
|
||||
---
|
||||
|
||||
I didn't do the greatest job at writing a progress report every week but here on the page are the ones I did write.
|
||||
|
||||
[January 29 2019](jan29)
|
||||
|
||||
[February 12 2019](feb12)
|
||||
|
||||
[February 25 2019](feb25)
|
||||
|
||||
[March 26 2019](mar26)
|
||||
|
||||
[April 2 2019](apr2)
|
|
@ -1,4 +1,5 @@
|
|||
---
|
||||
title: Abstract
|
||||
showthedate: false
|
||||
---
|
||||
|
||||
|
|
|
@ -3,7 +3,6 @@ Title: Reinforcement Learning
|
|||
Description: The study of optimally mapping situations to actions
|
||||
---
|
||||
|
||||
# Reinforcement Learning
|
||||
Reinforcement learning is the art of analyzing situations and mapping them to actions in order to maximize a numerical reward signal.
|
||||
|
||||
In this independent study, Dr. Stephen Davies and I will explore the Reinforcement Learning problem and its subproblems. We will go over the bandit problem, Markov decision processes, and discover how best to translate a problem in order to **make decisions**.
|
||||
|
@ -38,8 +37,6 @@ Code will occasionally be written to solidify the learning material and to act a
|
|||
|
||||
[Github Link](https://github.com/brandon-rozek/ReinforcementLearning)
|
||||
|
||||
Specifically, if you want to see agents I've created to solve some OpenAI environments, take a look at this specific folder in the Github Repository
|
||||
|
||||
[Github Link](https://github.com/Brandon-Rozek/ReinforcementLearning/tree/master/agents)
|
||||
Specifically, if you want to see agents I've created to solve some OpenAI environments, take a look at this specific folder in the Github Repository. [Github Link](https://github.com/Brandon-Rozek/ReinforcementLearning/tree/master/agents)
|
||||
|
||||
|
||||
|
|
|
@ -1,4 +1,7 @@
|
|||
# Lecture Notes for Reinforcement Learning
|
||||
---
|
||||
title: Lecture Notes for Reinforcement Learning
|
||||
showthedate: false
|
||||
---
|
||||
|
||||
[Chapter 1: An Introduction](intro)
|
||||
|
||||
|
|
|
@ -1,4 +1,8 @@
|
|||
# Chapter 2: Multi-armed Bandits
|
||||
---
|
||||
title: Chapter 2 - Multi-armed Bandits
|
||||
showthedate: false
|
||||
math: true
|
||||
---
|
||||
|
||||
Reinforcement learning *evaluates* the actions taken rather than accepting *instructions* of the correct actions. This creates the need for active exploration.
|
||||
|
||||
|
|
|
@ -1,4 +1,8 @@
|
|||
# Chapter 4: Dynamic Programming
|
||||
---
|
||||
title: Chapter 4 - Dynamic Programming
|
||||
showthedate: false
|
||||
math: true
|
||||
---
|
||||
|
||||
Dynamic programming refers to a collection of algorithms that can be used to compute optimal policies given a perfect model of the environment as a Markov decision process (MDP).
|
||||
|
||||
|
|
|
@ -1,4 +1,7 @@
|
|||
# Introduction to Reinforcement Learning Day 1
|
||||
---
|
||||
title: Introduction to Reinforcement Learning Day 1
|
||||
showthedate: false
|
||||
---
|
||||
|
||||
Recall that this course is based on the book --
|
||||
|
||||
|
|
|
@ -1,4 +1,8 @@
|
|||
# Chapter 5: Monte Carlo Methods
|
||||
---
|
||||
title: Chapter 5 - Monte Carlo Methods
|
||||
showthedate: false
|
||||
math: true
|
||||
---
|
||||
|
||||
Monte Carlo methods do not assume complete knowledge of the environment. They require only *experience* which is a sample sequence of states, actions, and rewards from actual or simulated interaction with an environment.
|
||||
|
||||
|
|
|
@ -1,4 +1,8 @@
|
|||
# Chapter 3: Finite Markov Decision Processes
|
||||
---
|
||||
title: Chapter 3 - Finite Markov Decision Processes
|
||||
showthedate: false
|
||||
math: true
|
||||
---
|
||||
|
||||
Markov decision processes (MDPs) are a classical formalization of sequential decision making, where actions influence not just immediate rewards, but also subsequent situations, or states, and through those, future rewards. Thus MDPs involve delayed reward and the need to trade off immediate and delayed reward. Whereas in bandit problems we estimated the value $q_*(a)$ of each action $a$, in MDPs we estimate the value $q_*(s, a)$ of each action $a$ in state $s$.
|
||||
|
||||
|
|