From 330ace0de90dafe6deddb32fd5c7b65932b42e39 Mon Sep 17 00:00:00 2001
From: Brandon Rozek
Date: Mon, 26 Jul 2021 09:13:20 -0400
Subject: [PATCH] Fixed titles, math rendering, and links on some pages

---
 content/notes/abstract2def.md | 8 +++++++-
 content/notes/algorithms.md | 3 +--
 content/notes/algorithms/backtracking.md | 8 ++++++--
 content/notes/algorithms/dynamic.md | 6 ++++--
 content/notes/algorithms/greedy.md | 7 +++++--
 content/notes/algorithms/recursion.md | 6 +++++-
 content/notes/bayesianstatistics.md | 6 +-----
 content/notes/bayesianstatistics/week1.md | 6 +++++-
 content/notes/bayesianstatistics/week2.md | 8 +++++++-
 content/notes/bayesianstatistics/week3.md | 8 +++++++-
 content/notes/bayesianstatistics/week4.md | 6 ++++++
 content/notes/quadraticcongruences.md | 6 +++++-
 content/notes/realanalysis.md | 8 ++++++--
 content/notes/reproducibleresearch/week1.md | 7 +++++--
 content/notes/reproducibleresearch/week2.md | 5 +++++
 content/notes/reproducibleresearch/week3.md | 7 ++++++-
 content/notes/reproducibleresearch/week4.md | 5 +++++
 content/notes/stat381.md | 4 ----
 content/notes/stat381/centrallimit.md | 8 +++++---
 content/notes/stat381/confint.md | 7 ++++---
 content/notes/stat381/randomnumber.md | 6 +++++-
 content/notes/stat381/randomwalk.md | 6 +++++-
 content/research/clusteranalysis.md | 1 -
 content/research/clusteranalysis/notes.md | 5 ++++-
 content/research/clusteranalysis/notes/lec1.md | 8 ++++++--
 .../research/clusteranalysis/notes/lec10-1.md | 6 +++++-
 .../research/clusteranalysis/notes/lec10-2.md | 6 +++++-
 .../research/clusteranalysis/notes/lec10-3.md | 5 ++++-
 .../research/clusteranalysis/notes/lec11-1.md | 6 +++++-
 .../research/clusteranalysis/notes/lec11-2.md | 6 +++++-
 .../research/clusteranalysis/notes/lec11-3.md | 8 ++++++--
 content/research/clusteranalysis/notes/lec12.md | 10 +++++++---
 content/research/clusteranalysis/notes/lec2-1.md | 7 +++++--
 content/research/clusteranalysis/notes/lec2-2.md | 8 ++++++--
 content/research/clusteranalysis/notes/lec4-2.md | 6 +++++-
 content/research/clusteranalysis/notes/lec4-3.md | 8 ++++++--
 content/research/clusteranalysis/notes/lec4.md | 8 ++++++--
 content/research/clusteranalysis/notes/lec5.md | 7 +++++--
 content/research/clusteranalysis/notes/lec6.md | 8 ++++++--
 content/research/clusteranalysis/notes/lec7.md | 10 +++++++---
 content/research/clusteranalysis/notes/lec8.md | 7 +++++--
 content/research/clusteranalysis/notes/lec9-1.md | 6 +++++-
 content/research/clusteranalysis/notes/lec9-2.md | 8 ++++++--
 content/research/clusteranalysis/readings.md | 5 ++++-
 content/research/clusteranalysis/syllabus.md | 7 +++++--
 .../deepreinforcementlearning/WeeklyProgress.md | 13 -------------
 .../WeeklyProgress/Apr2.md | 8 ++++----
 .../WeeklyProgress/Feb12.md | 8 ++++++--
 .../WeeklyProgress/Feb25.md | 6 +++++-
 .../WeeklyProgress/Jan29.md | 16 ++++++++++------
 .../WeeklyProgress/Mar26.md | 8 ++++++--
 .../WeeklyProgress/_index.md | 16 ++++++++++++++++
 .../_index.md} | 0
 .../abstractspring2019.md | 3 ++-
 content/research/reinforcementlearning.md | 5 +----
 content/research/reinforcementlearning/notes.md | 5 ++++-
 .../reinforcementlearning/notes/bandits.md | 8 ++++++--
 .../reinforcementlearning/notes/dynamic.md | 6 +++++-
 .../reinforcementlearning/notes/intro.md | 7 +++++--
 .../reinforcementlearning/notes/mcmethods.md | 6 +++++-
 .../research/reinforcementlearning/notes/mdp.md | 6 +++++-
 61 files changed, 303 insertions(+), 115 deletions(-)
 delete mode 100644 content/research/deepreinforcementlearning/WeeklyProgress.md
 create mode 100644 content/research/deepreinforcementlearning/WeeklyProgress/_index.md
 rename content/research/{deepreinforcementlearning.md => deepreinforcementlearning/_index.md} (100%)

diff --git a/content/notes/abstract2def.md b/content/notes/abstract2def.md
index d3a8009..4c1b755 100644
--- a/content/notes/abstract2def.md
+++ b/content/notes/abstract2def.md
@@ -1,4 +1,10 @@
-# Abstract Algebra 2 Definitions
+---
+title: Abstract
Algebra Notes +showthedate: false +math: true +--- + +Chapter markings are based off the book "A Book of Abstract Algebra" by Charles C. Pinter. ## Chapter 17 diff --git a/content/notes/algorithms.md b/content/notes/algorithms.md index 9178b7b..6d662bc 100644 --- a/content/notes/algorithms.md +++ b/content/notes/algorithms.md @@ -2,7 +2,6 @@ title: Algorithms Book Study --- -# Algorithms Book Study A couple of my friends and I decided to start a book club following "Algorithms" by Jeff Erickson. One bonus is that he gives it away for free on [his website](http://jeffe.cs.illinois.edu/teaching/algorithms/)! Of course you should totally check his book out rather than reading my notes. There are tons of witty and fun things in his textbook, not a dry reading I promise. These notes are here mostly for archival purposes. @@ -14,4 +13,4 @@ Of course you should totally check his book out rather than reading my notes. Th [Chapter 3](dynamic) -[Chapter 4](greedy) \ No newline at end of file +[Chapter 4](greedy) diff --git a/content/notes/algorithms/backtracking.md b/content/notes/algorithms/backtracking.md index 5f9c7ae..fe11915 100644 --- a/content/notes/algorithms/backtracking.md +++ b/content/notes/algorithms/backtracking.md @@ -1,4 +1,8 @@ -# Backtracking +--- +title: Backtracking +showthedate: false +--- + This algorithm tries to construct a solution to a problem one piece at a time. Whenever the algorithm needs to decide between multiple alternatives to the part of the solution it *recursively* evaluates every option and chooses the best one. @@ -59,4 +63,4 @@ ConstructSubset(X, i, T): Backtracking algorithms are used to make a *sequence of decisions*. -When we design a new recursive backtracking algorithm, we must figure out in advance what information we will need about past decisions in the middle of the algorithm. 
\ No newline at end of file +When we design a new recursive backtracking algorithm, we must figure out in advance what information we will need about past decisions in the middle of the algorithm. diff --git a/content/notes/algorithms/dynamic.md b/content/notes/algorithms/dynamic.md index 5d1b785..f4b8efb 100644 --- a/content/notes/algorithms/dynamic.md +++ b/content/notes/algorithms/dynamic.md @@ -1,4 +1,6 @@ -# Dynamic Programming +--- +title: Dynamic Programming +--- The book first goes into talking about the complexity of the Fibonacci algorithm @@ -79,4 +81,4 @@ These algorithms are best developed in two distinct stages. ## Greedy Algorithms -If we're lucky we can just make decisions directly instead of solving any recursive subproblems. The problem is that greedly algorithms almost never work. \ No newline at end of file +If we're lucky we can just make decisions directly instead of solving any recursive subproblems. The problem is that greedly algorithms almost never work. diff --git a/content/notes/algorithms/greedy.md b/content/notes/algorithms/greedy.md index 5477c9a..c3982f4 100644 --- a/content/notes/algorithms/greedy.md +++ b/content/notes/algorithms/greedy.md @@ -1,4 +1,7 @@ -# Greedy Algorithms +--- +title: Greedy Algorithms +showthedate: false +--- Greedy Algorithms are about making the best local choice and then blindly plowing ahead. @@ -33,4 +36,4 @@ An unstable match is when The Gale-Shapley algorithm is a great greedy fit. It goes like this 1. An arbitrary unmatched hospital A offers its position to the best doctor a who has not already rejected it. -2. If a is unmatched, she tentatively accepts A's offer. If a already had a match but prefers A, she rejects her current match and tentatively accepts the new offer from A. Otherwise a rejects the new offer. \ No newline at end of file +2. If a is unmatched, she tentatively accepts A's offer. 
If a already had a match but prefers A, she rejects her current match and tentatively accepts the new offer from A. Otherwise a rejects the new offer. diff --git a/content/notes/algorithms/recursion.md b/content/notes/algorithms/recursion.md index 96c2383..2ea01f7 100644 --- a/content/notes/algorithms/recursion.md +++ b/content/notes/algorithms/recursion.md @@ -1,4 +1,8 @@ -# Recursion +--- +title: Recursion +showthedate: false +math: true +--- ## Reductions diff --git a/content/notes/bayesianstatistics.md b/content/notes/bayesianstatistics.md index 84a1897..bac068d 100644 --- a/content/notes/bayesianstatistics.md +++ b/content/notes/bayesianstatistics.md @@ -1,12 +1,8 @@ --- -title: Bayesian Statistics +title: Bayesian Statistics - From Concept to Data Analysis showthedate: false --- - - -# Bayesian Statistics: From Concept to Data Analysis - In the Winter of 2017, I took a course on Bayesian Statistics on Coursera offered by Dr. Herbert Lee. Below are the notes for each of the four weeks. diff --git a/content/notes/bayesianstatistics/week1.md b/content/notes/bayesianstatistics/week1.md index 3480a84..257d615 100644 --- a/content/notes/bayesianstatistics/week1.md +++ b/content/notes/bayesianstatistics/week1.md @@ -1,4 +1,8 @@ -# Bayesian Statistics +--- +title: Week 1 +showthedate: false +math: true +--- ## Rules of Probability diff --git a/content/notes/bayesianstatistics/week2.md b/content/notes/bayesianstatistics/week2.md index 54b2a6f..33eae16 100644 --- a/content/notes/bayesianstatistics/week2.md +++ b/content/notes/bayesianstatistics/week2.md @@ -1,3 +1,9 @@ +--- +title: Week 2 +showthedate: false +math: true +--- + Under the frequentest paradigm, you view the data as a random sample from some larger, potentially hypothetical population. We can then make probability statements i.e, long-run frequency statements based on this larger population. 
## Coin Flip Example (Central Limit Theorem) @@ -534,4 +540,4 @@ It may be fixed, but because we don't know what that value is, we represent our Frequentest confidence intervals have the interpretation that "If you were to repeat many times the process of collecting data and computing a 95% confidence interval, then on average about 95% of those intervals would contain the true parameter value; however, once you observe data and compute an interval the true value is either in the interval or it is not, but you can't tell which." -Bayesian credible intervals have the interpretation that "Your posterior probability that the parameter is in a 95% credible interval is 95%." \ No newline at end of file +Bayesian credible intervals have the interpretation that "Your posterior probability that the parameter is in a 95% credible interval is 95%." diff --git a/content/notes/bayesianstatistics/week3.md b/content/notes/bayesianstatistics/week3.md index cb1c1fc..dbd36da 100644 --- a/content/notes/bayesianstatistics/week3.md +++ b/content/notes/bayesianstatistics/week3.md @@ -1,3 +1,9 @@ +--- +title: Week 3 +showthedate: false +math: true +--- + How do we choose a prior? Our prior needs to represent our personal perspective, beliefs, and our uncertainties. @@ -406,4 +412,4 @@ After this, we need some other piece of knowledge to pin point both parameters. 2. In Bayesian Statistics, a vague prior refers to one that's relatively flat across much of the space. For a Gamma prior we can choose $\Gamma(\epsilon, \epsilon)$ where $\epsilon$ is small and strictly positive. -This would create a distribution with a mean of 1 and a huge standard deviation across the whole space. Hence the posterior will be largely driven by the data and very little by the prior. \ No newline at end of file +This would create a distribution with a mean of 1 and a huge standard deviation across the whole space. Hence the posterior will be largely driven by the data and very little by the prior. 
diff --git a/content/notes/bayesianstatistics/week4.md b/content/notes/bayesianstatistics/week4.md index d2d4038..51072b5 100644 --- a/content/notes/bayesianstatistics/week4.md +++ b/content/notes/bayesianstatistics/week4.md @@ -1,3 +1,9 @@ +--- +title: Week 4 +showthedate: false +math: true +--- + ## Exponential Data Suppose you're waiting for a bus that you think comes on average once every 10 minutes, but you're not sure exactly how often it comes. diff --git a/content/notes/quadraticcongruences.md b/content/notes/quadraticcongruences.md index 594faea..c957286 100644 --- a/content/notes/quadraticcongruences.md +++ b/content/notes/quadraticcongruences.md @@ -1,4 +1,8 @@ -# Handy Quadratic Congruences Facts +--- +title: Handy Facts about Quadratic Congruences +showthedate: false +math: true +--- ## Number of Solutions diff --git a/content/notes/realanalysis.md b/content/notes/realanalysis.md index af1c1f5..c8109a1 100644 --- a/content/notes/realanalysis.md +++ b/content/notes/realanalysis.md @@ -1,4 +1,8 @@ -# Real Analysis Sheet +--- +title: Real Analysis Quick Sheet +showthedate: false +math: true +--- **Fact:** $\forall a,b, \in \mathbb{R}$, $\sqrt{ab} \le \frac{1}{2}(a + b)$. @@ -90,4 +94,4 @@ Let $(x_n), (y_n)$ be strictly positive sequence of real numbers. Suppose $r = \ (b) If ($r = 0$ and $\sum{y_n} < \infty$), then $\sum{x_n} < \infty$. 
- \ No newline at end of file + diff --git a/content/notes/reproducibleresearch/week1.md b/content/notes/reproducibleresearch/week1.md index 4db0cc4..4087034 100644 --- a/content/notes/reproducibleresearch/week1.md +++ b/content/notes/reproducibleresearch/week1.md @@ -1,4 +1,7 @@ -# Reproducible Research Week 1 +--- +title: Week 1 +showthedate: false +--- ## Replication @@ -394,4 +397,4 @@ It should tell a story It should not include every analysis you performed -References should be included for statistical methods \ No newline at end of file +References should be included for statistical methods diff --git a/content/notes/reproducibleresearch/week2.md b/content/notes/reproducibleresearch/week2.md index 49779f6..9c0a755 100644 --- a/content/notes/reproducibleresearch/week2.md +++ b/content/notes/reproducibleresearch/week2.md @@ -1,3 +1,8 @@ +--- +title: Week 2 +showthedate: false +--- + ## Coding Standards for R 1. Always use text files/text editor diff --git a/content/notes/reproducibleresearch/week3.md b/content/notes/reproducibleresearch/week3.md index 1a92f30..9affefc 100644 --- a/content/notes/reproducibleresearch/week3.md +++ b/content/notes/reproducibleresearch/week3.md @@ -1,3 +1,8 @@ +--- +title: Week 3 +showthedate: false +--- + ## tl;dr People are busy, especially managers and leaders. Results of data analyses are sometimes presented in oral form, but often the first cut is presented via email. 
@@ -305,4 +310,4 @@ The premise of reproducible research is that with data/code available, people ca - Reproducible research focuses on the most "downstream" aspect of research documentation - Evidence-based data analysis would provide standardized best practices for given scientific areas and questions - Gives reviewers an important tool without dramatically increases the burden on them -- More effort should be put into improving the quality of "upstream" aspects of scientific research \ No newline at end of file +- More effort should be put into improving the quality of "upstream" aspects of scientific research diff --git a/content/notes/reproducibleresearch/week4.md b/content/notes/reproducibleresearch/week4.md index 1ce5fad..a7a6a25 100644 --- a/content/notes/reproducibleresearch/week4.md +++ b/content/notes/reproducibleresearch/week4.md @@ -1,3 +1,8 @@ +--- +title: Week 4 +showthedate: false +--- + ## The `cacher` Package for R - Add-on package for R diff --git a/content/notes/stat381.md b/content/notes/stat381.md index 146b80f..775fcd7 100644 --- a/content/notes/stat381.md +++ b/content/notes/stat381.md @@ -3,10 +3,6 @@ title: Probability and Statistical Inference showthedate: false --- - - -# Probability and Statistical Inference - In the Fall of 2017, I took the course STAT 381 with Dr. Debra Hydorn. Below I included the interesting labs we worked on in the class. *Please note that these reports were not formatted for this site. 
So equations and images may not show up.* diff --git a/content/notes/stat381/centrallimit.md b/content/notes/stat381/centrallimit.md index 440b2c8..42443bd 100644 --- a/content/notes/stat381/centrallimit.md +++ b/content/notes/stat381/centrallimit.md @@ -1,6 +1,8 @@ -# Central Limit Theorem Lab - -**Brandon Rozek** +--- +title: Central Limit Theorem +showthedate: false +math: true +--- ## Introduction diff --git a/content/notes/stat381/confint.md b/content/notes/stat381/confint.md index 9ed0f73..197f5bd 100644 --- a/content/notes/stat381/confint.md +++ b/content/notes/stat381/confint.md @@ -1,6 +1,7 @@ -# Confidence Interval Lab - -**Written by Brandon Rozek** +--- +title: Confidence Interval +showthedate: false +--- ## Introduction diff --git a/content/notes/stat381/randomnumber.md b/content/notes/stat381/randomnumber.md index dcdb5d4..a73a217 100644 --- a/content/notes/stat381/randomnumber.md +++ b/content/notes/stat381/randomnumber.md @@ -1,4 +1,8 @@ -# Random Number Generation +--- +title: Random Number Generation +showthedate: false +math: true +--- ## Introduction diff --git a/content/notes/stat381/randomwalk.md b/content/notes/stat381/randomwalk.md index 6c78179..97b0ae1 100644 --- a/content/notes/stat381/randomwalk.md +++ b/content/notes/stat381/randomwalk.md @@ -1,4 +1,8 @@ -# Random Walk +--- +title: Random Walk +showthedate: false +math: true +--- ## Introduction diff --git a/content/research/clusteranalysis.md b/content/research/clusteranalysis.md index 9eebd18..0aaf520 100644 --- a/content/research/clusteranalysis.md +++ b/content/research/clusteranalysis.md @@ -3,7 +3,6 @@ Title: Cluster Analysis Description: A study of grouping observations --- -# Cluster Analysis Cluster Analysis is the art of finding inherent structures in data to form groups of similar observations. This has a myriad of applications from recommendation engines to social network analysis. 
This is an independent study, meaning that I will be studying this topic under the direction of a professor, in this case being Dr. Denhere. diff --git a/content/research/clusteranalysis/notes.md b/content/research/clusteranalysis/notes.md index 5102309..0e27802 100644 --- a/content/research/clusteranalysis/notes.md +++ b/content/research/clusteranalysis/notes.md @@ -1,4 +1,7 @@ -# Lecture Notes for Cluster Analysis +--- +title: Lecture Notes for Cluster Analysis +showthedate: false +--- [Lecture 1: Measures of Similarity](lec1) diff --git a/content/research/clusteranalysis/notes/lec1.md b/content/research/clusteranalysis/notes/lec1.md index 5b61503..a1bbd11 100644 --- a/content/research/clusteranalysis/notes/lec1.md +++ b/content/research/clusteranalysis/notes/lec1.md @@ -1,4 +1,8 @@ -# Measures of similarity +--- +title: Measures of similarity +showthedate: false +math: true +--- To identify clusters of observations we need to know how **close individuals are to each other** or **how far apart they are**. @@ -328,4 +332,4 @@ Firstly, the nature of the data should strongly influence the choice of the prox Next, the choice of measure should depend on the scale of the data. Similarity coefficients should be used when the data is binary. For continuous data, distance of correlation-type dissimilarity measure should be used according to whether 'size' or 'shape' of the objects is of interest. -Finally, the clustering method to be used might have some implications for the choice of the coefficient. For example, making a choice between several proximity coefficients with similar properties which are also known to be monotonically related can be avoided by employing a cluster method that depends only on the ranking of the proximities, not their absolute values. \ No newline at end of file +Finally, the clustering method to be used might have some implications for the choice of the coefficient. 
For example, making a choice between several proximity coefficients with similar properties which are also known to be monotonically related can be avoided by employing a cluster method that depends only on the ranking of the proximities, not their absolute values. diff --git a/content/research/clusteranalysis/notes/lec10-1.md b/content/research/clusteranalysis/notes/lec10-1.md index a984b48..d61963d 100644 --- a/content/research/clusteranalysis/notes/lec10-1.md +++ b/content/research/clusteranalysis/notes/lec10-1.md @@ -1,4 +1,8 @@ -# Silhouette +--- +title: Silhouette +showthedate: false +math: true +--- This technique validates the consistency within clusters of data. It provides a succinct graphical representation of how well each object lies in its cluster. diff --git a/content/research/clusteranalysis/notes/lec10-2.md b/content/research/clusteranalysis/notes/lec10-2.md index ec85b61..2abffdb 100644 --- a/content/research/clusteranalysis/notes/lec10-2.md +++ b/content/research/clusteranalysis/notes/lec10-2.md @@ -1,4 +1,8 @@ - # Centroid-based Clustering +--- +title: Centroid-based Clustering +showthedate: false +math: true +--- In centroid-based clustering, clusters are represented by some central vector which may or may not be a member of the dataset. In practice, the number of clusters is fixed to $k$ and the goal is to solve some sort of optimization problem. diff --git a/content/research/clusteranalysis/notes/lec10-3.md b/content/research/clusteranalysis/notes/lec10-3.md index c68221f..1b82f70 100644 --- a/content/research/clusteranalysis/notes/lec10-3.md +++ b/content/research/clusteranalysis/notes/lec10-3.md @@ -1,4 +1,7 @@ -# Voronoi Diagram +--- +title: Voronoi Diagram +showthedate: false +--- A Voronoi diagram is a partitioning of a plan into regions based on distance to points in a specific subset of the plane. 
diff --git a/content/research/clusteranalysis/notes/lec11-1.md b/content/research/clusteranalysis/notes/lec11-1.md index 99473f2..1a8348d 100644 --- a/content/research/clusteranalysis/notes/lec11-1.md +++ b/content/research/clusteranalysis/notes/lec11-1.md @@ -1,4 +1,8 @@ -# K-means++ +--- +title: K-means++ +showthedate: false +math: true +--- K-means++ is an algorithm for choosing the initial values or seeds for the k-means clustering algorithm. This was proposed as a way of avoiding the sometimes poor clustering found by a standard k-means algorithm. diff --git a/content/research/clusteranalysis/notes/lec11-2.md b/content/research/clusteranalysis/notes/lec11-2.md index e93639b..053353b 100644 --- a/content/research/clusteranalysis/notes/lec11-2.md +++ b/content/research/clusteranalysis/notes/lec11-2.md @@ -1,4 +1,8 @@ -# K-Medoids +--- +title: K-Medoids +showthedate: false +math: true +--- A medoid can be defined as the object of a cluster whose average dissimilarity to all the objects in the cluster is minimal. diff --git a/content/research/clusteranalysis/notes/lec11-3.md b/content/research/clusteranalysis/notes/lec11-3.md index 20bbebb..620848c 100644 --- a/content/research/clusteranalysis/notes/lec11-3.md +++ b/content/research/clusteranalysis/notes/lec11-3.md @@ -1,4 +1,8 @@ -# K-Medians +--- +title: K-Medians +showthedate: false +math: true +--- This is a variation of k-means clustering where instead of calculating the mean for each cluster to determine its centroid we are going to calculate the median instead. @@ -16,4 +20,4 @@ Given an initial set of $k$ medians, the algorithm proceeds by alternating betwe The algorithm is known to have converged when assignments no longer change. There is no guarantee that the optimum is found using this algorithm. -The result depends on the initial clusters. It is common to run this multiple times with different starting conditions. \ No newline at end of file +The result depends on the initial clusters. 
It is common to run this multiple times with different starting conditions. diff --git a/content/research/clusteranalysis/notes/lec12.md b/content/research/clusteranalysis/notes/lec12.md index 83dcee3..8dffcdc 100644 --- a/content/research/clusteranalysis/notes/lec12.md +++ b/content/research/clusteranalysis/notes/lec12.md @@ -1,4 +1,8 @@ -# Introduction to Density Based Clustering +--- +title: Introduction to Density Based Clustering +showthedate: false +math: true +--- In density-based clustering, clusters are defined as areas of higher density than the remainder of the data sets. Objects in more sparse areas are considered to be outliers or border points. This helps discover clusters of arbitrary shape. @@ -31,7 +35,7 @@ A cluster then satisfies two properties: 2. Find the connected components of *core* points on the neighborhood graph, ignoring all non-core points. 3. Assign each non-core point to a nearby cluster if the cluster is an $\epsilon$ (eps) neighbor, otherwise assign it to noise. -###Advantages +### Advantages - Does not require one to specify the number of clusters in the data - Can find arbitrarily shaped clusters @@ -53,4 +57,4 @@ $\epsilon$: Ideally the $k^{th}$ nearest neighbors are at roughly the same dista Example of Run Through -https://www.cse.buffalo.edu/~jing/cse601/fa12/materials/clustering_density.pdf \ No newline at end of file +https://www.cse.buffalo.edu/~jing/cse601/fa12/materials/clustering_density.pdf diff --git a/content/research/clusteranalysis/notes/lec2-1.md b/content/research/clusteranalysis/notes/lec2-1.md index c7f420f..8b84cb9 100644 --- a/content/research/clusteranalysis/notes/lec2-1.md +++ b/content/research/clusteranalysis/notes/lec2-1.md @@ -1,4 +1,7 @@ -# Why use different distance measures? +--- +title: Why use different distance measures? +showthedate: false +--- I made an attempt to find out in what situations people use different distance measures. 
Looking around in the Internet usually produces the results "It depends on the problem" or "I typically just always use Euclidean" @@ -31,4 +34,4 @@ https://stats.stackexchange.com/questions/99171/why-is-euclidean-distance-not-a- -Hopefully in this course, we'll discover more properties as to why it makes sense to use different distance measures since it can have a impact on how our clusters are formed. \ No newline at end of file +Hopefully in this course, we'll discover more properties as to why it makes sense to use different distance measures since it can have a impact on how our clusters are formed. diff --git a/content/research/clusteranalysis/notes/lec2-2.md b/content/research/clusteranalysis/notes/lec2-2.md index cb89f91..dbd247a 100644 --- a/content/research/clusteranalysis/notes/lec2-2.md +++ b/content/research/clusteranalysis/notes/lec2-2.md @@ -1,4 +1,8 @@ -# Principal Component Analysis Pt. 1 +--- +title: Principal Component Analysis Pt. 1 +showthedate: false +math: true +--- ## What is PCA? @@ -50,4 +54,4 @@ pcal = function(data) { names(combined_list) = c("Loadings", "Components") return(combined_list) } -``` \ No newline at end of file +``` diff --git a/content/research/clusteranalysis/notes/lec4-2.md b/content/research/clusteranalysis/notes/lec4-2.md index 1cbad20..2da19e2 100644 --- a/content/research/clusteranalysis/notes/lec4-2.md +++ b/content/research/clusteranalysis/notes/lec4-2.md @@ -1,4 +1,8 @@ -# Revisiting Similarity Measures +--- +title: Revisiting Similarity Measures +showthedate: false +math: true +--- ## Manhatten Distance diff --git a/content/research/clusteranalysis/notes/lec4-3.md b/content/research/clusteranalysis/notes/lec4-3.md index b65b103..bd1bfc4 100644 --- a/content/research/clusteranalysis/notes/lec4-3.md +++ b/content/research/clusteranalysis/notes/lec4-3.md @@ -1,4 +1,8 @@ -# Cluster Tendency +--- +title: Cluster Tendency +showthedate: false +math: true +--- This is the assessment of the suitability of clustering. 
Cluster Tendency determines whether the data has any inherent grouping structure. @@ -37,4 +41,4 @@ Divide each dimension in equal width bins, and count how many points lie in each Do the same for the randomly sampled data -Finally compute how much they differ using the Kullback-Leibler (KL) divergence value. If it differs greatly than we can say that the data is clusterable. \ No newline at end of file +Finally compute how much they differ using the Kullback-Leibler (KL) divergence value. If it differs greatly than we can say that the data is clusterable. diff --git a/content/research/clusteranalysis/notes/lec4.md b/content/research/clusteranalysis/notes/lec4.md index 984a04f..ff9242a 100644 --- a/content/research/clusteranalysis/notes/lec4.md +++ b/content/research/clusteranalysis/notes/lec4.md @@ -1,4 +1,8 @@ -# Principal Component Analysis Part 2: Formal Theory +--- +title: Principal Component Analysis Part 2 - Formal Theory +showthedate: false +math: true +--- ##Properties of PCA @@ -168,4 +172,4 @@ Principal Component Analysis is typically used in dimensionality reduction effor - Exclude principal components where eigenvalues are less than one. - Generate a Scree Plot - Stop when the plot goes from "steep" to "shallow" - - Stop when it essentially becomes a straight line. \ No newline at end of file + - Stop when it essentially becomes a straight line. diff --git a/content/research/clusteranalysis/notes/lec5.md b/content/research/clusteranalysis/notes/lec5.md index d6f874b..cb42cb5 100644 --- a/content/research/clusteranalysis/notes/lec5.md +++ b/content/research/clusteranalysis/notes/lec5.md @@ -1,4 +1,7 @@ -# Introduction to Connectivity Based Models +--- +title: Introduction to Connectivity Based Models +showthedate: false +--- Hierarchical algorithms combine observations to form clusters based on their distance. @@ -32,4 +35,4 @@ Or do you want to based on the farthest observations in each cluster? 
Farthest n This method is not very robust towards outliers, which will either show up as additional clusters or even cause other clusters to merge depending on the clustering method. -As we go through this section, we will go into detail about the different linkage criterion and other parameters of this model. \ No newline at end of file +As we go through this section, we will go into detail about the different linkage criterion and other parameters of this model. diff --git a/content/research/clusteranalysis/notes/lec6.md b/content/research/clusteranalysis/notes/lec6.md index 2c03e9e..fb9f6a9 100644 --- a/content/research/clusteranalysis/notes/lec6.md +++ b/content/research/clusteranalysis/notes/lec6.md @@ -1,4 +1,8 @@ -# Agglomerative Methods +--- +title: Agglomerative Methods +showthedate: false +math: true +--- ## Single Linkage @@ -87,4 +91,4 @@ Since single linkage joins clusters by the shortest link between them, the techn ## Dendrograms -A **dendrogram** is a tree diagram frequently used to illustrate the arrangement of the clusters produced by hierarchical clustering. It shows how different clusters are formed at different distance groupings. \ No newline at end of file +A **dendrogram** is a tree diagram frequently used to illustrate the arrangement of the clusters produced by hierarchical clustering. It shows how different clusters are formed at different distance groupings. diff --git a/content/research/clusteranalysis/notes/lec7.md b/content/research/clusteranalysis/notes/lec7.md index 2fe9e7a..2dec342 100644 --- a/content/research/clusteranalysis/notes/lec7.md +++ b/content/research/clusteranalysis/notes/lec7.md @@ -1,4 +1,8 @@ -# Divisive Methods Pt.1 +--- +title: Divisive Methods Pt.1 +showthedate: false +math: true +--- Divisive methods work in the opposite direction of agglomerative methods. They take one large cluster and successively splits it. @@ -39,7 +43,7 @@ This is sometimes termed *association analysis*. 
| 1 | a | b | | 0 | c | d | -####Common measures of association +#### Common measures of association $$ |ad-bc| \tag{4.6} @@ -71,4 +75,4 @@ Appealing features of monothetic divisive methods are the easy classification of A further advantage of monothetic divisive methods is that it is obvious which variables produce the split at any stage of the process. -A disadvantage with these methods is that the possession of a particular attribute which is either rare or rarely found in combination with others may take an individual down a different path. \ No newline at end of file +A disadvantage with these methods is that the possession of a particular attribute which is either rare or rarely found in combination with others may take an individual down a different path. diff --git a/content/research/clusteranalysis/notes/lec8.md b/content/research/clusteranalysis/notes/lec8.md index 0c47d8e..41fd7b3 100644 --- a/content/research/clusteranalysis/notes/lec8.md +++ b/content/research/clusteranalysis/notes/lec8.md @@ -1,4 +1,7 @@ -# Divisive Methods Pt 2. +--- +title: Divisive Methods Pt 2. +showthedate: false +--- Recall in the previous section that we spoke about Monothetic and Polythetic methods. Monothetic methods only looks at a single variable at a time while Polythetic looks at multiple variables simultaneously. In this section, we will speak more about polythetic divisive methods. @@ -45,4 +48,4 @@ In most methods of hierarchical clustering this is achieved by a use of an appro - Any valid measure of distance measure can be used - In most cases, the observations themselves are not required, just hte matrix of distances - - This can have the advantage of only having to store a distance matrix in memory as opposed to a n-dimensional matrix. \ No newline at end of file + - This can have the advantage of only having to store a distance matrix in memory as opposed to a n-dimensional matrix. 
diff --git a/content/research/clusteranalysis/notes/lec9-1.md b/content/research/clusteranalysis/notes/lec9-1.md index d8bb4da..d9217d4 100644 --- a/content/research/clusteranalysis/notes/lec9-1.md +++ b/content/research/clusteranalysis/notes/lec9-1.md @@ -1,4 +1,8 @@ -# CURE and TSNE +--- +title: CURE and TSNE +showthedate: false +math: true +--- ##Clustering Using Representatives (CURE) diff --git a/content/research/clusteranalysis/notes/lec9-2.md b/content/research/clusteranalysis/notes/lec9-2.md index 26c3dcf..51a4ad3 100644 --- a/content/research/clusteranalysis/notes/lec9-2.md +++ b/content/research/clusteranalysis/notes/lec9-2.md @@ -1,4 +1,8 @@ -# Cluster Validation +--- +title: Cluster Validation +showthedate: false +math: true +--- There are multiple approaches to validating your cluster models @@ -69,4 +73,4 @@ Using internal evaluation metrics, you can see the impact of each point by doing `clValid` contains a variety of internal validation measures. -Paper: https://cran.r-project.org/web/packages/clValid/vignettes/clValid.pdf \ No newline at end of file +Paper: https://cran.r-project.org/web/packages/clValid/vignettes/clValid.pdf diff --git a/content/research/clusteranalysis/readings.md b/content/research/clusteranalysis/readings.md index 27cc818..487b739 100644 --- a/content/research/clusteranalysis/readings.md +++ b/content/research/clusteranalysis/readings.md @@ -1,4 +1,7 @@ -# Readings for Lectures of Cluster Analysis +--- +title: Readings for Independent Study of Cluster Analysis +showthedate: false +--- ## Lecture 1 Garson Textbook Chapter 3 diff --git a/content/research/clusteranalysis/syllabus.md b/content/research/clusteranalysis/syllabus.md index 002dc8a..91ec93b 100644 --- a/content/research/clusteranalysis/syllabus.md +++ b/content/research/clusteranalysis/syllabus.md @@ -1,4 +1,7 @@ -# Cluster Analysis Spring 2018 +--- +title: Cluster Analysis +showthedate: false +--- ### Distance, Dimensionality Reduction, and Tendency @@ -116,4 +119,4 @@ 
The schedule above accounts for 14 weeks, so there is a week that is free as a b ## Conclusion -Creating this document got me really excited for this independent study. Feel free to give me feedback :) \ No newline at end of file +Creating this document got me really excited for this independent study. Feel free to give me feedback :) diff --git a/content/research/deepreinforcementlearning/WeeklyProgress.md b/content/research/deepreinforcementlearning/WeeklyProgress.md deleted file mode 100644 index cc9d00c..0000000 --- a/content/research/deepreinforcementlearning/WeeklyProgress.md +++ /dev/null @@ -1,13 +0,0 @@ -## Weekly Progress - -I didn't do the greatest job at writing a progress report every week but here on the page are the ones I did write. - -[January 29 2019](Jan29) - -[February 12 2019](Feb12) - -[February 25 2019](Feb25) - -[March 26 2019](Mar26) - -[April 2 2019](Apr2) \ No newline at end of file diff --git a/content/research/deepreinforcementlearning/WeeklyProgress/Apr2.md b/content/research/deepreinforcementlearning/WeeklyProgress/Apr2.md index 33f81ca..468695e 100644 --- a/content/research/deepreinforcementlearning/WeeklyProgress/Apr2.md +++ b/content/research/deepreinforcementlearning/WeeklyProgress/Apr2.md @@ -1,4 +1,7 @@ -# Progress Report for Week of April 2nd +--- +title: Progress Report for Week of April 2nd +showthedate: false +--- ## Added Video Recording Capability to MinAtar environment @@ -20,6 +23,3 @@ Playing around with the ideas of autoencoders, I wanted a way to discretized my I'll probably look towards implementing what the paper did for my library and move on to a different piece. 
- - -Guru Indian: 3140 Cowan Blvd, Fredericksburg, VA 22401 \ No newline at end of file diff --git a/content/research/deepreinforcementlearning/WeeklyProgress/Feb12.md b/content/research/deepreinforcementlearning/WeeklyProgress/Feb12.md index 2f419a1..bf63ba4 100644 --- a/content/research/deepreinforcementlearning/WeeklyProgress/Feb12.md +++ b/content/research/deepreinforcementlearning/WeeklyProgress/Feb12.md @@ -1,4 +1,8 @@ -# Weekly Progress Feb 12 +--- +title: Weekly Progress Feb 12 +showthedate: false +math: true +--- ## Finished writing scripts for data collection @@ -60,4 +64,4 @@ The two parameters I use schedulers in normally are: Jimmy Lei Ba, Jamie Ryan Kiros, Geoffrey E. Hinton. **Layer Normalization.** -It's nicely implemented in PyTorch already so I threw that in for each layer of the network. Reduces the average loss. \ No newline at end of file +It's nicely implemented in PyTorch already so I threw that in for each layer of the network. Reduces the average loss. diff --git a/content/research/deepreinforcementlearning/WeeklyProgress/Feb25.md b/content/research/deepreinforcementlearning/WeeklyProgress/Feb25.md index 5d125fe..71e584e 100644 --- a/content/research/deepreinforcementlearning/WeeklyProgress/Feb25.md +++ b/content/research/deepreinforcementlearning/WeeklyProgress/Feb25.md @@ -1,4 +1,8 @@ -# Weekly Progress for February 25th +--- +title: Weekly Progress for February 25th +showthedate: false +math: true +--- ## Evolutionary Algorithms diff --git a/content/research/deepreinforcementlearning/WeeklyProgress/Jan29.md b/content/research/deepreinforcementlearning/WeeklyProgress/Jan29.md index a203d4f..32f2b8a 100644 --- a/content/research/deepreinforcementlearning/WeeklyProgress/Jan29.md +++ b/content/research/deepreinforcementlearning/WeeklyProgress/Jan29.md @@ -1,4 +1,8 @@ -# Weekly Progress Jan 29 +--- +title: Weekly Progress Jan 29 +showthedate: false +math: true +--- ## 1. 
Training From Demonstrations @@ -6,13 +10,13 @@ Training from demonstrations is the act of using previous data to help speed up I read two papers on the topic: -[1] Gabriel V. de la Cruz Jr., Yunshu Du, Matthew E. Taylor. **Pre-training Neural Networks with Human Demonstrations for Deep Reinforcement Learning**. +(1) Gabriel V. de la Cruz Jr., Yunshu Du, Matthew E. Taylor. **Pre-training Neural Networks with Human Demonstrations for Deep Reinforcement Learning**. https://arxiv.org/abs/1709.04083 The authors showed how you can speed up the training of a DQN network, especially under problems involving computer vision, if you first train the convolution layers by using a supervised loss between the actions the network would choose and the actions from the demonstration data given a state. -[2] Todd Hester, Matej Vecerik, Olivier Pietquin, Marc Lanctot, Tom Schaul, Bilal Piot, Dan Horgan, John Quan, Andrew Sendonaris, Gabriel Dulac-Arnold, Ian Osband, John Agapiou, Joel Z. Leibo, Audrunas Gruslys. **Deep Q-learning from Demonstrations.** +(2) Todd Hester, Matej Vecerik, Olivier Pietquin, Marc Lanctot, Tom Schaul, Bilal Piot, Dan Horgan, John Quan, Andrew Sendonaris, Gabriel Dulac-Arnold, Ian Osband, John Agapiou, Joel Z. Leibo, Audrunas Gruslys. **Deep Q-learning from Demonstrations.** https://arxiv.org/abs/1704.03732 @@ -32,9 +36,9 @@ The intuition behind this is that for the loss to be zero, the network would've The main environment I chose to test these algorithms is Acrobot. It is a control theory problem and it has several physics related numbers as input. (Not image based) -I noticed when implementing [1] at least for the non-convolution case, there's no point in trying to train earlier layers. Perhaps I'll try again when I move onto the atari gameplays... +I noticed when implementing (1) at least for the non-convolution case, there's no point in trying to train earlier layers. Perhaps I'll try again when I move onto the atari gameplays... 
-I decided against following [2] exactly. It's not that I disagree with the approach, but I don't like the need for "expert" data. If you decide to proceed anyways with non-expert data, you need to remember that it is incorporated into the loss function. Which means that you fall risk into learning sub-optimal policies. +I decided against following (2) exactly. It's not that I disagree with the approach, but I don't like the need for "expert" data. If you decide to proceed anyways with non-expert data, you need to remember that it is incorporated into the loss function. Which means that you fall risk into learning sub-optimal policies. In the end, what I decided to do was the following @@ -62,4 +66,4 @@ In our case, the $\mu$ would be the typical weights and biases, and the $\sigma$ The concept is that as the network grows more confident about it's predictions, the variation in the weights start to decrease. This way the exploration is systematic and not something randomly injected like the epsilon-greedy strategy. -The paper describes replacing all your linear densely connected layers with this noisy linear approach. \ No newline at end of file +The paper describes replacing all your linear densely connected layers with this noisy linear approach. 
diff --git a/content/research/deepreinforcementlearning/WeeklyProgress/Mar26.md b/content/research/deepreinforcementlearning/WeeklyProgress/Mar26.md index bc303eb..620edfa 100644 --- a/content/research/deepreinforcementlearning/WeeklyProgress/Mar26.md +++ b/content/research/deepreinforcementlearning/WeeklyProgress/Mar26.md @@ -1,4 +1,8 @@ -# Progress for Week of March 26 +--- +title: Progress for Week of March 26 +showthedate: false +math: true +--- ## Parallelized Evolutionary Strategies @@ -8,4 +12,4 @@ When the parallel ES class is declared, I start a pool of workers that then gets I started looking through papers on Exploration and am interested in using the theoretical niceness of Count-based exploration in tabular settings and being able to see their affects in the non-tabular case. -""[Unifying Count-Based Exploration and Intrinsic Motivation](https://arxiv.org/abs/1606.01868)" creates a model of a arbitrary density model that follows a couple nice properties we would expect of probabilities. Namely, $P(S) = N(S) / n$ and $P'(S) = (N(S) + 1) / (n + 1)$. Where $N(S)$ represents the number of times you've seen that state, $n$ represents the total number of states you've seen, and $P'(S)$ represents the $P(S)$ after you have seen $S$ another time. With this model, we are able to solve for $N(S)$ and derive what the authors call a *Psuedo-Count*. \ No newline at end of file +""[Unifying Count-Based Exploration and Intrinsic Motivation](https://arxiv.org/abs/1606.01868)" creates a model of a arbitrary density model that follows a couple nice properties we would expect of probabilities. Namely, $P(S) = N(S) / n$ and $P'(S) = (N(S) + 1) / (n + 1)$. Where $N(S)$ represents the number of times you've seen that state, $n$ represents the total number of states you've seen, and $P'(S)$ represents the $P(S)$ after you have seen $S$ another time. With this model, we are able to solve for $N(S)$ and derive what the authors call a *Psuedo-Count*. 
diff --git a/content/research/deepreinforcementlearning/WeeklyProgress/_index.md b/content/research/deepreinforcementlearning/WeeklyProgress/_index.md new file mode 100644 index 0000000..fb2e132 --- /dev/null +++ b/content/research/deepreinforcementlearning/WeeklyProgress/_index.md @@ -0,0 +1,16 @@ +--- +title: Weekly Progress +showthedate: false +--- + +I didn't do the greatest job at writing a progress report every week but here on the page are the ones I did write. + +[January 29 2019](jan29) + +[February 12 2019](feb12) + +[February 25 2019](feb25) + +[March 26 2019](mar26) + +[April 2 2019](apr2) diff --git a/content/research/deepreinforcementlearning.md b/content/research/deepreinforcementlearning/_index.md similarity index 100% rename from content/research/deepreinforcementlearning.md rename to content/research/deepreinforcementlearning/_index.md diff --git a/content/research/deepreinforcementlearning/abstractspring2019.md b/content/research/deepreinforcementlearning/abstractspring2019.md index 3a8d5f0..332c15f 100644 --- a/content/research/deepreinforcementlearning/abstractspring2019.md +++ b/content/research/deepreinforcementlearning/abstractspring2019.md @@ -1,4 +1,5 @@ --- +title: Abstract showthedate: false --- @@ -12,4 +13,4 @@ QEP: The Q-Value Policy Evaluation Algorithm -*Abstract.* In Reinforcement Learning, sample complexity is often one of many concerns when designing algorithms. This concern outlines the number of interactions with a given environment that an agent needs in order to effectively learn a task. The Reinforcement Learning framework consists of finding a function (the policy) that maps states/scenarios to actions while maximizing the amount of reward from the environment. For example in video games, the reward is often characterized by some score. In recent years a variety of algorithms came out falling under the categories of Value-based methods and Policy-based methods. 
Value-based methods create a policy by approximating how much reward an agent is expected to receive if it performs the best actions from a given state. It is then common to choose the actions that maximizes such values. Meanwhile, in Policy-based methods, the policy function produces probabilities that an agent performs each action given a state and this is then optimized for the maximum reward. As such, Value-based methods produce deterministic policies while policy-based methods produce stochastic/probabilistic policies. Empirically, Value-based methods have lower sample complexity than Policy-based methods. However, in decision making not every situation has a best action associated with it. This is mainly due to the fact that real world environments are dynamic in nature and have confounding variables affecting the result. The QEP Algorithm combines both the Policy-based methods and Value-based methods by changing the policy's optimization scheme to involve approximate value functions. We have shown that this combines the benefits of both methods so that the sample complexity is kept low while maintaining a stochastic policy. \ No newline at end of file +*Abstract.* In Reinforcement Learning, sample complexity is often one of many concerns when designing algorithms. This concern outlines the number of interactions with a given environment that an agent needs in order to effectively learn a task. The Reinforcement Learning framework consists of finding a function (the policy) that maps states/scenarios to actions while maximizing the amount of reward from the environment. For example in video games, the reward is often characterized by some score. In recent years a variety of algorithms came out falling under the categories of Value-based methods and Policy-based methods. Value-based methods create a policy by approximating how much reward an agent is expected to receive if it performs the best actions from a given state. 
It is then common to choose the actions that maximizes such values. Meanwhile, in Policy-based methods, the policy function produces probabilities that an agent performs each action given a state and this is then optimized for the maximum reward. As such, Value-based methods produce deterministic policies while policy-based methods produce stochastic/probabilistic policies. Empirically, Value-based methods have lower sample complexity than Policy-based methods. However, in decision making not every situation has a best action associated with it. This is mainly due to the fact that real world environments are dynamic in nature and have confounding variables affecting the result. The QEP Algorithm combines both the Policy-based methods and Value-based methods by changing the policy's optimization scheme to involve approximate value functions. We have shown that this combines the benefits of both methods so that the sample complexity is kept low while maintaining a stochastic policy. diff --git a/content/research/reinforcementlearning.md b/content/research/reinforcementlearning.md index 2dff393..146588b 100644 --- a/content/research/reinforcementlearning.md +++ b/content/research/reinforcementlearning.md @@ -3,7 +3,6 @@ Title: Reinforcement Learning Description: The study of optimally mapping situations to actions --- -# Reinforcement Learning Reinforcement learning is the art of analyzing situations and mapping them to actions in order to maximize a numerical reward signal. In this independent study, I as well as Dr. Stephen Davies, will explore the Reinforcement Learning problem and its subproblems. We will go over the bandit problem, markov decision processes, and discover how best to translate a problem in order to **make decisions**. 
@@ -38,8 +37,6 @@ Code will occasionally be written to solidify the learning material and to act a [Github Link](https://github.com/brandon-rozek/ReinforcementLearning) -Specifically, if you want to see agents I've created to solve some OpenAI environments, take a look at this specific folder in the Github Repository - -[Github Link](https://github.com/Brandon-Rozek/ReinforcementLearning/tree/master/agents) +Specifically, if you want to see agents I've created to solve some OpenAI environments, take a look at this specific folder in the Github Repository. [Github Link](https://github.com/Brandon-Rozek/ReinforcementLearning/tree/master/agents) diff --git a/content/research/reinforcementlearning/notes.md b/content/research/reinforcementlearning/notes.md index 9b84aa1..2b2a93e 100644 --- a/content/research/reinforcementlearning/notes.md +++ b/content/research/reinforcementlearning/notes.md @@ -1,4 +1,7 @@ -# Lecture Notes for Reinforcement Learning +--- +title: Lecture Notes for Reinforcement Learning +showthedate: false +--- [Chapter 1: An Introduction](intro) diff --git a/content/research/reinforcementlearning/notes/bandits.md b/content/research/reinforcementlearning/notes/bandits.md index 2fabed1..7c098d9 100644 --- a/content/research/reinforcementlearning/notes/bandits.md +++ b/content/research/reinforcementlearning/notes/bandits.md @@ -1,4 +1,8 @@ -# Chapter 2: Multi-armed Bandits +--- +title: Chapter 2 - Multi-armed Bandits +showthedate: false +math: true +--- Reinforcement learning *evaluates* the actions taken rather than accepting $instructions$ of the correct actions. This creates the need for active exploration. @@ -141,4 +145,4 @@ This is an example of an *associative search* task, so called because it involve If actions are allowed to affect the next situation as well as the reward, then we have the full reinforcement learning problem. This will be presented in the next chapter of the book with its ramifications appearing throughout the rest of the book. 
-![1536321791927](/home/rozek/Pictures/1536321791927.png) \ No newline at end of file +![1536321791927](/home/rozek/Pictures/1536321791927.png) diff --git a/content/research/reinforcementlearning/notes/dynamic.md b/content/research/reinforcementlearning/notes/dynamic.md index 708b258..55d79bb 100644 --- a/content/research/reinforcementlearning/notes/dynamic.md +++ b/content/research/reinforcementlearning/notes/dynamic.md @@ -1,4 +1,8 @@ -# Chapter 4: Dynamic Programming +--- +title: Chapter 4 - Dynamic Programming +showthedate: false +math: true +--- Dynamic programming refers to a collection of algorithms that can be used to compute optimal policies given a perfect model of the environment as a Markov decision process (MDP). diff --git a/content/research/reinforcementlearning/notes/intro.md b/content/research/reinforcementlearning/notes/intro.md index 5b378ba..264e570 100644 --- a/content/research/reinforcementlearning/notes/intro.md +++ b/content/research/reinforcementlearning/notes/intro.md @@ -1,4 +1,7 @@ -# Introduction to Reinforcement Learning Day 1 +--- +title: Introduction to Reinforcement Learning Day 1 +showthedate: false +--- Recall that this course is based on the book -- @@ -63,4 +66,4 @@ Most of the time in a reinforcement learning algorithm, we move greedily, select -Summary: Reinforcement learning is learning by an agent from direct interaction wit its environment, without relying on exemplary supervision or complete models of the environment. \ No newline at end of file +Summary: Reinforcement learning is learning by an agent from direct interaction wit its environment, without relying on exemplary supervision or complete models of the environment. 
diff --git a/content/research/reinforcementlearning/notes/mcmethods.md b/content/research/reinforcementlearning/notes/mcmethods.md index 51543e8..a2b8fa6 100644 --- a/content/research/reinforcementlearning/notes/mcmethods.md +++ b/content/research/reinforcementlearning/notes/mcmethods.md @@ -1,4 +1,8 @@ -# Chapter 5: Monte Carlo Methods +--- +title: Chapter 5 - Monte Carlo Methods +showthedate: false +math: true +--- Monte Carlo methods do not assume complete knowledge of the environment. They require only *experience* which is a sample sequence of states, actions, and rewards from actual or simulated interaction with an environment. diff --git a/content/research/reinforcementlearning/notes/mdp.md b/content/research/reinforcementlearning/notes/mdp.md index cde47b2..6e00452 100644 --- a/content/research/reinforcementlearning/notes/mdp.md +++ b/content/research/reinforcementlearning/notes/mdp.md @@ -1,4 +1,8 @@ -# Chapter 3: Finite Markov Decision Processes +--- +title: Chapter 3 - Finite Markov Decision Processes +showthedate: false +math: true +--- Markov Decision processes are a classical formalization of sequential decision making, where actions influence not just immediate rewards, but also subsequent situations, or states, and through those future rewards. Thus MDPs involve delayed reward and the need to trade-off immediate and delayed reward. Whereas in bandit problems we estimated the value of $q_*(a)$ of each action $a$, in MDPs we estimate the value of $q_*(s, a)$ of each action $a$ in state $s$.