Added old tildasite as an archive

Brandon Rozek 2020-01-15 23:07:02 -05:00
parent b8c47a4d7d
commit 442995995e
117 changed files with 16909 additions and 0 deletions

Binary file not shown.


Binary file not shown.


Binary file not shown.


89
static/~brozek/index.html Normal file

@ -0,0 +1,89 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<meta name="description" content="This is my University Tilda space">
<title>Home | Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem active">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Brandon Rozek's Tilda Space</h1>
<h2>Welcome</h2>
<p>My main website is <a href="https://brandonrozek.com">https://brandonrozek.com</a>. Though I like the customization that Wordpress allows, it does come with additional overhead.</p>
<p>Therefore, I decided to use the Tilda space offered by the Computer Science department to host content pertaining to my Academic Life.</p>
<p>As such, my research and other class documents will appear here.</p>
<p>For the curious, this website was built with <a href="http://picocms.org">Pico CMS</a> using the <a href="https://github.com/lostkeys/Bits-and-Pieces-Theme-for-Pico">BitsAndPieces</a> theme.</p>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>


@ -0,0 +1,382 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Bayesian Statistics</h1>
<h2>Rules of Probability</h2>
<p>Probabilities must be between zero and one, i.e., $0≤P(A)≤1$ for any event A.</p>
<p>Probabilities add to one, i.e., $\sum{P(X_i)} = 1$</p>
<p>The complement of an event, $A^c$, denotes that the event did not happen. Since probabilities must add to one, $P(A^c) = 1 - P(A)$</p>
<p>If A and B are two events, the probability that A or B happens (this is an inclusive or) is the probability of the union of the events:
$$
P(A \cup B) = P(A) + P(B) - P(A\cap B)
$$
where $\cup$ represents union (&quot;or&quot;) and $\cap$ represents intersection (&quot;and&quot;). If a set of events $A_i$ are mutually exclusive (only one event may happen), then
$$
P(\cup_{i=1}^n{A_i}) = \sum_{i=1}^n{P(A_i)}
$$</p>
<h2>Odds</h2>
<p>The odds for event A, denoted $\mathcal{O}(A)$ is defined as $\mathcal{O}(A) = P(A)/P(A^c)$ </p>
<p>This is the probability for the event divided by the probability against the event</p>
<p>From the odds, we can also recover the probability
$$
\frac{P(A)}{P(A^c)} = \mathcal{O}(A)
$$</p>
<p>$$
\frac{P(A)}{1-P(A)} = \mathcal{O}(A)
$$</p>
<p>$$
\frac{1 -P(A)}{P(A)} = \frac{1}{\mathcal{O}(A)}
$$</p>
<p>$$
\frac{1}{P(A)} - 1 = \frac{1}{\mathcal{O}(A)}
$$</p>
<p>$$
\frac{1}{P(A)} = \frac{1}{\mathcal{O}(A)} + 1
$$</p>
<p>$$
\frac{1}{P(A)} = \frac{1 + \mathcal{O}(A)}{\mathcal{O}(A)}
$$</p>
<p>$$
P(A) = \frac{\mathcal{O}(A)}{1 + \mathcal{O}(A)}
$$</p>
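<p>As a quick sketch in R (using a hypothetical $P(A) = 0.2$), converting to odds and back recovers the original probability:</p>
<pre><code class="language-R">p = 0.2              # hypothetical P(A)
odds = p / (1 - p)   # O(A) = P(A) / P(A^c), here 0.25
odds / (1 + odds)    # recovers P(A) = 0.2</code></pre>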
<h2>Expectation</h2>
<p>The expected value of a random variable X is a weighted average of values X can take, with weights given by the probabilities of those values.
$$
E(X) = \sum_{i=1}^n{x_i * P(X=x_i)}
$$</p>
<h2>Frameworks of probability</h2>
<p>Classical -- Outcomes that are equally likely have equal probabilities</p>
<p>Frequentist -- In an infinite sequence of events, what is the relative frequency</p>
<p>Bayesian -- Personal perspective (your own measure of uncertainty)</p>
<p>In betting, one must make sure that all the rules of probability are followed, i.e., that the probabilities are &quot;coherent&quot;. Otherwise, one might construct a series of bets where you're guaranteed to lose money. This is referred to as a Dutch book.</p>
<h2>Conditional probability</h2>
<p>$$
P(A|B) = \frac{P(A\cap B)}{P(B)}
$$</p>
<p>Where $A|B$ denotes &quot;A given B&quot;</p>
<p>Example from lecture:</p>
<p>Suppose there are 30 students, 9 of whom are female. Of the 30 students, 12 are computer science majors, and 4 of those 12 computer science majors are female
$$
P(Female) = \frac{9}{30} = \frac{3}{10}
$$</p>
<p>$$
P(CS) = \frac{12}{30} = \frac{2}{5}
$$</p>
<p>$$
P(F\cap CS) = \frac{4}{30} = \frac{2}{15}
$$</p>
<p>$$
P(F|CS) = \frac{P(F \cap CS)}{P(CS)} = \frac{2/15}{2/5} = \frac{1}{3}
$$</p>
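<p>A quick check of these numbers in R, working directly from the counts in the example:</p>
<pre><code class="language-R">total = 30
p_f_and_cs = 4 / total   # P(F and CS)
p_cs = 12 / total        # P(CS)
p_f_and_cs / p_cs        # P(F | CS) = 1/3</code></pre>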
<p>An intuitive way to think about a conditional probability is that we're looking at a subsegment of the original population, and asking a probability question within that segment
$$
P(F|CS^c) = \frac{P(F\cap CS^c)}{P(CS^c)} = \frac{5/30}{18/30} = \frac{5}{18}
$$
The concept of independence is when one event does not depend on another.
$$
P(A|B) = P(A)
$$
It doesn't matter that B occurred.</p>
<p>If two events are independent then the following is true
$$
P(A\cap B) = P(A)P(B)
$$
This can be derived from the conditional probability equation.</p>
<h2>Conditional Probabilities in terms of other conditional</h2>
<p>Suppose we don't know what $P(A|B)$ is but we do know what $P(B|A)$ is. We can then rewrite $P(A|B)$ in terms of $P(B|A)$
$$
P(A|B) = \frac{P(B|A)P(A)}{P(B|A)P(A) + P(B|A^c)P(A^c)}
$$
Let's look at an example of an early test for HIV antibodies known as the ELISA test.
$$
P(+ | HIV) = 0.977
$$</p>
<p>$$
P(- | \text{no HIV}) = 0.926
$$</p>
<p>As you can see, this test was accurate over 90% of the time.</p>
<p>The probability of someone in North America having this disease was $P(HIV) = .0026$</p>
<p>Now let's consider the following problem: the probability of having the disease given that they tested positive $P(HIV | +)$
$$
P(HIV|+) = \frac{P(+|HIV)P(HIV)}{P(+|HIV)P(HIV) + P(+|\text{no HIV})P(\text{no HIV})}
$$</p>
<p>$$
P(HIV|+) = \frac{(.977)(.0026)}{(.977)(.0026) + (1-.926)(1-.0026)}
$$</p>
<p>$$
P(HIV|+) = 0.033
$$</p>
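<p>The same calculation as a short R sketch, plugging in the numbers above:</p>
<pre><code class="language-R">sens = 0.977    # P(+ | HIV)
spec = 0.926    # P(- | no HIV)
prev = 0.0026   # P(HIV)
sens * prev / (sens * prev + (1 - spec) * (1 - prev))
# approximately 0.033</code></pre>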
<p>This example looked at Bayes Theorem for the two event case. We can generalize it to n events through the following formula
$$
P(A_1|B) = \frac{P(B|A_1)P(A_1)}{\sum_{i=1}^{n}{P(B|A_i)P(A_i)}}
$$</p>
<h2>Bernoulli Distribution</h2>
<p>~ means 'is distributed as'</p>
<p>We'll first be studying the Bernoulli distribution. This is when your event has two outcomes, commonly referred to as a success outcome and a failure outcome. The probability of success is $p$, which means the probability of failure is $(1-p)$
$$
X \sim B(p)
$$</p>
<p>$$
P(X = 1) = p
$$</p>
<p>$$
P(X = 0) = 1-p
$$</p>
<p>The probability of a random variable $X$ taking some value $x$ given $p$ is
$$
f(X = x | p) = f(x|p) = p^x(1-p)^{1 - x}I_{\{x \in \{0, 1\}\}}
$$
Where $I$ is the indicator function</p>
<p>Recall the expected value
$$
E(X) = \sum_{x_i}{x_iP(X=x_i)} = (1)p + (0)(1-p) = p
$$
We can also define the variance of Bernoulli
$$
Var(X) = p(1-p)
$$</p>
<h2>Binomial Distribution</h2>
<p>The binomial distribution is the sum of n <em>independent</em> Bernoulli trials
$$
X \sim Bin(n, p)
$$</p>
<p>$$
P(X=x|p) = f(x|p) = {n \choose x} p^x (1-p)^{n-x}
$$</p>
<p>$n\choose x$ is the combinatoric term which is defined as
$$
{n \choose x} = \frac{n!}{x! (n - x)!}
$$</p>
<p>$$
E(X) = np
$$</p>
<p>$$
Var(X) = np(1-p)
$$</p>
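<p>As a sanity check, the binomial PMF formula can be compared against R's built-in <code>dbinom</code> (hypothetical values $n = 10$, $p = 0.3$, $x = 4$):</p>
<pre><code class="language-R">n = 10; p = 0.3; x = 4
choose(n, x) * p^x * (1 - p)^(n - x)   # formula above, about 0.2001
dbinom(x, size = n, prob = p)          # same value from R's binomial PMF</code></pre>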
<h2>Uniform distribution</h2>
<p>Let's say X is uniformly distributed
$$
X \sim U[0,1]
$$</p>
<p>$$
f(x) = \left\{
\begin{array}{lr}
1 &amp; : x \in [0,1]\\
0 &amp; : \text{otherwise}
\end{array}
\right.
$$</p>
<p>$$
P(0 &lt; x &lt; \frac{1}{2}) = \int_0^\frac{1}{2}{f(x)dx} = \int_0^\frac{1}{2}{dx} = \frac{1}{2}
$$</p>
<p>$$
P(0 \leq x \leq \frac{1}{2}) = \int_0^\frac{1}{2}{f(x)dx} = \int_0^\frac{1}{2}{dx} = \frac{1}{2}
$$</p>
<p>$$
P(x = \frac{1}{2}) = 0
$$</p>
<h2>Rules of probability density functions</h2>
<p>$$
\int_{-\infty}^\infty{f(x)dx} = 1
$$</p>
<p>$$
f(x) \ge 0
$$</p>
<p>$$
E(X) = \int_{-\infty}^\infty{xf(x)dx}
$$</p>
<p>$$
E(g(X)) = \int{g(x)f(x)dx}
$$</p>
<p>$$
E(aX) = aE(X)
$$</p>
<p>$$
E(X + Y) = E(X) + E(Y)
$$</p>
<p>If X &amp; Y are independent
$$
E(XY) = E(X)E(Y)
$$</p>
<h2>Exponential Distribution</h2>
<p>$$
X \sim Exp(\lambda)
$$</p>
<p>Where $\lambda$ is the rate; the average waiting time between observations is $\frac{1}{\lambda}$
$$
f(x|\lambda) = \lambda e^{-\lambda x}
$$</p>
<p>$$
E(X) = \frac{1}{\lambda}
$$</p>
<p>$$
Var(X) = \frac{1}{\lambda^2}
$$</p>
<h2>Uniform (Continuous) Distribution</h2>
<p>$$
X \sim U[\theta_1, \theta_2]
$$</p>
<p>$$
f(x|\theta_1,\theta_2) = \frac{1}{\theta_2 - \theta_1}I_{\theta_1 \le x \le \theta_2}
$$</p>
<h2>Normal Distribution</h2>
<p>$$
X \sim N(\mu, \sigma^2)
$$</p>
<p>$$
f(x|\mu,\sigma^2) = \frac{1}{\sqrt{2\pi \sigma^2}}e^{-\frac{1}{2\sigma^2}(x-\mu)^2}
$$</p>
<p>$$
E(X) = \mu
$$</p>
<p>$$
Var(X) = \sigma^2
$$</p>
<h2>Variance</h2>
<p>Variance is the squared distance from the mean
$$
Var(X) = \int_{-\infty}^\infty {(x - \mu)^2f(x)dx}
$$</p>
<h2>Geometric Distribution (Discrete)</h2>
<p>The geometric distribution is the number of trials needed to get the first success, i.e., the number of Bernoulli events until a success is observed.
$$
X \sim Geo(p)
$$</p>
<p>$$
P(X = x|p) = p(1-p)^{x-1}
$$</p>
<p>$$
E(X) = \frac{1}{p}
$$</p>
<h2>Multinomial Distribution (Discrete)</h2>
<p>Multinomial is like a binomial when there are more than two possible outcomes.</p>
<p>$$
f(x_1,...,x_k|p_1,...,p_k) = \frac{n!}{x_1! ... x_k!}p_1^{x_1}...p_k^{x_k}
$$</p>
<h2>Poisson Distribution (Discrete)</h2>
<p>The Poisson distribution is used for counts. The parameter $\lambda &gt; 0$ is the rate at which we expect to observe the thing we are counting.
$$
X \sim Pois(\lambda)
$$</p>
<p>$$
P(X=x|\lambda) = \frac{\lambda^xe^{-\lambda}}{x!}
$$</p>
<p>$$
E(X) = \lambda
$$</p>
<p>$$
Var(X) = \lambda
$$</p>
<h2>Gamma Distribution (Continuous)</h2>
<p>If $X_1, X_2, ..., X_n$ are independent and identically distributed exponential random variables (waiting times between successive events), then the total waiting time for all $n$ events to occur will follow a gamma distribution with shape parameter $\alpha = n$ and rate parameter $\beta = \lambda$
$$
Y \sim Gamma(\alpha, \beta)
$$</p>
<p>$$
f(y|\alpha,\beta) = \frac{\beta^\alpha}{\Gamma(\alpha)}y^{\alpha-1}e^{-\beta y}I_{y\ge0}(y)
$$</p>
<p>$$
E(Y) = \frac{\alpha}{\beta}
$$</p>
<p>$$
Var(Y) = \frac{\alpha}{\beta^2}
$$</p>
<p>Where $\Gamma(x)$ is the gamma function. The exponential distribution is a special case of the gamma distribution with $\alpha = 1$. As $\alpha$ increases, the gamma distribution more closely resembles the normal distribution.</p>
<h2>Beta Distribution (Continuous)</h2>
<p>The beta distribution is used for random variables which take on values between 0 and 1. For this reason, the beta distribution is commonly used to model probabilities.
$$
X \sim Beta(\alpha, \beta)
$$</p>
<p>$$
f(x|\alpha,\beta) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}x^{\alpha - 1}(1 - x)^{\beta - 1}I_{{0 &lt; x &lt; 1}}
$$</p>
<p>$$
E(X) = \frac{\alpha}{\alpha + \beta}
$$</p>
<p>$$
Var(X) = \frac{\alpha\beta}{(\alpha + \beta)^2(\alpha+\beta+1)}
$$</p>
<p>The standard uniform distribution is a special case of the beta distribution with $\alpha = \beta = 1$</p>
<h2>Bayes Theorem for continuous distribution</h2>
<p>$$
f(\theta|y) = \frac{f(y|\theta)f(\theta)}{\int{f(y|\theta)f(\theta)d\theta}}
$$</p>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>


@ -0,0 +1,510 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<p>Under the frequentist paradigm, you view the data as a random sample from some larger, potentially hypothetical population. We can then make probability statements, i.e., long-run frequency statements, based on this larger population.</p>
<h2>Coin Flip Example (Central Limit Theorem)</h2>
<p>Let's suppose we flip a coin 100 times and we get 44 heads and 56 tails.
$$
n = 100
$$
We can view these 100 flips as a random sample from a much larger infinite hypothetical population of flips from this coin.</p>
<p>Let's say that each flip follows a Bernoulli distribution with some probability p. In this case $p$ is unknown, but we're assuming it's fixed because we have a particular physical coin.</p>
<p>We can ask what's our best estimate of the probability of getting a head, or an estimate of $p$. We can also ask about how confident we are about that estimate.</p>
<p>Let's start by applying the Central Limit Theorem. The Central Limit Theorem states that the sum of 100 flips will follow approximately a Normal distribution with mean 100p and variance 100p(1-p)
$$
\sum^n_{i=1}{x_i} \sim N(np, np(1-p))
$$</p>
<p>$$
\sum_{i = 1}^{100}{x_i} \sim N(100p, 100p(1-p))
$$</p>
<p>By the properties of a Normal distribution, 95% of the time we'll get a result within 1.96 standard deviations of the mean. Our estimate is $100\hat{p}$ and our error is 1.96 times the standard deviation.
$$
n\hat{p} \pm 1.96\sqrt{n\hat{p}(1-\hat{p})}
$$</p>
<p>$$
100\hat{p} \pm 1.96\sqrt{100\hat{p}(1-\hat{p})}
$$</p>
<p>This is referred to as a Confidence Interval. Confidence Intervals are commonly abbreviated as CI. In our example $\hat{p} = \frac{44}{n} = \frac{44}{100}$. Therefore, the 95% Confidence Interval for the true number of heads after flipping a coin 100 times is:
$$
100(.44) \pm 1.96\sqrt{100(.44)(1-.44)}
$$</p>
<p>$$
44 \pm 1.96\sqrt{44(.56)}
$$</p>
<p>$$
44\pm 1.96\sqrt{24.64}
$$</p>
<p>$$
(34.27, 53.73)
$$</p>
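<p>A small R sketch reproducing this interval from the formula above:</p>
<pre><code class="language-R">n = 100; y = 44
p_hat = y / n
se = sqrt(n * p_hat * (1 - p_hat))
n * p_hat + c(-1, 1) * 1.96 * se   # about (34.27, 53.73)</code></pre>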
<p>We can divide this by 100 to get the 95% Confidence Interval for $p$
$$
(0.34, 0.53)
$$
Let's step back and ask, what does it mean when I say we're 95% confident?</p>
<p>Under the frequentist paradigm, what this means is we have to think back to our infinite hypothetical sequence of events. If we were to repeat this trial an infinite number of times, or an arbitrarily large number of times, and each time create a confidence interval based on the data we observe, then on average 95% of the intervals we make will contain the true value of p.</p>
<p>On the other hand, we might want to know something about this particular interval. Does this interval contain the true p? What's the probability that this interval contains the true p? Well, we don't know for this particular interval. But under the frequentist paradigm, we're assuming that there is a fixed answer for p. Either p is in that interval or it's not in that interval. The probability that p is in that interval is either 0 or 1.</p>
<h2>Example: Heart Attack Patients (Maximum Likelihood)</h2>
<p>Consider a hospital where 400 patients are admitted over a month for heart attacks, and a month later 72 of them have died and 328 of them have survived.</p>
<p>We can ask, what's our estimate of the mortality rate?</p>
<p>Under the frequentist paradigm, we must first establish our reference population. What do we think our reference population is here? One possibility is we could think about heart attack patients in the region.</p>
<p>Another reference population we can think about is heart attack patients that are admitted to this hospital, but over a longer period of time. </p>
<p>Both of these might be reasonable attempts, but in this case our actual data are not a random sample from either of those populations. We could sort of pretend they are and move on, or we could also try to think harder about what a random sample situation might be. We can think about all the people in the region who might possibly have a heart attack and might possibly get admitted to this hospital.</p>
<p>It's a bit of an odd hypothetical situation, and so there are some philosophical issues with the setup of this whole problem under the frequentist paradigm. In any case, let's forge forward and think about how we might do some estimation.</p>
<p>Moving on, we can say each patient comes from a Bernoulli distribution with an unknown parameter $\theta$.
$$
Y_i \sim B(\theta)
$$</p>
<p>$$
P(Y_i = 1) = \theta
$$</p>
<p>In this case, let's call the &quot;success&quot; a mortality. </p>
<p>We can write the probability density function for the entire set of data in vector form: the probability that the $Y_i$'s take some values little $y_i$, given a value of theta.
$$
P(Y = y | \theta) = P(Y_1 = y_1, Y_2, = y_2,\dots, Y_n=y_n|\theta)
$$
<em>Since we're viewing these as independent events</em>, then the probability of each of these individual ones we can write in product notation.
$$
P(Y = y | \theta) = P(Y_1 = y_1|\theta)\dots P(Y_n = y_n | \theta)
$$</p>
<p>$$
P(Y = y | \theta) = \prod_{i = 1}^n{P(Y_i = y_i | \theta)} = \prod_{i = 1}^n{(\theta^{y_i}(1-\theta)^{1-y_i})}
$$</p>
<p>This is the probability of observing the actual data that we collected, conditioned on a value of the parameter $\theta$. We can now think about this expression as a function of theta. This is a concept of a likelihood.
$$
L(\theta|y) = \prod_{i = 1}^n{(\theta^{y_i}(1-\theta)^{1-y_i})}
$$
It looks like the same function, but before it was a function of $y$ given $\theta$, and now we're thinking of it as a function of $\theta$ given $y$.</p>
<p>This is not a probability distribution anymore, but it is still a function for $\theta$.</p>
<p>One way to estimate $\theta$ is to choose the $\theta$ that gives us the largest value of the likelihood, i.e., the value that makes the observed data most likely to have occurred.</p>
<p>This is referred to as the maximum likelihood estimate (MLE).</p>
<p>We're trying to find the $\theta$ that maximizes the likelihood.</p>
<p>In practice, it's usually easier to maximize the natural logarithm of the likelihood, commonly referred to as the log likelihood.
$$
\mathcal{L}(\theta) = \log{L(\theta|y)}
$$
Since the logarithm is a monotonic function, if we maximize the logarithm of the function, we also maximize the original function.
$$
\mathcal{L(\theta)} = \log{\prod_{i = 1}^n{(\theta^{y_i}(1-\theta)^{1-y_i})}}
$$</p>
<p>$$
\mathcal{L}(\theta) = \sum_{i = 1}^n{\log{(\theta^{y_i}(1-\theta)^{1-y_i})}}
$$</p>
<p>$$
\mathcal{L}(\theta) = \sum_{i = 1}^n{(\log{(\theta^{y_i}}) + \log{(1-\theta)^{1-y_i}})}
$$</p>
<p>$$
\mathcal{L}(\theta) = \sum_{i = 1}^n{(y_i\log{\theta} + (1 - y_i)\log{(1-\theta)})}
$$</p>
<p>$$
\mathcal{L}(\theta) = \log{\theta}\sum_{i = 1}^n{y_i} + \log{(1-\theta)}\sum_{i = 1}^n{(1-y_i)}
$$</p>
<p>How do we find the theta that maximizes this function? Recall from calculus that we can maximize a function by taking the derivative and setting it equal to 0.
$$
\mathcal{L}^\prime(\theta) = \frac{1}{\theta}\sum{y_i} - \frac{1}{1-\theta}\sum{(1 - y_i)}
$$
Now we need to set it equal to zero and solve for $\hat{\theta}$
$$
\frac{\sum{y_i}}{\hat{\theta}} = \frac{\sum{(1-y_i)}}{1-\hat{\theta}}
$$</p>
<p>$$
\hat{\theta}\sum{(1-y_i)} = (1-\hat{\theta})\sum{y_i}
$$</p>
<p>$$
\hat{\theta}\sum{(1-y_i)} + \hat{\theta}\sum{y_i} = \sum{y_i}
$$</p>
<p>$$
\hat{\theta}(\sum^n{(1 - y_i + y_i)}) = \sum{y_i}
$$</p>
<p>$$
\hat{\theta} = \frac{1}{n}\sum{y_i} = \hat{p} = \frac{72}{400} = 0.18
$$</p>
<p>Maximum likelihood estimators have many desirable mathematical properties. They're unbiased, they're consistent, and they're invariant.</p>
<p>In general, under certain regularity conditions, we can say that the MLE is approximately normally distributed with mean at the true value of $\theta$ and the variance $\frac{1}{I(\hat{\theta})}$ where $I(\hat{\theta})$ is the Fisher information at the value of $\hat{\theta}$. The Fisher information is a measure of how much information about $\theta$ is in each data point.
$$
\hat{\theta} \sim N(\theta, \frac{1}{I(\hat{\theta})})
$$
For a Bernoulli random variable, the Fisher information turns out to be
$$
I(\theta) = \frac{1}{\theta(1-\theta)}
$$
So the information is larger, when theta is near zero or near one, and it's the smallest when theta is near one half.</p>
<p>This makes sense, because if you're flipping a coin, and you're getting a mix of heads and tails, that tells you a little bit less than if you're getting nearly all heads or nearly all tails.</p>
<h2>Exponential Likelihood Example</h2>
<p>Let's say $X_i$ are distributed so
$$
X_i \sim Exp(\lambda)
$$
Let's say the data is independent and identically distributed, therefore making the overall density function
$$
f(x|\lambda) = \prod_{i = 1}^n{\lambda e^{-\lambda x_i}}
$$</p>
<p>$$
f(x|\lambda) = \lambda^ne^{-\lambda \sum{x_i}}
$$</p>
<p>Now the likelihood function is
$$
L(\lambda|x) = \lambda^n e^{-\lambda \sum{x_i}}
$$</p>
<p>$$
\mathcal{L}(\lambda) = n\ln{\lambda} - \lambda\sum{x_i}
$$</p>
<p>Taking the derivative
$$
\mathcal{L}^\prime(\lambda) = \frac{n}{\lambda} - \sum{x_i}
$$
Setting this equal to zero
$$
\frac{n}{\hat{\lambda}} =\sum{x_i}
$$</p>
<p>$$
\hat{\lambda} = \frac{n}{\sum{x_i}} = \frac{1}{\bar{x}}
$$</p>
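<p>A quick simulation sketch in R illustrating this estimator (the true rate of 2 here is made up for the example):</p>
<pre><code class="language-R">set.seed(42)
x = rexp(1000, rate = 2)   # simulate data with a known rate of 2
1 / mean(x)                # MLE of lambda, close to the true rate</code></pre>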
<h2>Uniform Distribution</h2>
<p>$$
X_i \sim U[0, \theta]
$$</p>
<p>$$
f(x|\theta) = \prod_{i = 1}^n{\frac{1}{\theta}I_{0 \le x_i \le \theta}}
$$</p>
<p>Combining all the indicator functions: for this product to be 1, each of them has to be true. They are all true if every observation is at least 0 (i.e., the minimum of the $x_i$ is greater than or equal to 0) and the maximum of the $x_i$ is less than or equal to $\theta$.
$$
L(\theta|x) = \theta^{-n} I_{0\le min(x_i) \le max(x_i) \le \theta}
$$</p>
<p>$$
L^\prime(\theta) = -n\theta^{-(n + 1)}I_{0 \le min(x_i) \le max(x_i)\le \theta}
$$</p>
<p>So now we can ask, can we set this equal to zero and solve for $\theta$? Well, it turns out this is not equal to zero for any positive value of $\theta$. We need $\theta$ to be strictly larger than zero.</p>
<p>However, we can also note that for $\theta$ positive, this will always be negative. The derivative is negative, which says this is a decreasing function. So this function will be maximized when we pick $\theta$ as small as possible. What's the smallest possible value of $\theta$ we can pick? Well, we need in particular for $\theta$ to be larger than all of the $x_i$. And so, the maximum likelihood estimate is the maximum of the $x_i$
$$
\hat{\theta} = max(x_i)
$$</p>
<h2>Products of Indicator Functions</h2>
<p>Because 0 * 1 = 0, the product of indicator functions can be combined into a single indicator function with a modified condition.</p>
<p><strong>Example</strong>: $I_{x &lt; 5} * I_{x \ge 0} = I_{0 \le x &lt; 5}$</p>
<p><strong>Example</strong>: $\prod_{i = 1}^n{I_{x_i &lt; 2}} = I_{x_i &lt; 2 \text{ for all } i} = I_{max(x_i) &lt; 2}$</p>
<h2>Introduction to R</h2>
<p>R has some nice functions that one can use for analysis</p>
<p><code>mean(z)</code> gives the mean of a numeric vector $z$</p>
<p><code>var(z)</code> reports the variance of a numeric vector</p>
<p><code>sqrt(var(z))</code> gives the standard deviation of a numeric vector</p>
<p><code>seq(from=0.1, to = 0.9, by = 0.1)</code> creates a vector that starts from $0.1$ and goes to $0.9$ incrementing by $0.1$</p>
<pre><code class="language-R">seq(from=0.1, to = 0.9, by = 0.1)
[1] 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
seq(1, 10)
[1] 1 2 3 4 5 6 7 8 9 10</code></pre>
<p><code>names(x)</code> gives the names of all the columns in the dataset.</p>
<pre><code class="language-R">names(trees)
[1] "Girth" "Height" "Volume"</code></pre>
<p><code>hist(x)</code> provides a histogram based on a vector</p>
<p>The more general <code>plot</code> function tries to guess at which type of plot to make. Feeding it two numerical vectors will make a scatter plot.</p>
<p>The R function <code>pairs</code> takes in a data frame and tries to make all possible Pairwise scatterplots for the dataset. </p>
<p>The <code>summary</code> command gives the five/six number summary (minimum, first quartile, median, mean, third quartile, maximum)</p>
<h2>Plotting the likelihood function in R</h2>
<p>Going back to the hospital example</p>
<pre><code class="language-R">## Likelihood function
likelihood = function(n, y, theta) {
return(theta^y * (1 - theta)^(n - y))
}
theta = seq(from = 0.01, to = 0.99, by = 0.01)
plot(theta, likelihood(400, 72, theta))</code></pre>
<p>You can also do this with log likelihoods. This is typically more numerically stable to compute</p>
<pre><code class="language-R">loglike = function(n, y, theta) {
return(y * log(theta) + (n - y) * log(1 - theta))
}
plot(theta, loglike(400, 72, theta))</code></pre>
<p>Having these plotted as points makes it difficult to see, let's plot it as lines</p>
<pre><code class="language-R">plot(theta, loglike(400, 72, theta), type = "l")</code></pre>
<h2>Cumulative Distribution Function</h2>
<p>The cumulative distribution function (CDF) exists for every distribution. We define it as $F(x) = P(X \le x)$ for random variable $X$. </p>
<p>If $X$ is discrete-valued, then the CDF is computed with summation $F(x) = \sum_{t = -\infty}^x {f(t)}$. where $f(t) = P(X = t)$ is the probability mass function (PMF) which we've already seen.</p>
<p>If $X$ is continuous, the CDF is computed with an integral $F(x) = \int_{-\infty}^x{f(t)dt}$</p>
<p>The CDF is convenient for calculating probabilities of intervals. Let $a$ and $b$ be any real numbers with $a &lt; b$. Then the probability that $X$ falls between $a$ and $b$ is equal to $P(a &lt; X &lt; b) = P(X \le b) - P(X \le a) = F(b) - F(a)$ </p>
<h2>Quantile Function</h2>
<p>The CDF takes a value for a random variable and returns a probability. Suppose instead we start with a number between $0$ and $1$, which we call $p$, and we wish to find a value $x$ so that $P(X \le x) = p$. The value $x$ which satisfies this equation is called the $p$ quantile (or $100p$ percentile) of the distribution of $X$.</p>
<h2>Probability Distributions in R</h2>
<p>Each of the distributions introduced in Lesson 3 has convenient functions in R which allow you to evaluate the PDF/PMF, CDF, and quantile functions, as well as generate random samples from the distribution. To illustrate, Table 1 lists these functions for the normal distribution</p>
<table>
<thead>
<tr>
<th>Function</th>
<th>What it does</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>dnorm(x, mean, sd)</code></td>
<td>Evaluate the PDF at $x$ (mean = $\mu$ and sd = $\sqrt{\sigma^2}$)</td>
</tr>
<tr>
<td><code>pnorm(q, mean, sd)</code></td>
<td>Evaluate the CDF at $q$</td>
</tr>
<tr>
<td><code>qnorm(p, mean, sd)</code></td>
<td>Evaluate the quantile function at $p$</td>
</tr>
<tr>
<td><code>rnorm(n, mean, sd)</code></td>
<td>Generate $n$ pseudo-random samples from the normal distribution</td>
</tr>
</tbody>
</table>
<p>These four functions exist for each distribution where <code>d...</code> function evaluates the density/mass, <code>p...</code> evaluates the CDF, <code>q...</code> evaluates the quantile, and <code>r...</code> generates a sample. Table 2 lists the <code>d...</code> functions for some of the most popular distributions. The <code>d</code> can be replaced with <code>p</code>, <code>q</code>, or <code>r</code> for any of the distributions, depending on what you want to calculate.</p>
<p>For details enter <code>?dnorm</code> to view R's documentation page for the Normal distribution. As usual, replace the <code>norm</code> with any distribution to read the documentation for that distribution.</p>
<table>
<thead>
<tr>
<th>Distribution</th>
<th>Function</th>
<th>Parameters</th>
</tr>
</thead>
<tbody>
<tr>
<td>$Binomial(n,p)$</td>
<td><code>dbinom(x, size, prob)</code></td>
<td>size = $n$, prob = $p$</td>
</tr>
<tr>
<td>$Poisson(\lambda)$</td>
<td><code>dpois(x, lambda)</code></td>
<td>lambda = $\lambda$</td>
</tr>
<tr>
<td>$Exp(\lambda)$</td>
<td><code>dexp(x, rate)</code></td>
<td>rate = $\lambda$</td>
</tr>
<tr>
<td>$Gamma(\alpha, \beta)$</td>
<td><code>dgamma(x, shape, rate)</code></td>
<td>shape = $\alpha$, rate = $\beta$</td>
</tr>
<tr>
<td>$Uniform(a, b)$</td>
<td><code>dunif(x, min, max)</code></td>
<td>min = $a$, max = $b$</td>
</tr>
<tr>
<td>$Beta(\alpha, \beta)$</td>
<td><code>dbeta(x, shape1, shape2)</code></td>
<td>shape1 = $\alpha$, shape2 = $\beta$</td>
</tr>
<tr>
<td>$N(\mu, \sigma^2)$</td>
<td><code>dnorm(x, mean, sd)</code></td>
<td>mean = $\mu$, sd = $\sqrt{\sigma^2}$</td>
</tr>
<tr>
<td>$t_v$</td>
<td><code>dt(x, df)</code></td>
<td>df = $v$</td>
</tr>
</tbody>
</table>
<h2>Two Coin Example</h2>
<p>Suppose your brother has a coin which you know to be loaded so that it comes up heads 70% of the time. He then comes to you with some coin; you're not sure which one, and he wants to make a bet with you, betting money that it's going to come up heads.</p>
<p>You're not sure if it's the loaded coin or if it's just a fair one. So he gives you a chance to flip it 5 times to check it out.</p>
<p>You flip it five times and get 2 heads and 3 tails. Which coin do you think it is and how sure are you about that?</p>
<p>We'll start by defining the unknown parameter $\theta$, this is either that the coin is fair or it's a loaded coin.
$$
\theta = \{fair, loaded\}
$$</p>
<p>$$
X \sim Bin(5, ?)
$$</p>
<p>$$
f(x|\theta) = \begin{cases}
{5 \choose x}(\frac{1}{2})^5 &amp; \theta = fair \\
{5 \choose x} (.7)^x (.3)^{5 - x} &amp; \theta = loaded
\end{cases}
$$</p>
<p>We can also rewrite $f(x|\theta)$ with indicator functions
$$
f(x|\theta) = {5\choose x}(.5)^5I_{{\theta = fair}} + {5 \choose x}(.7)^x(.3)^{5 - x}I_{{\theta = loaded}}
$$
In this case, we observed that $x = 2$
$$
f(\theta | x = 2) = \begin{cases}
0.3125 &amp; \theta = fair \\
0.1323 &amp; \theta = loaded
\end{cases}
$$
MLE $\hat{\theta} = fair$ </p>
<p>That's a good point estimate, but then how do we answer the question, how sure are you?</p>
<p>This is not a question that's easily answered in the frequentist paradigm. Another question we might like to ask is: what is the probability that theta equals fair, given we observed two heads?
$$
P(\theta = fair|x = 2) = ?
$$
In the frequentist paradigm, the coin is a physical quantity. It's a fixed coin, and therefore it has a fixed probability of coming up heads. It is either the fair coin, or it's the loaded coin.
$$
P(\theta = fair) \in \{0, 1\}
$$</p>
<h3>Bayesian Approach to the Problem</h3>
<p>An advantage of the Bayesian approach is that it allows you to easily incorporate prior information, when you know something in advance of looking at the data. This is difficult to do under the frequentist paradigm.</p>
<p>In this case, we're talking about your brother. You probably know him pretty well. So suppose you think that before you've looked at the coin, there's a 60% probability that this is the loaded coin.</p>
<p>In this case, we put this into our prior. Our prior is that the probability the coin is loaded is 0.6. We can update our prior with the data to get our posterior beliefs, and we can do this using Bayes' theorem.</p>
<p>Prior: $P(loaded) = 0.6$
$$
f(\theta|x) = \frac{f(x|\theta)f(\theta)}{\sum_\theta{f(x|\theta)f(\theta)}}
$$</p>
<p>$$
f(\theta|x) = \frac{{5\choose x} [(\frac{1}{2})^5(.4)I_{{\theta = fair}} + (.7)^x (.3)^{5-x}(.6)I_{{\theta = loaded}} ] }
{{5\choose x} [(\frac{1}{2})^5(.4) + (.7)^x (.3)^{5-x}(0.6) ] }
$$</p>
<p>$$
f(\theta|x=2)= \frac{0.0125I_{{\theta=fair}} + 0.0079I_{{\theta=loaded}} }{0.0125+0.0079}
$$</p>
<p>$$
f(\theta|x=2) = 0.612I_{{\theta=fair}} + 0.388I_{{\theta = loaded}}
$$</p>
<p>As you can see in the calculation here, we have the likelihood times the prior in the numerator, and in the denominator we have a normalizing constant, so that when we divide by it, we get answers that add up to one. We can work everything out exactly in this case because it's a very simple problem, but the idea carries over: what's in the denominator is always a normalizing constant.
$$
P(\theta = loaded | x = 2) = 0.388
$$
This here updates our beliefs after seeing some data about what the probability might be.</p>
<p>We can also examine what would happen under different choices of prior.
$$
P(\theta = loaded) = \frac{1}{2} \implies P(\theta = loaded | x = 2) = 0.297
$$</p>
<p>$$
P(\theta = loaded) = 0.9 \implies P(\theta = loaded | x = 2) = 0.792
$$</p>
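<p>A short R sketch of this update, wrapping the calculation in a function so different priors can be plugged in (the function name is just for illustration):</p>
<pre><code class="language-R">posterior_loaded = function(x, prior_loaded) {
  like_fair   = dbinom(x, size = 5, prob = 0.5)   # f(x | theta = fair)
  like_loaded = dbinom(x, size = 5, prob = 0.7)   # f(x | theta = loaded)
  like_loaded * prior_loaded /
    (like_loaded * prior_loaded + like_fair * (1 - prior_loaded))
}
posterior_loaded(2, 0.6)   # 0.388
posterior_loaded(2, 0.5)   # 0.297
posterior_loaded(2, 0.9)   # 0.792</code></pre>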
<p>In this case, the Bayesian approach is inherently subjective. It represents your own personal perspective, and this is an important part of the paradigm. If you have a different perspective, you will get different answers, and that's okay. It's all done in a mathematically rigorous framework, and it's all mathematically consistent and coherent.</p>
<p>And in the end, we get results that are interpretable</p>
<h2>Continuous Bayes</h2>
<p>$$
f(\theta | y) = \frac{f(y | \theta)f(\theta)}{f(y)} = \frac{f(y|\theta)f(\theta)}{\int{f(y|\theta)f(\theta)d\theta}} = \frac{likelihood \times prior}{normalization} \propto likelihood \times prior
$$</p>
<p>In practice, sometimes this integral can be a pain to compute. And so, we may work with looking at saying this is proportional to the likelihood times the prior. And if we can figure out what this looks like and just put the appropriate normalizing constant on at the end, we don't necessarily have to compute this integral.</p>
<p>So for example, suppose we're looking at a coin and it has unknown probability $\theta$ of coming up heads. Suppose we express ignorance about the value of $\theta$ by assigning it a uniform distribution.
$$
\theta \sim U[0, 1]
$$</p>
<p>$$
f(\theta) = I_{{0 \le \theta\le 1}}
$$</p>
<p>$$
f(\theta | y = 1) = \frac{\theta^1(1-\theta)^0I_{{0 \le \theta\le1}}}{\int_{-\infty}^\infty{\theta^1(1-\theta)^0I_{{0\le \theta \le 1}}d\theta}}
$$</p>
<p>$$
f(\theta | y = 1) = \frac{\theta I_{{0\le\theta\le1}}}{\int_0^1{\theta d\theta}} = 2\theta I_{{0\le\theta\le1}}
$$</p>
<p>Now if we didn't want to take the integral we could've done this approach
$$
f(\theta | y) \propto f(y|\theta)f(\theta) \propto \theta I_{{0\le\theta\le1}}
$$
We then need to find the constant such that it's a proper PDF. In this case, it's $2$.</p>
<p>Since it's a proper PDF, we can compute interval probabilities as well. These are called posterior interval estimates.
$$
P(0.025 &lt; \theta &lt; 0.975) = \int_{0.025}^{0.975}{2\theta d \theta} = (0.975)^2 - (0.025)^2 = 0.95
$$</p>
<p>$$
P(\theta &gt; 0.05) = 1 - (0.05)^2 = 0.9975
$$</p>
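<p>These two probabilities can be checked numerically in R by integrating the posterior density $2\theta$:</p>
<pre><code class="language-R">post = function(theta) 2 * theta      # posterior density f(theta | y = 1) on [0, 1]
integrate(post, 0.025, 0.975)$value   # 0.95
integrate(post, 0.05, 1)$value        # 0.9975</code></pre>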
<p>These are the sorts of intervals we can get by choosing an interval first and then asking what its posterior probability is.</p>
<p>In other cases, we may want to ask, what is the posterior interval of interest? What's an interval that contains 95% of the posterior probability in some meaningful way? This would be equivalent to a frequentist confidence interval. We can do this in several different ways; the two main ways that we make Bayesian posterior intervals, or credible intervals, are equal-tailed intervals and highest posterior density intervals.</p>
<h2>Equal-tailed Interval</h2>
<p>In the case of an equal-tailed interval, we put the equal amount of probability in each tail. So to make a 95% interval we'll put 0.025 in each tail. </p>
<p>To be able to do this, we're going to have to figure out what the quantiles are. So we're going to need some value, $q$, so that
$$
P(\theta &lt; q | Y = 1) = \int_0^q{2\theta d\theta} = q^2
$$</p>
<p>$$
P(\sqrt{0.025} &lt; \theta &lt; \sqrt{0.975}) = P(0.158 &lt; \theta &lt; 0.987) = 0.95
$$</p>
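<p>Since the posterior $2\theta$ on $[0,1]$ is a $Beta(2, 1)$ density, the same quantiles come directly from <code>qbeta</code>:</p>
<pre><code class="language-R">qbeta(c(0.025, 0.975), 2, 1)
# approximately 0.158 and 0.987, i.e. sqrt(0.025) and sqrt(0.975)</code></pre>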
<p>This is an equal-tailed interval in that the probability that $\theta$ is less than 0.158 is the same as the probability that $\theta$ is greater than 0.987. We can say that under the posterior, there's a 95% probability that $\theta$ is in this interval.</p>
<h2>Highest Posterior Density (HPD)</h2>
<p>Here we want to ask where in the density function is it highest? Theoretically this will be the shortest possible interval that contains the given probability, in this case a 95% probability.
$$
P(\theta &gt; \sqrt{0.05} | Y = 1) = P(\theta &gt; 0.224 | Y = 1) = 0.95
$$
This is the shortest possible interval that, under the posterior, has a probability of 0.95: it's $\theta$ going from 0.224 up to 1.</p>
<p>The posterior distribution describes our understanding of our uncertainty, combining our prior beliefs and the data. It does this with a probability density function, so at the end of the day we can make intervals and talk about the probability of the parameter being in an interval.</p>
<p>This is different from the frequentist approach, where we get confidence intervals. But we can't say a whole lot about the actual parameter relative to the confidence interval. We can only make long-run frequency statements about hypothetical intervals.</p>
<p>In this case, we can legitimately say that the posterior probability that $\theta$ is bigger than 0.05 is $0.9975$. We can also say that we believe there's a 95% probability that $\theta$ is in between 0.158 and 0.987.</p>
<p>Bayesians represent uncertainty with probabilities. The coin itself is a physical quantity, and it may have a particular fixed value for $\theta$.</p>
<p>But because we don't know what that value is, we represent our uncertainty about it with a distribution. At the end of the day, we can combine that uncertainty with the data, get a posterior distribution, and make intuitive statements.</p>
<p>Frequentist confidence intervals have the interpretation that &quot;If you were to repeat many times the process of collecting data and computing a 95% confidence interval, then on average about 95% of those intervals would contain the true parameter value; however, once you observe data and compute an interval, the true value is either in the interval or it is not, but you can't tell which.&quot; </p>
<p>Bayesian credible intervals have the interpretation that &quot;Your posterior probability that the parameter is in a 95% credible interval is 95%.&quot; </p>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>


@ -0,0 +1,360 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<p>How do we choose a prior?</p>
<p>Our prior needs to represent our personal perspective, beliefs, and our uncertainties.</p>
<p>Theoretically, we're defining a cumulative distribution function for the parameter
$$
P(\theta \le c) \quad \text{for all } c \in \mathbb{R}
$$
We would need to specify this for every possible value of $c$. This isn't practical to do, and it would be very difficult to do coherently so that all the probabilities were consistent.</p>
<p>In practice, we work with a convenient family that's sufficiently flexible such that a member of a family represents our beliefs.</p>
<p>Generally, if one has enough data, the information in the data will overwhelm the information in the prior. And so, the prior is not particularly important in terms of what you get for the posterior. Any reasonable choice of prior will lead to approximately the same posterior. However, there are some things that can go wrong.</p>
<h2>Example of Bad Prior</h2>
<p>Suppose we chose a prior that says $P(\theta = \frac{1}{2}) = 1$ </p>
<p>And thus, the probability of $\theta$ equaling any other value is $0$. If we do this, our data won't make a difference since we only put a probability of $1$ at a single point.
$$
f(\theta|y) \propto f(y|\theta)f(\theta) = f(\theta) = \delta(\theta)
$$</p>
<p>In general, events with a prior probability of zero have a posterior probability of zero. Events with a prior probability of one have a posterior probability of one.</p>
<p>Thus a good Bayesian will not assign a probability of zero or one to any event that has not already occurred or is not already known to be impossible.</p>
<h2>Calibration</h2>
<p>A useful concept in terms of choosing priors is that of the calibration of predictive intervals. </p>
<p>If we make an interval where we're saying we predict 95% of new data points will occur in this interval. It would be good if in reality 95% of new data points actually did fall in that interval. </p>
<p>How do we calibrate to reality? This is actually a frequentist concept, but it is important for practical statistical purposes that our results reflect reality.</p>
<p>We can compute a predictive interval, this is an interval such that 95% of new observations are expected to fall into it. It's an interval for the <strong>data</strong> rather than an interval for $\theta$
$$
f(y) = \int{f(y|\theta)f(\theta)d\theta} = \int{f(y, \theta)d\theta}
$$
Where $f(y,\theta)$ is the joint density of Y and $\theta$.</p>
<p>This is the prior predictive before any data is observed.</p>
<p><strong>Side Note:</strong> From this you can say that $f(y, \theta) = f(y|\theta)f(\theta)$</p>
<h2>Binomial Example</h2>
<p>Suppose we're going to flip a coin ten times and count the number of heads we see. We're thinking about this in advance of actually doing it, so we're interested in the predictive distribution. How many heads do we predict we're going to see?
$$
X = \sum_{i = 1}^{10}{Y_i}
$$
Where $Y_i$ is each individual coin flip.</p>
<p>If we think that all possible coins or all possible probabilities are equally likely, then we can put a prior for $\theta$ that's flat over the interval from 0 to 1.
$$
f(\theta) = I_{{0 \le \theta \le 1}}
$$</p>
<p>$$
f(x) = \int{f(x|\theta)f(\theta)d\theta} = \int_0^1{\frac{10!}{x!(10-x)!}\theta^x(1-\theta)^{10 -x}(1)d\theta}
$$</p>
<p>Note that because we're interested in $X$ at the end, it's important that we distinguish between a binomial density and a Bernoulli density. Here we just care about the total count rather than the exact ordering, which would be a sequence of Bernoullis</p>
<p>For most of the analyses we're doing, where we're interested in $\theta$ rather than x, the binomial and the Bernoulli are interchangeable because the part that depends on $\theta$ is the same.</p>
<p>To solve this integral let us recall some facts
$$
n! = \Gamma(n + 1)
$$</p>
<p>$$
Z \sim Beta(\alpha, \beta)
$$</p>
<p>$$
f(z) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha) \Gamma(\beta)}z^{\alpha - 1}(1-z)^{\beta - 1}
$$</p>
<p>Let us rewrite $f(x)$
$$
f(x) = \int_0^1{\frac{\Gamma(11)}{\Gamma(x + 1)\Gamma(11 - x)}\theta^{(x + 1)-1}(1-\theta)^{(11-x)-1}d\theta}
$$</p>
<p>$$
f(x) = \frac{\Gamma(11)}{\Gamma(12)}\int_0^1{\frac{\Gamma(12)}{\Gamma(x + 1)\Gamma(11 - x)}\theta^{(x + 1)-1}(1-\theta)^{(11-x)-1}d\theta}
$$</p>
<p>The integrand above is a beta density, and all valid beta densities integrate to one.
$$
f(x) = \frac{\Gamma(11)}{\Gamma(12)} = \frac{10!}{11!} = \frac{1}{11}
$$
For $x \in \{0, 1, 2, \dots, 10\}$</p>
<p>Thus we see that if we start with a uniform prior, we then end up with a discrete uniform predictive density for $X$. If all possible probabilities are equally likely, then all possible $X$ outcomes are equally likely.</p>
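<p>A quick simulation sketch in R of this prior predictive distribution: draw $\theta$ from the uniform prior, then draw $X$ given $\theta$, and look at the relative frequencies.</p>
<pre><code class="language-R">set.seed(1)
theta = runif(100000)                         # theta from the uniform prior
x = rbinom(100000, size = 10, prob = theta)   # X | theta
round(table(x) / length(x), 3)                # each value 0..10 occurs with probability near 1/11</code></pre>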
<h2>Posterior Predictive Distribution</h2>
<p>What about after we've observed data? What's our posterior predictive distribution?</p>
<p>Going from the previous example, let us observe after one flip that we got a head. We want to ask, what's our predictive distribution for the second flip, given we saw a head on the first flip.
$$
f(y_2|y_1) = \int{f(y_2|\theta,y_1)f(\theta|y_1)}d\theta
$$
We're going to assume that $Y_2$ is conditionally independent of $Y_1$ given $\theta$. Therefore,
$$
f(y_2 |y_1) = \int{f(y_2|\theta)f(\theta|y_1)d\theta}
$$
Suppose we're thinking of a uniform distribution for $\theta$ and we observe the first flip is a heads. What do we predict for the second flip?</p>
<p>This is no longer going to be a uniform distribution like it was before, because we have some data. We're going to think it's more likely that we're going to get a second head. We think this because, since we observed a head, $\theta$ is now more likely to be at least $\frac{1}{2}$, possibly larger.
$$
f(y_2 | Y_1 = 1) = \int_0^1{\theta^{y_2}(1-\theta)^{1-y_2}2\theta d\theta}
$$</p>
<p>$$
f(y_2|Y_1 = 1) = \int_0^1{2\theta^{y_2 + 1}(1-\theta)^{1-y_2}d\theta}
$$</p>
<p>We could work this out in a more general form, but in this case, $Y_2$ has to take the value $0$ or $1$. The next flip is either going to be heads or tails so it's easier to just plop in a particular example.
$$
P(Y_2 = 1|Y_1 = 1) = \int_0^1{2\theta^2d\theta} = \frac{2}{3}
$$</p>
<p>$$
P(Y_2 = 0 | Y_1 = 1) = 1 - P(Y_2 = 1 | Y_1 = 1) = 1 - \frac{2}{3} = \frac{1}{3}
$$</p>
<p>We can see here that the posterior is a combination of the information in the prior and the information in the data. In this case, our prior is like having two data points, one head and one tail. </p>
<p>Saying we have a uniform prior for $\theta$ is equivalent in an information sense as saying we have observed one head and one tail.</p>
<p>So then when we observe one head, it's like we now have seen two heads and one tail. So our predictive distribution for the second flip says if we have two heads and one tail, then we have a $\frac{2}{3}$ probability of getting another head and a $\frac{1}{3}$ probability of getting another tail.</p>
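<p>Numerically, this posterior predictive probability is just the posterior mean of $\theta$, which we can check in R:</p>
<pre><code class="language-R"># posterior after one head (from a uniform prior) has density 2 * theta on [0, 1]
integrate(function(theta) theta * 2 * theta, 0, 1)$value   # 2/3</code></pre>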
<h2>Binomial Likelihood with Uniform Prior</h2>
<p>Likelihood of y given theta is
$$
f(y|\theta) = \theta^{\sum{y_i}}(1-\theta)^{n - \sum{y_i}}
$$</p>
<p>Our prior for theta is just a uniform distribution
$$
f(\theta) = I_{{0 \le \theta \le 1}}
$$
Thus our posterior for $\theta$ is
$$
f(\theta | y) = \frac{f(y|\theta)f(\theta)}{\int{f(y|\theta)f(\theta)d\theta}} = \frac{\theta^{\sum{y_i}}(1-\theta)^{n - \sum{y_i}} I_{{0 \le \theta \le 1}}}{\int_0^1{\theta^{\sum{y_i}}(1-\theta)^{n - \sum{y_i}} I_{{0 \le \theta \le 1}}d\theta}}
$$
Recalling the form of the beta distribution we can rewrite our posterior as
$$
f(\theta | y) = \frac{\theta^{\sum{y_i}}(1-\theta)^{n - \sum{y_i}} I_{{0 \le \theta \le 1}}}{\frac{\Gamma(\sum{y_i} + 1)\Gamma(n - \sum{y_i} + 1)}{\Gamma(n + 2)}\int_0^1{\frac{\Gamma(n + 2)}{\Gamma(\sum{y_i} + 1)\Gamma(n - \sum{y_i} + 1)}\theta^{\sum{y_i}}(1-\theta)^{n - \sum{y_i}}d\theta}}
$$
Since the beta density integrates to $1$, we can simplify this as
$$
f(\theta | y) = \frac{\Gamma(n + 2)}{\Gamma(\sum{y_i}+ 1)\Gamma(n - \sum{y_i}+ 1)}\theta^{\sum{y_i}}(1-\theta)^{n-\sum{y_i}}I_{{0 \le \theta \le 1}}
$$
From here we can see that the posterior follows a beta distribution
$$
\theta | y \sim Beta(\sum{y_i} + 1, n - \sum{y_i} + 1)
$$</p>
<h2>Conjugate Priors</h2>
<p>The uniform distribution is $Beta(1, 1)$ </p>
<p>Any beta distribution is conjugate for the Bernoulli distribution. Any beta prior will give a beta posterior.
$$
f(\theta) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\theta^{\alpha - 1}(1-\theta)^{\beta -1}I_{{0 \le \theta \le 1}}
$$</p>
<p>$$
f(\theta | y) \propto f(y|\theta)f(\theta) = \theta^{\sum{y_i}}(1-\theta)^{n - \sum{y_i}}\frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\theta^{\alpha - 1}(1 - \theta)^{\beta - 1}I_{{0 \le \theta \le 1}}
$$</p>
<p>$$
f(y|\theta)f(\theta) \propto \theta^{\alpha + \sum{y_i}-1}(1-\theta)^{\beta + n - \sum{y_i} - 1}
$$</p>
<p>Thus we see that this is a beta distribution
$$
\theta | y \sim Beta(\alpha + \sum{y_i}, \beta + n - \sum{y_i})
$$
When $\alpha$ and $\beta$ are both one, as in the uniform distribution, we get the same result as earlier.</p>
<p>This whole process where we choose a particular form of prior that works with a likelihood is called using a conjugate family.</p>
<p>A family of distributions is referred to as conjugate if when you use a member of that family as a prior, you get another member of that family as your posterior.</p>
<p>The beta distribution is conjugate for the Bernoulli distribution. It's also conjugate for the binomial distribution. The only difference in the binomial likelihood is that there is a combinatoric term. Since that does not depend on $\theta$, we get the same posterior.</p>
<p>We often use conjugate priors because they make life much simpler; sticking to conjugate families allows us to get closed-form solutions easily.</p>
<p>If the family is flexible enough, then you can find a member of that family that closely represents your beliefs.</p>
<h2>Posterior Mean and Effect Size</h2>
<p>Returning to the beta posterior model, it is clear how both the prior and the data contribute to the posterior.</p>
<p>We can say that the effective sample size of the prior is $\alpha + \beta$</p>
<p>Recall that the expected value or mean of a beta distribution is $\frac{\alpha}{\alpha + \beta}$</p>
<p>Therefore we can derive the posterior mean as
$$
posterior_{mean} = \frac{\alpha + \sum{y_i}}{\alpha + \sum{y_i}+\beta + n - \sum{y_i}}= \frac{\alpha+\sum{y_i}}{\alpha + \beta + n}
$$
We can further decompose this as
$$
posterior_{mean} = \frac{\alpha + \beta}{\alpha + \beta + n}\frac{\alpha}{\alpha + \beta} + \frac{n}{\alpha + \beta + n}\frac{\sum{y_i}}{n}
$$
We can describe this as (prior weight * prior mean) + (data weight * data mean)</p>
<p>The posterior mean is a weighted average of the prior mean and the data mean.</p>
<p>This effective sample size gives you an idea of how much data you would need to make sure that your prior doesn't have much influence on your posterior.</p>
<p>If $\alpha + \beta$ is small compared to $n$ then the posterior will largely just be driven by the data. If $\alpha + \beta$ is large relative to $n$ then the posterior will be largely driven by the prior.</p>
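<p>A small numeric check of this decomposition in R, using a hypothetical $Beta(8, 4)$ prior and 33 successes out of $n = 40$ trials (the same numbers as the exam example below):</p>
<pre><code class="language-R">a = 8; b = 4; n = 40; sum_y = 33
(a + sum_y) / (a + b + n)                                           # posterior mean directly
(a + b) / (a + b + n) * a / (a + b) + n / (a + b + n) * sum_y / n   # weighted-average form
# both equal 41/52, roughly 0.79</code></pre>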
<p>We can make a 95% credible interval using our posterior distribution for $\theta$. We can find an interval that actually has 95% probability of containing $\theta$.</p>
<p>Using Bayesian statistics, we can chain together sequential updates every time we get new data: we use our previous posterior as the prior and apply Bayes' theorem again to get a new posterior.</p>
<h2>Data Analysis Example in R</h2>
<p>Suppose we're giving two students a multiple-choice exam with 40 questions, where each question has four choices. We don't know how much the students have studied for this exam, but we think that they'll do better than just guessing randomly</p>
<p>1) What are the parameters of interest?</p>
<p>The parameters of interest are $\theta_1$, the true probability that the first student will answer a question correctly, and $\theta_2$, the true probability that the second student will answer a question correctly.</p>
<p>2) What is our likelihood?</p>
<p>The likelihood is $Binomial(40, \theta)$, if we assume that each question is independent and that the probability a student gets each question right is the same for all questions for that student.</p>
<p>3) What prior should we use?</p>
<p>The conjugate prior is a beta prior. We can plot the density with <code>dbeta</code></p>
<pre><code class="language-R">theta = seq(from = 0, to = 1, by = 0.1)
# Uniform
plot(theta, dbeta(theta, 1, 1), type = 'l')
# Prior mean 2/3
plot(theta, dbeta(theta, 4, 2), type = 'l')
# Prior mean 2/3 but higher effect size (more concentrated at mean)
plot(theta, dbeta(theta, 8, 4), type = 'l')</code></pre>
<p>4) What are the prior probabilities $P(\theta &gt; 0.25)$? $P(\theta &gt; 0.5)$? $P(\theta &gt; 0.8)$?</p>
<pre><code class="language-R">1 - pbeta(0.25, 8, 4)
#[1] 0.998117
1 - pbeta(0.5, 8, 4)
#[1] 0.8867188
1 - pbeta(0.8, 8, 4)
#[1] 0.16113392</code></pre>
<p>5) Suppose the first student gets 33 questions right. What is the posterior distribution for $\theta_1$? $P(\theta &gt; 0.25)$? $P(\theta &gt; 0.5)$? $P(\theta &gt; 0.8)$? What is the 95% posterior credible interval for $\theta_1$?
$$
Posterior \sim Beta(8 + 33, 4 + 40 - 33) = Beta(41, 11)
$$
With a posterior mean of $\frac{41}{41+11} = \frac{41}{52}$</p>
<p>We can plot the posterior distribution with the prior</p>
<pre><code class="language-R">plot(theta, dbeta(theta, 41, 11), type = 'l')
lines(theta, dbeta(theta, 8 ,4), lty = 2) #Dashed line for prior</code></pre>
<p>Posterior probabilities</p>
<pre><code class="language-R">1 - pbeta(0.25, 41, 11)
#[1] 1
1 - pbeta(0.5, 41, 11)
#[1] 0.9999926
1 - pbeta(0.8, 41, 11)
#[1] 0.4444044</code></pre>
<p>Equal tailed 95% credible interval</p>
<pre><code class="language-R">qbeta(0.025, 41, 11)
#[1] 0.6688426
qbeta(0.975, 41, 11)
#[1] 0.8871094</code></pre>
<p>There is a 95% posterior probability that $\theta_1$ is between 0.67 and 0.89</p>
<p>6) Suppose the second student gets 24 questions right. What is the posterior distribution for $\theta_2$? $P(\theta &gt; 0.25)$? $P(\theta &gt; 0.5)$? $P(\theta &gt; 0.8)$? What is the 95% posterior credible interval for $\theta_2$?
$$
Posterior \sim Beta(8 + 24, 4 + 40 - 24) = Beta(32, 20)
$$
With a posterior mean of $\frac{32}{32+20} = \frac{32}{52}$</p>
<p>We can plot the posterior distribution with the prior</p>
<pre><code class="language-R">plot(theta, dbeta(theta, 32, 20), type = 'l')
lines(theta, dbeta(theta, 8 ,4), lty = 2) #Dashed line for prior</code></pre>
<p>Posterior probabilities</p>
<pre><code class="language-R">1 - pbeta(0.25, 32, 20)
#[1] 1
1 - pbeta(0.5, 32, 20)
#[1] 0.9540427
1 - pbeta(0.8, 32, 20)
#[1] 0.00124819</code></pre>
<p>Equal tailed 95% credible interval</p>
<pre><code class="language-R">qbeta(0.025, 32, 20)
#[1] 0.4808022
qbeta(0.975, 32, 20)
#[1] 0.7415564</code></pre>
<p>There is a 95% posterior probability that $\theta_2$ is between 0.48 and 0.74</p>
<p>7) What is the posterior probability that $\theta_1 &gt; \theta_2$? i.e., that the first student has a better chance of getting a question right than the second student?</p>
<p>Estimate by simulation: draw 100,000 samples from each posterior and see how often we observe $\theta_1 &gt; \theta_2$</p>
<pre><code class="language-R">theta1 = rbeta(100000, 41, 11)
theta2 = rbeta(100000, 32, 20)
mean(theta1 &gt; theta2)
#[1] 0.975</code></pre>
<h2>Poisson Data (Chocolate Chip Cookie Example)</h2>
<p>In mass-produced chocolate chip cookies, manufacturers make a large batch of dough, mix in a large number of chips, mix it up really well, and then portion out individual cookies. In this process, the number of chips per cookie approximately follows a Poisson distribution.</p>
<p>If we were to assume that chips have no volume, then this would be exactly a Poisson process and follow exactly a Poisson distribution. In practice, however, chips aren't that big, so the number of chips per cookie follows a Poisson distribution only approximately.
$$
Y_i \sim Poisson(\lambda)
$$</p>
<p>$$
f(y|\lambda) = \frac{\lambda^{\sum{y_i}}e^{-n\lambda}}{\prod_{i = 1}^n{y_i!}}
$$</p>
<p>This is for $\lambda &gt; 0$</p>
<p>What type of prior should we put on $\lambda$? It would be convenient if we could put a conjugate prior. What distribution looks like lambda raised to a power and e raised to a negative power?</p>
<p>For this, we're going to use a Gamma prior.
$$
\lambda \sim \Gamma(\alpha, \beta)
$$</p>
<p>$$
f(\lambda) = \frac{\beta^\alpha}{\Gamma(\alpha)}\lambda^{\alpha - 1}e^{-\beta\lambda}
$$</p>
<p>$$
f(\lambda | y) \propto f(y|\lambda)f(\lambda) \propto \lambda^{\sum{y_i}}e^{-n\lambda}\lambda^{\alpha - 1}e^{-\beta \lambda}
$$</p>
<p>$$
f(\lambda | y) \propto \lambda^{\alpha + \sum{y_i} - 1}e^{-(\beta + n)\lambda}
$$</p>
<p>Thus we can see that the posterior is a Gamma Distribution
$$
\lambda|y \sim \Gamma(\alpha + \sum{y_i}, \beta + n)
$$
The mean of Gamma under this parameterization is $\frac{\alpha}{\beta}$</p>
<p>The posterior mean is going to be
$$
posterior_{mean} = \frac{\alpha + \sum{y_i}}{\beta + n} = \frac{\beta}{\beta + n}\frac{\alpha}{\beta} + \frac{n}{\beta + n}\frac{\sum{y_i}}{n}
$$
As you can see here the posterior mean of the Gamma distribution is also the weighted average of the prior mean and the data mean.</p>
<p>Here are two strategies for choosing the hyperparameters $\alpha$ and $\beta$ (a short worked sketch follows at the end of this section).</p>
<ol>
<li>Think about the prior mean. For example, what do you think the number of chips per cookie on average is?</li>
</ol>
<p>After this, we need some other piece of knowledge to pin point both parameters. Here are some options.</p>
<ul>
<li>What is your uncertainty about the number of chips per cookie? In other words, what do you think the standard deviation is? Under the Gamma prior the standard deviation is $\frac{\sqrt{\alpha}}{\beta}$.</li>
<li>What is the effective sample size $\beta$? That is, how many units of information do you think we have in our prior versus in our data?</li>
</ul>
<ol start="2">
<li>In Bayesian Statistics, a vague prior refers to one that's relatively flat across much of the space. For a Gamma prior we can choose $\Gamma(\epsilon, \epsilon)$ where $\epsilon$ is small and strictly positive.</li>
</ol>
<p>This would create a distribution with a mean of 1 and a huge standard deviation across the whole space. Hence the posterior will be largely driven by the data and very little by the prior.</p>
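<p>As a hedged worked sketch of strategy 1 (every number below is a made-up elicitation, not data from the course): suppose we believe cookies have about 10 chips on average and we want a prior effective sample size of 2, so $\alpha = 20$ and $\beta = 2$. We then observe a hypothetical batch of cookies and apply the conjugate update.</p>
<pre><code class="language-R">alpha = 20; beta = 2        # prior mean 20/2 = 10 chips, effective sample size 2
y = c(12, 9, 14, 10, 8)     # hypothetical chip counts for 5 cookies

alpha_post = alpha + sum(y) # Gamma posterior parameters
beta_post = beta + length(y)

alpha_post / beta_post      # posterior mean
# [1] 10.42857
qgamma(c(0.025, 0.975), alpha_post, beta_post)  # equal-tailed 95% credible interval for lambda</code></pre>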
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,495 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h2>Exponential Data</h2>
<p>Suppose you're waiting for a bus that you think comes on average once every 10 minutes, but you're not sure exactly how often it comes.
$$
Y \sim Exp(\lambda)
$$
Your waiting time has a prior expectation of $\frac{1}{\lambda}$</p>
<p>It turns out the gamma distribution is conjugate for an exponential likelihood. We need to specify a prior, or a particular gamma in this case. If we think that the buses come on average every ten minutes, that's a rate of one over ten.
$$
prior_{mean} = \frac{1}{10}
$$
Thus, we'll want to specify a gamma distribution so that the first parameter divided by the second parameter is $\frac{1}{10}$.</p>
<p>We can now think about our variability. Perhaps you specify
$$
\Gamma(100, 1000)
$$
This will indeed have a prior mean of $\frac{1}{10}$ and it'll have a standard deviation of $\frac{1}{100}$. If you want to have a rough estimate of our mean plus or minus two standard deviations then we have the following
$$
0.1 \pm 0.02
$$
Suppose that we wait for 12 minutes and a bus arrives. Now you want to update your posterior for $\lambda$ about how often this bus will arrive.
$$
f(\lambda | y) \propto f(y|\lambda)f(\lambda)
$$</p>
<p>$$
f(\lambda | y) \propto \lambda e^{-\lambda y}\lambda^{\alpha - 1}e^{-\beta \lambda}
$$</p>
<p>$$
f(\lambda | y) \propto \lambda^{(\alpha + 1) - 1}e^{-(\beta + y)\lambda}
$$</p>
<p>$$
\lambda | y \sim \Gamma(\alpha + 1, \beta + y)
$$</p>
<p>Plugging in our particular prior gives us a posterior for $\lambda$ which is
$$
\lambda | y \sim \Gamma(101, 1012)
$$
Thus our posterior mean is going to be $\frac{101}{1012}$, which is approximately 0.0998.</p>
<p>This one observation doesn't contain a lot of data under this likelihood. When the bus comes and it takes 12 minutes instead of 10, it barely shifts our posterior mean up. One data point doesn't have a big impact here.</p>
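<p>Here is a small sketch of this update in R, using the prior above and the single observed waiting time of 12 minutes:</p>
<pre><code class="language-R">alpha = 100; beta = 1000    # prior Gamma(100, 1000): mean 0.1, sd 0.01
y = 12                      # one observed waiting time, in minutes

alpha_post = alpha + 1      # one exponential observation
beta_post = beta + y

alpha_post / beta_post      # posterior mean for lambda
# [1] 0.09980237
beta_post / alpha_post      # implied mean waiting time in minutes
# [1] 10.0198</code></pre>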
<h2>Normal/Gaussian Data</h2>
<p>Let's suppose the standard deviation or variance is known and we're only interested in learning about the mean. This is the situation that often arises in monitoring industrial production processes.
$$
X_i \sim N(\mu, \sigma_0^2)
$$
It turns out that the Normal distribution is conjugate for itself when looking for the mean parameter</p>
<p>Prior
$$
\mu \sim N(m_0, s_0^2)
$$</p>
<p>$$
f(\mu |x ) \propto f(x|\mu)f(\mu)
$$</p>
<p>$$
\mu | x \sim N(\frac{n\bar{x}/\sigma_0^2 + m_0/s_0^2}{n/\sigma_0^2 + 1/s_0^2}, \frac{1}{n/\sigma_0^2 + 1/s_0^2})
$$</p>
<p>Let's look at the posterior mean
$$
posterior_{mean} = \frac{n/\sigma_0^2}{n/\sigma_0^2 + 1/s_0^2}\bar{x} + \frac{1/s_0^2}{n/\sigma_0^2 + 1/s_0^2}m_0
$$</p>
<p>$$
posterior_{mean} = \frac{n}{n + \sigma_0^2/s_0^2}\bar{x} + \frac{\sigma_0^2/s_0^2}{n + \sigma_0^2/s_0^2}m_0
$$</p>
<p>Thus we see that the posterior mean is a weighted average of the prior mean and the data mean. Indeed, the effective sample size for this prior is the ratio of the data variance to the prior variance, $\sigma_0^2/s_0^2$.</p>
<p>This makes sense, because the larger the variance of the prior, the less information that's in it.</p>
<p>The marginal distribution for Y is
$$
N(m_0, s_0^2 + \sigma_0^2)
$$</p>
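<p>Before moving on, here is a quick sketch of the known-variance update in R (the prior, the known variance, and the data are all hypothetical):</p>
<pre><code class="language-R">m0 = 0; s0sq = 4                  # hypothetical prior: mu ~ N(0, 4)
sigma0sq = 1                      # known data variance
y = c(1.2, 0.8, 1.5, 0.9, 1.1)    # hypothetical observations
n = length(y); ybar = mean(y)

post_var = 1 / (n / sigma0sq + 1 / s0sq)
post_mean = post_var * (n * ybar / sigma0sq + m0 / s0sq)
c(post_mean, post_var)
# [1] 1.0476190 0.1904762</code></pre>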
<h3>When $\mu$ and $\sigma^2$ are unknown</h3>
<p>$$
X_i | \mu, \sigma^2 \sim N(\mu, \sigma^2)
$$</p>
<p>A prior from $\mu$ conditional on the value for $\sigma^2$
$$
\mu | \sigma^2 \sim N(m, \frac{\sigma^2}{w})
$$
$w$ is the ratio of $\sigma^2$ to the prior variance of the Normal distribution for $\mu$; it plays the role of the effective sample size of the prior.</p>
<p>Finally, the last step is to specify a prior for $\sigma^2$. The conjugate prior here is an inverse gamma distribution with parameters $\alpha$ and $\beta$.
$$
\sigma^2 \sim \Gamma^{-1}(\alpha, \beta)
$$
After many calculations... we get the posterior distribution
$$
\sigma^2 | x \sim \Gamma^{-1}(\alpha + \frac{n}{2}, \beta + \frac{1}{2}\sum_{i = 1}^n{(x_i-\bar{x})^2} + \frac{nw}{2(n+w)}(\bar{x} - m)^2)
$$</p>
<p>$$
\mu | \sigma^2,x \sim N(\frac{n\bar{x}+wm}{n+w}, \frac{\sigma^2}{n + w})
$$</p>
<p>Where the posterior mean can be written as the weighted average of the prior mean and the data mean.
$$
\frac{n\bar{x}+wm}{n+w} = \frac{w}{n + w}m + \frac{n}{n + w}\bar{x}
$$
In some cases, we really only care about $\mu$, and we may want inference on $\mu$ that doesn't depend on $\sigma^2$. We can marginalize over $\sigma^2$ by integrating it out. The posterior for $\mu$ marginally follows a $t$ distribution.
$$
\mu | x \sim t
$$
Similarly the posterior predictive distribution also is a $t$ distribution.</p>
<p>Finally, note that we can extend this in various directions: it can be extended to the multivariate normal case, which requires matrix-vector notation, and it can be extended in a hierarchical fashion if we want to specify priors for $m$, $w$, and $\beta$.</p>
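<p>One practical way to work with this joint posterior is simple Monte Carlo: draw $\sigma^2$ from its inverse-gamma posterior and then draw $\mu$ given $\sigma^2$. A rough sketch (the prior settings and data below are hypothetical):</p>
<pre><code class="language-R">m = 0; w = 1; alpha = 2; beta = 1       # hypothetical prior settings
y = c(1.2, 0.8, 1.5, 0.9, 1.1)          # hypothetical data
n = length(y); ybar = mean(y)

alpha_post = alpha + n / 2
beta_post = beta + 0.5 * sum((y - ybar)^2) + n * w / (2 * (n + w)) * (ybar - m)^2

# Draw sigma^2 from its inverse-gamma posterior, then mu given sigma^2
sigma2 = 1 / rgamma(10000, alpha_post, beta_post)
mu = rnorm(10000, (n * ybar + w * m) / (n + w), sqrt(sigma2 / (n + w)))

mean(mu)                        # approximate marginal posterior mean of mu
quantile(mu, c(0.025, 0.975))   # approximate 95% credible interval for mu</code></pre>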
<h2>Non Informative Priors</h2>
<p>We've seen examples of choosing priors that contain a significant amount of information. We've also seen some examples of choosing priors where we're attempting to not put too much information in to keep them vague.</p>
<p>Another approach is referred to as objective Bayesian statistics or inference where we explicitly try to minimize the amount of information that goes into the prior.</p>
<p>This is an attempt to have the data have maximum influence on the posterior</p>
<p>Let's go back to coin flipping
$$
Y_i \sim B(\theta)
$$
How do we minimize our prior information in $\theta$? One obvious intuitive approach is to say that all values of $\theta$ are equally likely. So we could have a prior for $\theta$ which follows a uniform distribution on the interval $[0, 1]$.</p>
<p>Saying all values of $\theta$ are equally likely <strong>seems</strong> like it would have no information in it.</p>
<p>Recall, however, that a $Uniform(0, 1)$ is the same as a $Beta(1, 1)$.</p>
<p>The effective sample size of a beta prior is the sum of its two parameters. So in this case, it has an effective sample size of 2. This is equivalent to data, with one head and one tail already in it.</p>
<p>So this is not a completely non informative prior.</p>
<p>We could think about a prior that has less information. For example $Beta(\frac{1}{2}, \frac{1}{2})$, this would have half as much information with an effective sample size of one.</p>
<p>We can take this even further. Think about something like $Beta(0.001, 0.001)$ This would have much less information, with the effective sample size fairly close to zero. In this case, the data would determine the posterior and there would be very little influence from the prior.</p>
<h3>Improper priors</h3>
<p>Can we go even further? We can think of the limiting case. Let's think of $Beta(0,0)$, what would that look like?
$$
f(\theta) \propto \theta^{-1}(1-\theta)^{-1}
$$
This is not a proper density. If you integrate this over $(0,1)$, you'll get an infinite integral, so it's not a true density in the sense of it not integrating to 1.</p>
<p>There's no way to normalize it, since it has an infinite integral. This is what we refer to as an improper prior.</p>
<p>It's improper in the sense that it doesn't have a proper density. But it's not necessarily improper in the sense that we can't use it. If we collect data and use this prior, then as long as we observe at least one head and one tail, or <strong>at least one success and one failure</strong>, we can get a posterior
$$
f(\theta|y) \propto \theta^{y-1}(1-\theta)^{n-y-1} \sim Beta(y, n-y)
$$
With a posterior mean of $\frac{y}{n} = \hat{\theta}$, which you should recognize as the maximum likelihood estimate. So by using this improper prior, we get a posterior which gives us point estimates exactly the same as the frequentist approach.</p>
<p>But in this case, we can also think of having a full posterior. From this, we can make interval statements and probability statements, and we can actually find an interval and say that there's a 95% probability that $\theta$ is in this interval. This is not something you can do under the frequentist approach, even though we may get the exact same interval.</p>
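<p>For instance, with hypothetical data of $y = 7$ successes in $n = 10$ trials under this improper prior, the posterior point estimate matches the MLE, but we also get a full posterior interval:</p>
<pre><code class="language-R">n = 10; y = 7                       # hypothetical data: 7 successes in 10 trials
y / n                               # posterior mean = maximum likelihood estimate
# [1] 0.7
qbeta(c(0.025, 0.975), y, n - y)    # equal-tailed 95% posterior credible interval</code></pre>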
<h3>Statements about improper priors</h3>
<p>Improper priors are okay as long as the posterior itself is proper. There may be some mathematical things that need to be checked and you may need to have certain restrictions on the data. In this case, we needed to make sure that we observed at least one head and one tail to get a proper posterior.</p>
<p>But as long as the posterior is proper, we can go forwards and do Bayesian inference even with an improper prior.</p>
<p>The second point is that for many problems there does exist a prior, typically an improper prior, that will lead to the same point estimates as you would get under the frequentist paradigm. So we can get very similar results, results that are fully dependent on the data, under the Bayesian approach.</p>
<p>But in this case, we also continue to have a full posterior, so we can make posterior interval estimates and talk about posterior probabilities of the parameter.</p>
<h3>Normal Case</h3>
<p>Another example is thinking about the normal case.
$$
Y_i \stackrel{iid}\sim N(\mu, \sigma^2)
$$
Let's start off by assuming that $\sigma^2$ is known and we'll just focus on the mean $\mu$.</p>
<p>We can think about a vague prior like before and say that
$$
\mu \sim N(0, 1000000^2)
$$
This would just spread things out across the real line, a fairly non-informative prior covering a lot of possibilities. We can then think about taking the limit: what happens if we let the variance go to $\infty$? In that case, we're basically spreading this distribution out across the entire real number line, and we can say that the density is just a constant across the whole real line.
$$
f(\mu) \propto 1
$$
This is an improper prior because if you integrate it over the real line you get an infinite answer. However, if we go ahead and find the posterior
$$
f(\mu|y) \propto f(y|\mu)f(\mu) \propto exp(-\frac{1}{2\sigma^2}\sum{(y_i - \mu)^2})(1)
$$</p>
<p>$$
f(\mu | y) \propto exp(-\frac{1}{2\sigma^2/n}(\mu - \bar{y})^2)
$$</p>
<p>$$
\mu | y \sim N(\bar{y}, \frac{\sigma^2}{n})
$$</p>
<p>This should look just like the maximum likelihood estimate.</p>
<h3>Unknown Variance</h3>
<p>In the case that $\sigma^2$ is unknown, the standard non-informative prior is
$$
f(\sigma^2) \propto \frac{1}{\sigma^2}
$$</p>
<p>$$
\sigma^2 \sim \Gamma^{-1}(0,0)
$$</p>
<p>This is an improper prior and it's uniform on the log scale of $\sigma^2$.</p>
<p>In this case, we'll end up with a posterior for $\sigma^2$
$$
\sigma^2|y \sim \Gamma^{-1}(\frac{n-1}{2}, \frac{1}{2}\sum{(y_i - \bar{y})^2})
$$
This should also look reminiscent of quantities we get as frequentists, for example, the sample standard deviation.</p>
<h2>Jeffreys Prior</h2>
<p>Choosing a uniform prior depends upon the particular parameterization. </p>
<p>Suppose I used a prior which is uniform on the log scale for $\sigma^2$
$$
f(\sigma^2) \propto \frac{1}{\sigma^2}
$$
Suppose somebody else decides that they just want to put a uniform prior on $\sigma^2$ itself.
$$
f(\sigma^2) \propto 1
$$
These are both uniform on certain scales or certain parameterizations, but they are different priors. So when we compute the posteriors, they will be different as well.</p>
<p>The key thing is that uniform priors are not invariant with respect to transformation. Depending on how you parameterize the problem, you can get different answers by using a uniform prior.</p>
<p>One attempt to get around this is to use the Jeffreys prior.</p>
<p>The Jeffreys prior is defined as follows:
$$
f(\theta) \propto \sqrt{\mathcal{I(\theta)}}
$$
Where $\mathcal{I}(\theta)$ is the fisher information of $\theta$. In most cases, this will be an improper prior.</p>
<h3>Normal Data</h3>
<p>For the example of Normal Data
$$
Y_i \sim N(\mu, \sigma^2)
$$</p>
<p>$$
f(\mu) \propto 1
$$</p>
<p>$$
f(\sigma^2) \propto \frac{1}{\sigma^2}
$$</p>
<p>Where $\mu$ is uniform and $\sigma^2$ is uniform on the log scale.</p>
<p>This prior will then be transformation invariant. We will end up putting the same information into the prior even if we use a different parameterization for the Normal.</p>
<h3>Binomial</h3>
<p>$$
Y_i \sim B(\theta)
$$</p>
<p>$$
f(\theta) \propto \theta^{-\frac{1}{2}}(1-\theta)^{-\frac{1}{2}} \sim Beta(\frac{1}{2},\frac{1}{2})
$$</p>
<p>This is a rare example of where the Jeffreys prior turns out to be a proper prior.</p>
<p>You'll note that this prior actually does have some information in it. It's equivalent to an effective sample size of one data point. However, this information will be the same, not depending on the parameterization we use.</p>
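<p>We can visualize this prior in R; it piles up mass near 0 and 1 but is still proper:</p>
<pre><code class="language-R">theta = seq(from = 0.01, to = 0.99, by = 0.01)
# Jeffreys prior for a Bernoulli/binomial success probability
plot(theta, dbeta(theta, 0.5, 0.5), type = 'l')</code></pre>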
<p>In this case, we have $\theta$ as a probability, but another alternative, which is sometimes used, is to model things on a logistic scale.</p>
<p>By using the Jeffreys prior, we'll maintain the exact same information.</p>
<h3>Closing information about priors</h3>
<p>Other possible approaches to objective Bayesian inference include priors such as reference priors and maximum entropy priors.</p>
<p>A related concept to this is called empirical Bayesian analysis. The idea in empirical Bayes is that you use the data to help inform your prior, such as by using the mean of the data to set the mean of the prior distribution. This approach often leads to reasonable point estimates in your posterior. However, it's sort of cheating since you're using your data twice, and as a result it may lead to improper uncertainty estimates.</p>
<h2>Fisher Information</h2>
<p>The Fisher information (for one parameter) is defined as
$$
\mathcal{I}(\theta) = E[(\frac{d}{d\theta}log{(f(X|\theta))})^2]
$$
Where the expectation is taken with respect to $X$ which has PDF $f(x|\theta)$. This quantity is useful in obtaining estimators for $\theta$ with good properties, such as low variance. It is also the basis for Jeffreys prior.</p>
<p><strong>Example:</strong> Let $X | \theta \sim N(\theta, 1)$. Then we have
$$
f(x|\theta) = \frac{1}{\sqrt{2\pi}}exp[-\frac{1}{2}(x-\theta)^2]
$$</p>
<p>$$
\log{(f(x|\theta))} = -\frac{1}{2}\log{(2\pi)}-\frac{1}{2}(x-\theta)^2
$$</p>
<p>$$
(\frac{d}{d\theta}log{(f(x|\theta))})^2 = (x-\theta)^2
$$</p>
<p>and so $\mathcal{I}(\theta) = E[(X - \theta)^2] = Var(X) = 1$</p>
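<p>We can sanity-check this by simulation (a rough sketch with an arbitrary value of $\theta$):</p>
<pre><code class="language-R">theta = 2                          # arbitrary value, for illustration only
x = rnorm(100000, mean = theta, sd = 1)
mean((x - theta)^2)                # should be close to the Fisher information, which is 1</code></pre>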
<h2>Linear Regression</h2>
<h3>Brief Review of Regression</h3>
<p>Recall that linear regression is a model for predicting a response or dependent variable ($Y$, also called an output) from one or more covariates or independent variables ($X$, also called explanatory variables, inputs, or features). For a given value of a single $x$, the expected value of $y$ is
$$
E[y] = \beta_0 + \beta_1x
$$
or we could say that $Y \sim N(\beta_0 + \beta_1x, \sigma^2)$. For data $(x_1, y_1), \dots , (x_n, y_n)$, the fitted values for the coefficients, $\hat{\beta_0}$ and $\hat{\beta_1}$ are those that minimize the sum of squared errors $\sum_{i = 1}^n{(y_i - \hat{y_i})^2}$, where the predicted values for the response are $\hat{y} = \hat{\beta_0} + \hat{\beta_1}x$. We can get these values from R. These fitted coefficients give the least-squares line for the data.</p>
<p>This model extends to multiple covariates, with one $\beta_j$ for each of the $k$ covariates
$$
E[y_i] = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik}
$$
Optionally, we can represent the multivariate case using vector-matrix notation.</p>
<h3>Conjugate Modeling</h3>
<p>In the Bayesian framework, we treat the $\beta$ parameters as unknown, put a prior on them, and then find the posterior. We might treat $\sigma^2$ as fixed and known, or we might treat it as an unknown and also put a prior on it. Because the underlying assumption of a regression model is that the errors are independent and identically normally distributed with mean $0$ and variance $\sigma^2$, this defines a normal likelihood.</p>
<h4>$\sigma^2$ known</h4>
<p>Sometimes we may know the value of the error variance $\sigma^2$. This simplifies calculations. The conjugate prior for the $\beta$'s is a normal prior. In practice, people typically use a non-informative prior, i.e., the limit as the variance of the normal prior goes to infinity, which has the same mean as the standard least-squares estimates. If we are only estimating $\beta$ and treating $\sigma^2$ as known, then the posterior for $\beta$ is a (multivariate) normal distribution. If we just have a single covariate, then the posterior for the slope is
$$
\beta_1 | y \sim N(\frac{\sum_{i = 1}^n{(x_i-\bar{x})(y_i - \bar{y})}}{\sum_{i=1}^n{(x_i-\bar{x})^2}}, \frac{\sigma^2}{\sum_{i=1}^n{(x_i - \bar{x})^2}})
$$
If we have multiple covariates, then using a matrix-vector notation, the posterior for the vector of coefficients is
$$
\beta | y \sim N((X^tX)^{-1}X^ty, (X^tX)^{-1}\sigma^2)
$$
where $X$ denotes the design matrix and $X^t$ is the transpose of $X$. The intercept is typically included in $X$ as a column of $1$'s. Using an improper prior requires us to have at least as many data points as we have parameters to ensure that the posterior is proper.</p>
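<p>As a hedged sketch with made-up data, we can check that this posterior mean is just the least-squares fit:</p>
<pre><code class="language-R">set.seed(1)
x = runif(20, 0, 10)                     # hypothetical covariate
y = 2 + 0.5 * x + rnorm(20)              # hypothetical response
X = cbind(1, x)                          # design matrix with an intercept column

solve(t(X) %*% X, t(X) %*% y)            # posterior mean (X^t X)^{-1} X^t y
coef(lm(y ~ x))                          # same values from lm()</code></pre>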
<h4>$\sigma^2$ Unknown</h4>
<p>If we treat both $\beta$ and $\sigma^2$ as unknown, the standard prior is the non-informative Jeffreys prior, $f(\beta, \sigma^2) \propto \frac{1}{\sigma^2}$. Again, the posterior mean for $\beta$ will be the same as the standard least-squares estimates. The posterior for $\beta$ conditional on $\sigma^2$ is the same normal distribution as when $\sigma^2$ is known, but the marginal posterior distribution for $\beta$, with $\sigma^2$ integrated out, is a $t$ distribution, analogous to the $t$ tests for significance in standard linear regression. The posterior $t$ distribution has mean $(X^tX)^{-1}X^ty$ and scale matrix (related to the variance matrix) $s^2(X^tX)^{-1}$, where $s^2 = \sum_{i = 1}^n{(y_i - \hat{y_i})^2/(n - k - 1)}$. The posterior distribution for $\sigma^2$ is an inverse gamma distribution
$$
\sigma^2 | y \sim \Gamma^{-1}(\frac{n - k - 1}{2}, \frac{n - k - 1}{2}s^2)
$$
In the simple linear regression case (single variable), the marginal posterior for $\beta$ is a $t$ distribution with mean $\frac{\sum_{i = 1}^n{(x_i-\bar{x})(y_i - \bar{y})}}{\sum_{i=1}^n{(x_i-\bar{x})^2}}$ and scale $\frac{s^2}{\sum_{i=1}^n{(x_i - \bar{x})^2}}$. If we are trying to predict a new observation at a specified input $x^*$, that predicted value has a marginal posterior predictive distribution that is a $t$ distribution, with mean $\hat{y} = \hat{\beta_0} + \hat{\beta_1}x^*$ and scale $se_r\sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{(n - 1)s_x^2}}$. $se_r$ is the residual standard error of the regression, which can be found easily in R. $s_x^2$ is the sample variance of $x$. Recall that the predictive distribution for a new observation has more variability than the posterior distribution for $\hat{y}$, because individual observations are more variable than the mean.</p>
<h2>Linear Regression</h2>
<h3>Single Variable Regression</h3>
<p>We'll be looking at the Challenger dataset. It contains data on 23 past launches, including the temperature on the day of launch and the O-ring damage index.</p>
<p><a href="http://www.randomservices.org/random/data/Challenger2.txt">http://www.randomservices.org/random/data/Challenger2.txt</a>
Read in the data</p>
<pre><code class="language-R">oring=read.table("http://www.randomservices.org/random/data/Challenger2.txt",
header=T)
# Note that attaching this masks T which is originally TRUE
attach(oring)</code></pre>
<p>Now we'll see the plot</p>
<pre><code class="language-R">plot(T, I)</code></pre>
<p><img src="files/courses/BayesianStatistics/Challenger.png" alt="Challenger" /></p>
<p>Fit a linear model</p>
<pre><code class="language-R">oring.lm=lm(I~T)
summary(oring.lm)</code></pre>
<p>Output of the summary</p>
<pre><code>Call:
lm(formula = I ~ T)
Residuals:
Min 1Q Median 3Q Max
-2.3025 -1.4507 -0.4928 0.7397 5.5337
Coefficients:
Estimate Std. Error t value Pr(&gt;|t|)
(Intercept) 18.36508 4.43859 4.138 0.000468
T -0.24337 0.06349 -3.833 0.000968
(Intercept) ***
T ***
---
Signif. codes:
0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Residual standard error: 2.102 on 21 degrees of freedom
Multiple R-squared: 0.4116, Adjusted R-squared: 0.3836
F-statistic: 14.69 on 1 and 21 DF, p-value: 0.0009677</code></pre>
<p>Add the fitted line into the scatterplot</p>
<pre><code class="language-R">lines(T,fitted(oring.lm)) </code></pre>
<p><img src="files/courses/BayesianStatistics/challengerfitted.png" alt="challengerfitted" /></p>
<p>Create a 95% posterior interval for the slope</p>
<pre><code class="language-R">-0.24337 - 0.06349*qt(.975,21)
# [1] -0.3754047
-0.24337 + 0.06349*qt(.975,21)
# [1] -0.1113353</code></pre>
<p><strong>Note:</strong> These are the same as the frequentist confidence intervals</p>
<p>If the Challenger launch was at 31 degrees Fahrenheit, how much O-ring damage would we predict?</p>
<pre><code class="language-R">coef(oring.lm)[1] + coef(oring.lm)[2]*31
# [1] 10.82052 </code></pre>
<p>Let's make our posterior prediction interval</p>
<pre><code class="language-R">predict(oring.lm,data.frame(T=31),interval="predict")</code></pre>
<p>Output of <code>predict</code></p>
<pre><code> fit lwr upr
1 10.82052 4.048269 17.59276</code></pre>
<p>We can calculate the lower bound through the following formula</p>
<pre><code class="language-R">10.82052-2.102*qt(.975,21)*sqrt(1+1/23+((31-mean(T))^2/22/var(T)))</code></pre>
<p>What's the posterior probability that the damage index is greater than zero?</p>
<pre><code class="language-R">1-pt((0-10.82052)/(2.102*sqrt(1+1/23+((31-mean(T))^2/22/var(T)))),21)</code></pre>
<h3>Multivariate Regression</h3>
<p>We're looking at Galton's seminal data predicting the height of children from the height of the parents</p>
<p><a href="http://www.randomservices.org/random/data/Galton.txt">http://www.randomservices.org/random/data/Galton.txt</a>
Read in the data</p>
<pre><code class="language-R">heights=read.table("http://www.randomservices.org/random/data/Galton.txt",
header=T)
attach(heights)</code></pre>
<p>What are the columns in the dataset?</p>
<pre><code class="language-R">names(heights)
# [1] "Family" "Father" "Mother" "Gender" "Height" "Kids" </code></pre>
<p>Let's look at the relationship between the different variables</p>
<pre><code class="language-R">pairs(heights)</code></pre>
<p><img src="files/courses/BayesianStatistics/heightpairs.png" alt="heightpairs" /></p>
<p>First let's start by creating a linear model taking all of the columns into account</p>
<pre><code class="language-R">summary(lm(Height~Father+Mother+Gender+Kids))</code></pre>
<p>Output of <code>summary</code></p>
<pre><code>Call:
lm(formula = Height ~ Father + Mother + Gender + Kids)
Residuals:
Min 1Q Median 3Q Max
-9.4748 -1.4500 0.0889 1.4716 9.1656
Coefficients:
Estimate Std. Error t value Pr(&gt;|t|)
(Intercept) 16.18771 2.79387 5.794 9.52e-09
Father 0.39831 0.02957 13.472 &lt; 2e-16
Mother 0.32096 0.03126 10.269 &lt; 2e-16
GenderM 5.20995 0.14422 36.125 &lt; 2e-16
Kids -0.04382 0.02718 -1.612 0.107
(Intercept) ***
Father ***
Mother ***
GenderM ***
Kids
---
Signif. codes:
0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Residual standard error: 2.152 on 893 degrees of freedom
Multiple R-squared: 0.6407, Adjusted R-squared: 0.6391
F-statistic: 398.1 on 4 and 893 DF, p-value: &lt; 2.2e-16</code></pre>
<p>As you can see here, the <code>Kids</code> column is not significant. Let's look at a model with it removed.</p>
<pre><code class="language-R">summary(lm(Height~Father+Mother+Gender))</code></pre>
<p>Output of <code>summary</code></p>
<pre><code>Call:
lm(formula = Height ~ Father + Mother + Gender)
Residuals:
Min 1Q Median 3Q Max
-9.523 -1.440 0.117 1.473 9.114
Coefficients:
Estimate Std. Error t value Pr(&gt;|t|)
(Intercept) 15.34476 2.74696 5.586 3.08e-08
Father 0.40598 0.02921 13.900 &lt; 2e-16
Mother 0.32150 0.03128 10.277 &lt; 2e-16
GenderM 5.22595 0.14401 36.289 &lt; 2e-16
(Intercept) ***
Father ***
Mother ***
GenderM ***
---
Signif. codes:
0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Residual standard error: 2.154 on 894 degrees of freedom
Multiple R-squared: 0.6397, Adjusted R-squared: 0.6385
F-statistic: 529 on 3 and 894 DF, p-value: &lt; 2.2e-16</code></pre>
<p>This model looks good, let's go ahead and save it to a variable</p>
<pre><code class="language-R">heights.lm=lm(Height~Father+Mother+Gender)</code></pre>
<p>From this we can tell that each extra inch of height in the father is associated with an extra 0.4 inches in the height of the child.</p>
<p>We can also tell that each extra inch of height in the mother is associated with an extra 0.3 inches in the height of the child.</p>
<p>A male child is on average 5.2 inches taller than a female child.</p>
<p>Let's create a 95% posterior interval for the difference in height by gender</p>
<pre><code class="language-R">5.226 - 0.144*qt(.975,894)
# [1] 4.943383
5.226 + 0.144*qt(.975,894)
# [1] 5.508617</code></pre>
<p>Let's make posterior prediction intervals for a male and a female child with a father who is 68 inches tall and a mother who is 64 inches tall.</p>
<pre><code class="language-R">predict(heights.lm,data.frame(Father=68,Mother=64,Gender="M"),
interval="predict")
# fit lwr upr
# 1 68.75291 64.51971 72.9861</code></pre>
<pre><code class="language-R">predict(heights.lm,data.frame(Father=68,Mother=64,Gender="F"),
interval="predict")
# fit lwr upr
# 1 63.52695 59.29329 67.76062</code></pre>
<h2>What's next?</h2>
<p>This concludes the course. If you want to go further with Bayesian statistics, the next topics to explore would be hierarchical modeling and the fitting of non-conjugate models with Markov chain Monte Carlo (MCMC).</p>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,89 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Bayesian Statistics: From Concept to Data Analysis</h1>
<p>In the Winter of 2017, I took a course on Bayesian Statistics on Coursera offered by Dr. Herbert Lee.</p>
<p>Below are the notes for each of the four weeks.</p>
<p><a href="index.html%3Fcourses%252FBayesianStatistics%252Fweek1.html">Week 1</a></p>
<p><a href="index.html%3Fcourses%252FBayesianStatistics%252Fweek2.html">Week 2</a></p>
<p><a href="index.html%3Fcourses%252FBayesianStatistics%252Fweek3.html">Week 3</a></p>
<p><a href="index.html%3Fcourses%252FBayesianStatistics%252Fweek4.html">Week 4</a></p>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,337 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Reproducible Research Week 1</h1>
<h2>Replication</h2>
<p>The ultimate standard for strengthening scientific evidence is replication of findings and conducting studies with independent</p>
<ul>
<li>Investigators</li>
<li>Data</li>
<li>Analytical Methods</li>
<li>Laboratories</li>
<li>Instruments</li>
</ul>
<p>Replication is particularly important in studies that can impact broad policy or regulatory decisions</p>
<h3>What's wrong with replication?</h3>
<p>Some studies cannot be replicated</p>
<ul>
<li>No time, opportunistic</li>
<li>No money</li>
<li>Unique</li>
</ul>
<p><em>Reproducible Research:</em> Make analytic data and code available so that others may reproduce findings</p>
<p>Reproducibility bridges the gap between replication, which is awesome, and doing nothing.</p>
<h2>Why do we need reproducible research?</h2>
<p>New technologies increasing data collection throughput; data are more complex and extremely high dimensional</p>
<p>Existing databases can be merged into new &quot;megadatabases&quot;</p>
<p>Computing power is greatly increased, allowing more sophisticated analyses</p>
<p>For every field &quot;X&quot; there is a field &quot;Computational X&quot;</p>
<h2>Research Pipeline</h2>
<p>Measured Data -&gt; Analytic Data -&gt; Computational Results -&gt; Figures/Tables/Numeric Summaries -&gt; Articles -&gt; Text</p>
<p>Data/Metadata used to develop the test should be made publicly available</p>
<p>The computer code and fully specified computational procedures used for development of the candidate omics-based test should be made sustainably available</p>
<p>&quot;Ideally, the computer code that is released will encompass all of the steps of computational analysis, including all data preprocessing steps. All aspects of the analysis needs to be transparently reported&quot; -- IOM Report</p>
<h3>What do we need for reproducible research?</h3>
<ul>
<li>Analytic data are available</li>
<li>Analytic code are available</li>
<li>Documentation of code and data</li>
<li>Standard means of distribution</li>
</ul>
<h3>Who is the audience for reproducible research?</h3>
<p>Authors:</p>
<ul>
<li>Want to make their research reproducible</li>
<li>Want tools for reproducible research to make their lives easier (or at least not much harder)</li>
</ul>
<p>Readers:</p>
<ul>
<li>Want to reproduce (and perhaps expand upon) interesting findings</li>
<li>Want tools for reproducible research to make their lives easier.</li>
</ul>
<h3>Challenges for reproducible research</h3>
<ul>
<li>Authors must undertake considerable effort to put data/results on the web (may not have resources like a web server)</li>
<li>Readers must download data/results individually and piece together which data go with which code sections, etc.</li>
<li>Readers may not have the same resources as authors</li>
<li>Few tools to help authors/readers</li>
</ul>
<h3>What happens in reality</h3>
<p>Authors:</p>
<ul>
<li>Just put stuff on the web</li>
<li>(Infamous for disorganization) Journal supplementary materials</li>
<li>There are some central databases for various fields (e.g biology, ICPSR)</li>
</ul>
<p>Readers:</p>
<ul>
<li>Just download the data and (try to) figure it out</li>
<li>Piece together the software and run it</li>
</ul>
<h2>Literate (Statistical) Programming</h2>
<p>An article is a stream of text and code</p>
<p>Analysis code is divided into text and code &quot;chunks&quot;</p>
<p>Each code chunk loads data and computes results</p>
<p>Presentation code formats results (tables, figures, etc.)</p>
<p>Article text explains what is going on</p>
<p>Literate programs can be weaved to produce human-readable documents and tangled to produce machine-readable documents</p>
<p>Literate programming is a general concept that requires</p>
<ol>
<li>A documentation language (human readable)</li>
<li>A programming language (machine readable)</li>
</ol>
<p>knitr is an R package that supports a variety of documentation languages, such as LaTeX, Markdown, and HTML</p>
<h3>Quick summary so far</h3>
<p>Reproducible research is important as a minimum standard, particularly for studies that are difficult to replicate</p>
<p>Infrastructure is needed for creating and distributing reproducible documents, beyond what is currently available</p>
<p>There is a growing number of tools for creating reproducible documents</p>
<p><strong>Golden Rule of Reproducibility: Script Everything</strong></p>
<h2>Steps in a Data Analysis</h2>
<ol>
<li>Define the question</li>
<li>Define the ideal data set</li>
<li>Determine what data you can access</li>
<li>Obtain the data</li>
<li>Clean the data</li>
<li>Exploratory data analysis</li>
<li>Statistical prediction/modeling</li>
<li>Interpret results</li>
<li>Challenge results</li>
<li>Synthesize/write up results</li>
<li>Create reproducible code</li>
</ol>
<p>&quot;Ask yourselves, what problem have you solved, ever, that was worth solving, where you knew all of the given information in advance? Where you didn't have a surplus of information and have to filter it out, or you had insufficient information and have to go find some?&quot; -- Dan Meyer</p>
<p>Defining a question is the most powerful dimension reduction tool you can ever employ.</p>
<h3>An Example for #1</h3>
<p><strong>Start with a general question</strong></p>
<p>Can I automatically detect emails that are SPAM or not?</p>
<p><strong>Make it concrete</strong></p>
<p>Can I use quantitative characteristics of emails to classify them as SPAM?</p>
<h3>Define the ideal data set</h3>
<p>The data set may depend on your goal</p>
<ul>
<li>Descriptive goal -- a whole population</li>
<li>Exploratory goal -- a random sample with many variables measured</li>
<li>Inferential goal -- The right population, randomly sampled</li>
<li>Predictive goal -- a training and test data set from the same population</li>
<li>Causal goal -- data from a randomized study</li>
<li>Mechanistic goal -- data about all components of the system</li>
</ul>
<h3>Determine what data you can access</h3>
<p>Sometimes you can find data free on the web</p>
<p>Other times you may need to buy the data</p>
<p>Be sure to respect the terms of use</p>
<p>If the data don't exist, you may need to generate it yourself.</p>
<h3>Obtain the data</h3>
<p>Try to obtain the raw data</p>
<p>Be sure to reference the source</p>
<p>Polite emails go a long way</p>
<p>If you load the data from an Internet source, record the URL and time accessed</p>
<h3>Clean the data</h3>
<p>Raw data often needs to be processed</p>
<p>If it is pre-processed, make sure you understand how</p>
<p>Understand the source of the data (census, sample, convenience sample, etc)</p>
<p>May need reformatting, subsampling -- record these steps</p>
<p><strong>Determine if the data are good enough</strong> -- If not, quit or change data</p>
<h3>Exploratory Data Analysis</h3>
<p>Look at summaries of the data</p>
<p>Check for missing data</p>
<p>-&gt; Why is there missing data?</p>
<p>Look for outliers</p>
<p>Create exploratory plots</p>
<p>Perform exploratory analyses such as clustering</p>
<p>If it's hard to see your plots since it's all bunched up, consider taking the log base 10 of an axis</p>
<p><code>plot(log10(trainSpam$capitalAve + 1) ~ trainSpam$type)</code></p>
<h3>Statistical prediction/modeling</h3>
<p>Should be informed by the results of your exploratory analysis</p>
<p>Exact methods depend on the question of interest</p>
<p>Transformations/processing should be accounted for when necessary</p>
<p>Measures of uncertainty should be reported.</p>
<h3>Interpret Results</h3>
<p>Use the appropriate language</p>
<ul>
<li>Describes</li>
<li>Correlates with/associated with</li>
<li>Leads to/Causes</li>
<li>Predicts</li>
</ul>
<p>Gives an explanation</p>
<p>Interpret Coefficients</p>
<p>Interpret measures of uncertainty</p>
<h3>Challenge Results</h3>
<p>Challenge all steps:</p>
<ul>
<li>Question</li>
<li>Data Source</li>
<li>Processing</li>
<li>Analysis</li>
<li>Conclusions</li>
</ul>
<p>Challenge measures of uncertainty</p>
<p>Challenge choices of terms to include in models</p>
<p>Think of potential alternative analyses</p>
<h3>Synthesize/Write-up Results</h3>
<p>Lead with the question</p>
<p>Summarize the analyses into the story</p>
<p>Don't include every analysis; include it only</p>
<ul>
<li>If it is needed for the story</li>
<li>If it is needed to address a challenge</li>
<li>Order analyses according to the story, rather than chronologically</li>
<li>Include &quot;pretty&quot; figures that contribute to the story</li>
</ul>
<h3>In the lecture example...</h3>
<p>Lead with the question</p>
<p> Can I use quantitative characteristics of the emails to classify them as SPAM?</p>
<p>Describe the approach</p>
<p> Collected data from UCI -&gt; created training/test sets</p>
<p> Explored Relationships</p>
<p> Choose logistic model on training set by cross validation</p>
<p> Applied to test, 78% test set accuracy</p>
<p>Interpret results</p>
<p> Number of dollar signs seem reasonable, e.g. &quot;Make more money with Viagra $ $ $ $&quot;</p>
<p>Challenge Results</p>
<p> 78% isn't that great</p>
<p> Could use more variables</p>
<p> Why use logistic regression?</p>
<h2>Data Analysis Files</h2>
<p>Data</p>
<ul>
<li>Raw Data</li>
<li>Processed Data</li>
</ul>
<p>Figures</p>
<ul>
<li>Exploratory Figures</li>
<li>Final Figures</li>
</ul>
<p>R Code</p>
<ul>
<li>Raw/Unused Scripts</li>
<li>Final Scripts</li>
<li>R Markdown Files</li>
</ul>
<p>Text</p>
<ul>
<li>README files</li>
<li>Text of Analysis/Report</li>
</ul>
<h3>Raw Data</h3>
<p>Should be stored in the analysis folder</p>
<p>If accessed from the web, include URL, description, and date accessed in README</p>
<h3>Processed Data</h3>
<p>Processed data should be named so it is easy to see which script generated the data</p>
<p>The mapping between processing scripts and the processed data they produce should be documented in the README</p>
<p>Processed data should be tidy</p>
<h3>Exploratory Figures</h3>
<p>Figures made during the course of your analysis, not necessarily part of your final report</p>
<p>They do not need to be &quot;pretty&quot;</p>
<h3>Final Figures</h3>
<p>Usually a small subset of the original figures</p>
<p>Axes/Colors set to make the figure clear</p>
<p>Possibly multiple panels</p>
<h3>Raw Scripts</h3>
<p>May be less commented (but comments help you!)</p>
<p>May be multiple versions</p>
<p>May include analyses that are later discarded</p>
<h3>Final Scripts</h3>
<p>Clearly commented</p>
<ul>
<li>
<p>Small comments liberally - what, when, why, how</p>
</li>
<li>Bigger commented blocks for whole sections</li>
</ul>
<p>Include processing details</p>
<p>Only analyses that appear in the final write-up</p>
<h3>R Markdown Files</h3>
<p>R Markdown files can be used to generate reproducible reports</p>
<p>Text and R code are integrated</p>
<p>Very easy to create in RStudio</p>
<h3>Readme Files</h3>
<p>Not necessary if you use R Markdown</p>
<p>Should contain step-by-step instructions for analysis</p>
<h3>Text of the document</h3>
<p>It should contain a title, introduction (motivation), methods (statistics you used), results (including measures of uncertainty), and conclusions (including potential problems)</p>
<p>It should tell a story</p>
<p>It should not include every analysis you performed</p>
<p>References should be included for statistical methods</p>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,267 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h2>Coding Standards for R</h2>
<ol>
<li>Always use text files/text editor</li>
<li>Indent your code</li>
<li>Limit the width of your code (80 columns?)</li>
<li>Author suggests indentation of 4 spaces at minimum</li>
<li>Limit the length of individual functions</li>
</ol>
<h2>What is Markdown?</h2>
<p>Markdown is a text-to-HTML conversion tool for web writers. Markdown allows you to write using an easy-to-read, easy-to-write plain text format, then convert it to structurally valid XHTML/HTML</p>
<h2>Markdown Syntax</h2>
<p><code>*This text will appear italicized!*</code></p>
<p><em>This text will appear italicized!</em></p>
<p><code>**This text will appear bold!**</code></p>
<p><strong>This text will appear bold</strong></p>
<p><code>## This is a secondary heading</code></p>
<p><code>### This is a tertiary heading</code></p>
<h2>This is a secondary heading</h2>
<h3>This is a tertiary heading</h3>
<p>Unordered Lists</p>
<p><code>- first item in list</code></p>
<p><code>- second item in list</code></p>
<ul>
<li>first item in list</li>
<li>second item in list</li>
</ul>
<p>Ordered lists</p>
<p><code>1. first item in list</code></p>
<p><code>2. second item in list</code></p>
<p><code>3. third item in list</code></p>
<ol>
<li>first item in list</li>
<li>second item in list</li>
<li>third item in list</li>
</ol>
<p>Create links</p>
<p><code>[Download R](http://www.r-project.org/)</code></p>
<p><a href="http://www.r-project.org/">Download R</a></p>
<p>Advanced linking</p>
<p><code>I spent so much time reading [R bloggers][1] and [Simply Statistics][2]!</code></p>
<p><code>[1]: http://www.r-bloggers.com/ "R bloggers"</code></p>
<p><code>[2]: http://simplystatistics.org/ "Simply Statistics"</code></p>
<p>I spent so much time reading <a href="http://www.r-bloggers.com/" title="R bloggers">R bloggers</a> and <a href="http://simplystatistics.org/" title="Simply Statistics">Simply Statistics</a>!</p>
<p>Manual line breaks require a double space at the end of a line</p>
<h2>What is Markdown?</h2>
<p>Created by John Gruber and Aaron Swartz. It is a simplified version of &quot;markup&quot; languages. It allows one to focus on writing as opposed to formatting. Markdown provides a simple, minimal, and intuitive way of formatting elements. </p>
<p>You can easily convert Markdown to valid HTML (and other formats) using existing tools.</p>
<h2>What is R Markdown?</h2>
<p>R Markdown is the integration of R code with markdown. It allows one to create documents containing &quot;live&quot; R code. R code is evaluated as part of the processing of the markdown and its results are inserted into the Markdown document. R Markdown is a core tool in <strong>literate statistical programming</strong></p>
<p>R Markdown can be converted to standard Markdown using the <code>knitr</code> package in R. Markdown can then be converted to HTML using the <code>markdown</code> package in R. This workflow can be easily managed using RStudio. One can create PowerPoint-like slides using the <code>slidify</code> package.</p>
<h2>Problems, Problems</h2>
<ul>
<li>Authors must undertake considerable effort to put data/results on the web</li>
<li>Readers must download data/results individually and piece together which data go with which code sections, etc.</li>
<li>Authors/readers must manually interact with websites</li>
<li>There is no single documents to integrate data analysis with textual representations; i.e data, code, and text are not linked</li>
</ul>
<p>One of the ways to resolve this is to simply put the data and code together in the same document so that people can execute the code in the right order, and the data are read at the right times. You can have a single document that integrates the data analysis with all the textual representations.</p>
<h2>Literate Statistical Programming</h2>
<ul>
<li>Original idea comes from Don Knuth</li>
<li>An article is a stream of <strong>text</strong> and <strong>code</strong></li>
<li>Analysis code is divided into text and code &quot;chunks&quot;</li>
<li>Presentation code formats results (tables, figures, etc.)</li>
<li>Article text explains what is going on</li>
<li>Literate programs are weaved to produce human-readable documents and tangled to produce machine-readable documents.</li>
</ul>
<h2>Literate Statistical Programming</h2>
<ul>
<li>Literate programming is a general concept. We need
<ul>
<li>A documentation language</li>
<li>A programming language</li>
</ul></li>
<li><code>knitr</code> supports a variety of documentation languages</li>
</ul>
<h2>How Do I Make My Work Reproducible?</h2>
<ul>
<li>Decide to do it (ideally from the start)</li>
<li>Keep track of everything, hopefully through a version control system</li>
<li>Use software in which operations can be coded</li>
<li>Don't save output</li>
<li>Save data in non-proprietary formats</li>
</ul>
<h2>Literate Programming: Pros</h2>
<ul>
<li>Text and code all in one place, logical order</li>
<li>Data, results automatically updated to reflect external changes</li>
<li>Code is live -- automatic &quot;regression test&quot; when building a document</li>
</ul>
<h2>Literate Programming: Cons</h2>
<ul>
<li>Text and code are all in one place; can make documents difficult to read, especially if there is a lot of code</li>
<li>Can substantially slow down processing of documents (although there are tools to help)</li>
</ul>
<h2>What is Knitr Good For?</h2>
<ul>
<li>Manuals</li>
<li>Short/Medium-Length technical documents</li>
<li>Tutorials</li>
<li>Reports (Especially if generated periodically)</li>
<li>Data Preprocessing documents/summaries</li>
</ul>
<h2>What is knitr NOT good for?</h2>
<ul>
<li>Very long research articles</li>
<li>Complex time-consuming computations</li>
<li>Documents that require precise formatting</li>
</ul>
<h2>Non-GUI Way of Creating R Markdown documents</h2>
<pre><code class="language-R">library(knitr)
setwd(&lt;working directory&gt;)
knit2html('document.Rmd')
browseURL('document.html')</code></pre>
<h2>A few notes about knitr</h2>
<ul>
<li>
<p>knitr will fill a new document with filler text; delete it</p>
</li>
<li>
<p>Code chunks begin with <code>```{r}</code> and end with <code>```</code> </p>
</li>
<li>
<p>All R code goes in between these markers</p>
</li>
<li>
<p>Code chunks can have names, which is useful when we start making graphics</p>
<p><code>```{r firstchunk}</code></p>
<p><code>## R code goes here</code></p>
<p><code>```</code></p>
</li>
<li>By default, code in a code chunk is echoed, as will the results of the computation (if there are results to print)</li>
</ul>
<h2>Processing of knitr documents</h2>
<ul>
<li>You write RMarkdown document (.Rmd)</li>
<li>knitr produces a Markdown document (.md)</li>
<li>knitr converts the Markdown document into HTML (by default)</li>
<li>.Rmd -&gt; .md -&gt; .html</li>
<li>You should NOT edit (or save) the .md or .html documents until you are finished</li>
</ul>
<h2>Inline Text Computations</h2>
<p>You can reference variables in R Markdown inline as follows:</p>
<pre><code>The current time is `r time`. My favorite random number is `r rand`.</code></pre>
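<p>For this to work, <code>time</code> and <code>rand</code> need to be defined in an earlier code chunk. A minimal sketch (the chunk name and the exact expressions are illustrative):</p>
<pre><code class="language-R">```{r computetime, echo=FALSE}
time = format(Sys.time(), "%a %b %d %X %Y")  # a character string with the current time
rand = rnorm(1)                              # a single random draw
```</code></pre>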
<h2>Setting Global Options</h2>
<ul>
<li>Sometimes we want to set options for every code chunk that are different from the defaults</li>
<li>For example, we may want to suppress all code echoing and results output</li>
<li>We have to write some code to set these global options</li>
</ul>
<p>Example for suppressing all code chunks</p>
<pre><code class="language-R">```{r setoptions, echo=FALSE}
opts_chunk$set(echo=False, results = "hide")
```</code></pre>
<h2>Some Common Options</h2>
<ul>
<li>Output
<ul>
<li>results: &quot;asis&quot;, &quot;hide&quot;</li>
<li>echo: TRUE, FALSE</li>
</ul></li>
<li>Figures
<ul>
<li>fig.height: numeric</li>
<li>fig.width: numeric</li>
</ul></li>
</ul>
<h2>Caching Computations</h2>
<ul>
<li>What if one chunk takes a long time to run?</li>
<li>All chunks have to be re-computed every time you re-knit the file</li>
<li>The <code>cache=TRUE</code> option can be set on a chunk-by-chunk basis to store results of a computation (see the sketch after this list)</li>
<li>After the first run, results are loaded from cache</li>
</ul>
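<p>For example, a long-running chunk can be cached so it is only recomputed when its code changes. A minimal sketch (the chunk name and computation are illustrative):</p>
<pre><code class="language-R">```{r longcomputation, cache=TRUE}
# Stored in the cache after the first knit; later knits reload the result
results = replicate(1000, mean(rnorm(100000)))
```</code></pre>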
<h2>Caching Caveats</h2>
<ul>
<li>If the data or code (or anything external) changes, you need to re-run the cached code chunks</li>
<li>Dependencies are not checked explicitly!!!!</li>
<li>Chunks with significant <em>side effects</em> may not be cacheable</li>
</ul>
<h2>Summary of knitr</h2>
<ul>
<li>Literate statistical programming can be a useful way to put text, code, data, output all in one document</li>
<li>knitr is a powerful tool for integrating code and text in a simple document format</li>
</ul>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,370 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h2>tl;dr</h2>
<p>People are busy, especially managers and leaders. Results of data analyses are sometimes presented in oral form, but often the first cut is presented via email.</p>
<p>It is often useful, therefore, to break down the results of an analysis into different levels of granularity / detail.</p>
<h2>Hierarchy of Information: Research Paper</h2>
<ul>
<li>Title / Author List
<ul>
<li>Speaks about what the paper is about</li>
<li>Hopefully interesting</li>
<li>No detail</li>
</ul></li>
<li>Abstract
<ul>
<li>Motivation of the problem</li>
<li>Bottom Line Results</li>
</ul></li>
<li>Body / Results
<ul>
<li>Methods</li>
<li>More detailed results</li>
<li>Sensitivity Analysis</li>
<li>Implication of Results</li>
</ul></li>
<li>Supplementary Materials / Gory Details
<ul>
<li>Details on what was done</li>
</ul></li>
<li>Code / Data / Really Gory Details
<ul>
<li>For reproducibility</li>
</ul></li>
</ul>
<h2>Hierarchy of Information: Email Presentation</h2>
<ul>
<li>Subject Line / Subject Info
<ul>
<li>At a minimum: include one</li>
<li>Can you summarize findings in one sentence?</li>
</ul></li>
<li>Email Body
<ul>
<li>A brief description of the problem / context: recall what was proposed and executed; summarize findings / results. (Total of 1-2 paragraphs)</li>
<li>If action is needed to be taken as a result of this presentation, suggest some options and make them as concrete as possible</li>
<li>If questions need to be addressed, try to make them yes / no</li>
</ul></li>
<li>Attachment(s)
<ul>
<li>R Markdown file</li>
<li>knitr report</li>
<li>Stay Concise: Don't spit out pages of code</li>
</ul></li>
<li>Links to Supplementary Materials
<ul>
<li>Code / Software / Data</li>
<li>Github Repository / Project Website</li>
</ul></li>
</ul>
<h2>DO: Start with Good Science</h2>
<ul>
<li>Remember: garbage in, garbage out</li>
<li>Find a coherent focused question. This helps solve many problems</li>
<li>Working with good collaborators reinforces good practices</li>
<li>Something that's interesting to you will hopefully motivate good habits</li>
</ul>
<h2>DON'T: Do Things By Hand</h2>
<ul>
<li>Editing spreadsheets of data to &quot;clean it up&quot;
<ul>
<li>Removing outliers</li>
<li>QA / QC</li>
<li>Validating</li>
</ul></li>
<li>Editing tables or figures (e.g., rounding, formatting)</li>
<li>Downloading data from a website</li>
<li>Moving data around your computer, splitting, or reformatting files.</li>
</ul>
<p>Things done by hand need to be precisely documented (this is harder than it sounds!)</p>
<h2>DON'T: Point and Click</h2>
<ul>
<li>Many data processing / statistical analysis packages have graphical user interfaces (GUIs)</li>
<li>GUIs are convenient / intuitive but the actions you take with a GUI can be difficult for others to reproduce</li>
<li>Some GUIs produce a log file or script which includes equivalent commands; these can be saved for later examination</li>
<li>In general, be careful with data analysis software that is highly interactive; ease of use can sometimes lead to non-reproducible analyses.</li>
<li>Other interactive software, such as text editors, is usually fine.</li>
</ul>
<h2>DO: Teach a Computer</h2>
<p>If something needs to be done as part of your analysis / investigation, try to teach your computer to do it (even if you only need to do it once) </p>
<p>In order to give your computer instructions, you need to write down exactly what you mean to do and how it should be done. Teaching a computer almost guarantees reproducibility</p>
<p>For example, by hand you can:</p>
<pre><code> 1. Go to the UCI Machine Learning Repository at http://archive.ics.uci.edu/ml/
 2. Download the Bike Sharing Dataset</code></pre>
<p>Or you can teach your computer to do it using R</p>
<pre><code class="language-R">download.file("http://archive.ics.uci.edu/ml/machine-learning-databases/00275/Bike-Sharing-Dataset.zip", "ProjectData/Bike-Sharing-Dataset.zip")</code></pre>
<p>Notice here that:</p>
<ul>
<li>The full URL to the dataset file is specified</li>
<li>The name of the file saved to your local computer is specified</li>
<li>The directory to which the file was saved is specified (&quot;ProjectData&quot;)</li>
<li>Code can always be executed in R (as long as link is available)</li>
</ul>
<h2>DO: Use Some Version Control</h2>
<p>It helps you slow things down by breaking your changes into small chunks (don't just do one massive commit). It allows you to track / tag snapshots so that you can revert to older versions of the project. Software like GitHub / Bitbucket / SourceForge makes it easy to publish results.</p>
<h2>DO: Keep Track of Your Software Environment</h2>
<p>If you work on a complex project involving many tools / datasets, the software and computing environment can be critical for reproducing your analysis.</p>
<p><strong>Computer Architecture</strong>: CPU (Intel, AMD, ARM), CPU Architecture, GPUs</p>
<p><strong>Operating System</strong>: Windows, Mac OS, Linux / Unix</p>
<p><strong>Software Toolchain</strong>: Compilers, interpreters, command shell, programming language (C, Perl, Python, etc.), database backends, data analysis software</p>
<p><strong>Supporting software / infrastructure</strong>: Libraries, R packages, dependencies</p>
<p><strong>External dependencies</strong>: Websites, data repositories, remote databases, software repositories</p>
<p><strong>Version Numbers:</strong> Ideally, for everything (if available)</p>
<p>This function in R helps report a bunch of information relating to the software environment</p>
<pre><code class="language-R">sessionInfo()</code></pre>
<h2>DON'T: Save Output</h2>
<p>Avoid saving data analysis output (tables, figures, summaries, processed data, etc.), except perhaps temporarily for efficiency purposes.</p>
<p>If a stray output file cannot be easily connected with the means by which it was created, then it is not reproducible</p>
<p>Save the data + code that generated the output, rather than the output itself.</p>
<p>Intermediate files are okay as long as there is clear documentation of how they were created.</p>
<h2>DO: Set Your Seed</h2>
<p>Random number generators generate pseudo-random numbers based on an initial seed (usually a number or set of numbers)</p>
<p> In R, you can use the <code>set.seed()</code> function to set the seed and to specify the random number generator to use</p>
<p>Setting the seed allows for the stream of random numbers to be exactly reproducible</p>
<p>Whenever you generate random numbers for a non-trivial purpose, <strong>always set the seed</strong>.</p>
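<p>A minimal sketch (the seed value here is arbitrary; any fixed integer works):</p>
<pre><code class="language-R">set.seed(20170101)  # fix the random number generator's starting point
rnorm(3)            # these three draws are reproduced exactly whenever the same seed is set first</code></pre>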
<h2>DO: Think About the Entire Pipeline</h2>
<ul>
<li>Data analysis is a lengthy process; it is not just tables / figures/ reports</li>
<li>Raw data -&gt; processed data -&gt; analysis -&gt; report</li>
<li>How you got to the end is just as important as the end itself</li>
<li>The more of the data analysis pipeline you can make reproducible, the better for everyone</li>
</ul>
<h2>Summary: Checklist</h2>
<ul>
<li>Are we doing good science?
<ul>
<li>Is this interesting or worth doing?</li>
</ul></li>
<li>Was any part of this analysis done by hand?
<ul>
<li>If so, are those parts precisely documented?</li>
<li>Does the documentation match reality?</li>
</ul></li>
<li>Have we taught a computer to do as much as possible (i.e. coded)?</li>
<li>Are we using a version control system?</li>
<li>Have we documented our software environment?</li>
<li>Have we saved any output that we cannot reconstruct from original data + code?</li>
<li>How far back in the analysis pipeline can we go before our results are no longer (automatically) reproducible?</li>
</ul>
<h2>Replication and Reproducibility</h2>
<p>Replication</p>
<ul>
<li>Focuses on the validity of the scientific claim</li>
<li>Is this claim true?</li>
<li>The ultimate standard for strengthening scientific evidence</li>
<li>New investigators, data, analytical methods, laboratories, instruments, etc.</li>
<li>Particularly important in studies that can impact broad policy or regulatory decisions.</li>
</ul>
<p>Reproducibility</p>
<ul>
<li>Focuses on the validity of the data analysis</li>
<li>Can we trust this analysis?</li>
<li>Arguably a minimum standard for any scientific study</li>
<li>New investigators, same data, same methods</li>
<li>Important when replication is impossible</li>
</ul>
<h2>Background and Underlying Trends</h2>
<ul>
<li>Some studies cannot be replicated: No time, no money, or just plain unique / opportunistic</li>
<li>Technology is increasing data collection throughput; data are more complex and high-dimensional</li>
<li>Existing databases can be merged to become bigger databases (but data are used off-label)</li>
<li>Computing power allows more sophisticated analyses, even on &quot;small&quot; data</li>
<li>For every field &quot;X&quot;, there is a &quot;Computational X&quot;</li>
</ul>
<h2>The Result?</h2>
<ul>
<li>Even basic analyses are difficult to describe</li>
<li>Heavy computational requirements are thrust upon people without adequate training in statistics and computing</li>
<li>Errors are more easily introduced into long analysis pipelines</li>
<li>Knowledge transfer is inhibited</li>
<li>Results are difficult to replicate or reproduce</li>
<li>Complicated analyses cannot be trusted</li>
</ul>
<h2>What Problem Does Reproducibility Solve?</h2>
<p>What we get:</p>
<ul>
<li>Transparency</li>
<li>Data Availability</li>
<li>Software / Methods of Availability</li>
<li>Improved Transfer of Knowledge</li>
</ul>
<p>What we do NOT get</p>
<ul>
<li>Validity / Correctness of the analysis</li>
</ul>
<p>An analysis can be reproducible and still be wrong</p>
<p>We want to know: can we trust this analysis?</p>
<p>Does requiring reproducibility deter bad analysis?</p>
<h2>Problems with Reproducibility</h2>
<p>The premise of reproducible research is that with data/code available, people can check each other and the whole system is self-correcting</p>
<ul>
<li>Addresses the most &quot;downstream&quot; aspect of the research process -- Post-publication</li>
<li>Assumes everyone plays by the same rules and wants to achieve the same goals (i.e. scientific discovery)</li>
</ul>
<h2>Who Reproduces Research?</h2>
<ul>
<li>For reproducibility to be effective as a means to check validity, someone needs to do something
<ul>
<li>Re-run the analysis; check results match</li>
<li>Check the code for bugs/errors</li>
<li>Try alternate approaches; check sensitivity</li>
</ul></li>
<li>The need for someone to do something is inherited from the traditional notion of replication</li>
<li>Who is &quot;someone&quot; and what are their goals?</li>
</ul>
<h2>The Story So Far</h2>
<ul>
<li>Reproducibility brings transparency (wrt code+data) and increased transfer of knowledge</li>
<li>A lot of discussion about how to get people to share data</li>
<li>Key question of &quot;can we trust this analysis&quot;? is not addressed by reproducibility</li>
<li>Reproducibility addresses potential problems long after they've occurred (&quot;downstream&quot;)</li>
<li>Secondary analyses are inevitably colored by the interests/motivations of others.</li>
</ul>
<h2>Evidence-based Data Analysis</h2>
<ul>
<li>Most data analyses involve stringing together many different tools and methods</li>
<li>Some methods may be standard for a given field, but others are often applied ad hoc</li>
<li>We should apply thoroughly studied (via statistical research), mutually agreed upon methods to analyze data whenever possible</li>
<li>There should be evidence to justify the application of a given method</li>
</ul>
<h2>Evidence-based Data Analysis</h2>
<ul>
<li>Create analytic pipelines from evidence-based components - standardize it</li>
<li>A deterministic statistical machine</li>
<li>Once an evidence-based analytic pipeline is established, we shouldn't mess with it</li>
<li>Analysis with a &quot;transparent box&quot;</li>
<li>Reduce the &quot;research degrees of freedom&quot;</li>
<li>Analogous to a pre-specified clinical trial protocol</li>
</ul>
<h2>Case Study: Estimating Acute Effects of Ambient Air Pollution Exposure</h2>
<ul>
<li>Acute / Short-term effects typically estimated via panel studies or time series studies</li>
<li>Work originated in late 1970s early 1980s</li>
<li>Key question &quot;Are short-term changes in pollution associated with short-term changes in a population health outcome?&quot;</li>
<li>Studies are usually conducted at a community level</li>
<li>Long history of statistical research investigating proper methods of analysis</li>
</ul>
<h2>Case Study: Estimating Acute Effects of Ambient Air Pollution Exposure</h2>
<ul>
<li>Can we encode everything that we have found in statistical / epidemiological research into a single package?</li>
<li>Time series studies do not have a huge range of variation; typically involves similar types of data and similar questions</li>
<li>Can we create a deterministic statistical machine for this area?</li>
</ul>
<h2>DSM Modules for Time Series Studies of Air Pollution and Health</h2>
<ol>
<li>Check for outliers, high leverage, overdispersion</li>
<li>Fill in missing data? No!</li>
<li>Model selection: Estimate degrees of freedom to adjust for unmeasured confounders
<ul>
<li>Other aspects of model not as critical</li>
</ul></li>
<li>Multiple lag analysis</li>
<li>Sensitivity analysis wrt
<ul>
<li>Unmeasured confounder adjustment</li>
<li>Influential points</li>
</ul></li>
</ol>
<h2>Where to Go From Here?</h2>
<ul>
<li>One DSM is not enough, we need many!</li>
<li>Different problems warrant different approaches and expertise</li>
<li>A curated library of machines providing state-of-the-art analysis pipelines</li>
<li>A CRAN/CPAN/CTAN/... for data analysis</li>
<li>Or a &quot;Cochrane Collaboration&quot; for data analysis</li>
</ul>
<h2>A Curated Library of Data Analysis</h2>
<ul>
<li>Provide packages that encode data analysis pipelines for given problems, technologies, questions</li>
<li>Curated by experts knowledgeable in the field</li>
<li>Documentation / references given supporting each module in the pipeline</li>
<li>Changes introduced after passing relevant benchmarks/unit tests</li>
</ul>
<h2>Summary</h2>
<ul>
<li>Reproducible research is important, but does not necessarily solve the critical question of whether a data analysis is trustworthy</li>
<li>Reproducible research focuses on the most &quot;downstream&quot; aspect of research documentation</li>
<li>Evidence-based data analysis would provide standardized best practices for given scientific areas and questions</li>
<li>Gives reviewers an important tool without dramatically increasing the burden on them</li>
<li>More effort should be put into improving the quality of &quot;upstream&quot; aspects of scientific research</li>
</ul>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,233 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h2>The <code>cacher</code> Package for R</h2>
<ul>
<li>Add-on package for R</li>
<li>Evaluates code written in files and stores immediate results in a key-value database</li>
<li>R expressions are given SHA-1 hash values so that changes can be tracked and code reevaluated if necessary</li>
<li>&quot;Cacher packages&quot; can be built for distribution</li>
<li>Others can &quot;clone&quot; an analysis and evaluate subsets of code or inspect data objects</li>
</ul>
<p>The value of this is that other people can clone the analysis and look at subsets of the code, or more specifically at the data objects. People who want to run your code may not have the resources that you have; because of that, they may not want to re-run the entire Markov chain Monte Carlo simulation that you used to get the posterior distribution or the histogram at the end.</p>
<p>But the idea is that you peel the onion a little bit rather than just go straight to the core. </p>
<h2>Using <code>cacher</code> as an Author</h2>
<ol>
<li>Parse the R source file; create the necessary cache directories and subdirectories</li>
<li>Cycle through each expression in the source file
<ul>
<li>If an expression has never been evaluated, evaluate it and store any resulting R objects in the cache database</li>
<li>If any cached results exist, lazy-load the results from the cache database and move to the next expression</li>
<li>If an expression does not create any R objects (i.e., there is nothing to cache), add the expression to the list of expressions where evaluation needs to be forced</li>
<li>Write out metadata for this expression to the metadata file</li>
</ul></li>
</ol>
<ul>
<li>The <code>cachepackage</code> function creates a <code>cacher</code> package storing
<ul>
<li>Source File</li>
<li>Cached data objects</li>
<li>Metadata</li>
</ul></li>
<li>Package file is zipped and can be distributed</li>
<li>Readers can unzip the file and immediately investigate its contents via the <code>cacher</code> package</li>
</ul>
<h2>Using <code>cacher</code> as a Reader</h2>
<p>A journal article says</p>
<p> &quot;... the code and data for this analysis can be found in the cacher package 092dcc7dda4b93e42f23e038a60e1d44dbec7b3f&quot;</p>
<pre><code class="language-R">library(cacher)
clonecache(id = "092dcc7dda4b93e42f23e038a60e1d44dbec7b3f")
clonecache(id = "092d") ## Same as above
# Created cache directory `.cache`
showfiles()
# [1] "top20.R"
sourcefile("top20.R")</code></pre>
<h2>Cloning an Analysis</h2>
<ul>
<li>Local directories are created</li>
<li>Source code files and metadata are downloaded</li>
<li>Data objects are <em>not</em> downloaded by default (may be really large)</li>
<li>References to data objects are loaded and corresponding data can be lazy-loaded on demand</li>
</ul>
<p><code>graphcode()</code> gives a node graph representing the code</p>
<h2>Running Code</h2>
<ul>
<li>The <code>runcode</code> function executes code in the source file</li>
<li>By default, expressions that result in an object being created are not run and the resulting objects are lazy-loaded into the workspace</li>
<li>Expressions not resulting in objects are evaluated</li>
</ul>
<h2>Checking Code and Objects</h2>
<ul>
<li>The <code>checkcode</code> function evaluates all expressions from scratch (no lazy-loading)</li>
<li>Results of evaluation are checked against stored results to see if the results are the same as what the author calculated
<ul>
<li>Setting RNG seeds is critical for this to work</li>
</ul></li>
<li>The integrity of data objects can be verified with the <code>checkobjects</code> function to check for possible corruption of data perhaps during transit.</li>
</ul>
<p>You can inspect data objects with <code>loadcache</code>. This loads in pointers to each of the data objects into the workspace. Once you access the object, it will transfer it from the cache.</p>
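<p>Putting the reader-side functions together, a minimal sketch (the argument-free calls are an assumption based on the descriptions above, and the id comes from the earlier example):</p>
<pre><code class="language-R">library(cacher)
clonecache(id = "092d")  # clone the published analysis
sourcefile("top20.R")    # select the source file to investigate
graphcode()              # view how the expressions depend on one another
runcode()                # evaluate the code, lazy-loading cached objects where possible
checkcode()              # re-evaluate everything from scratch and compare with the cached results
loadcache()              # load pointers to the cached data objects into the workspace
ls()                     # the analysis objects are now visible; data transfers on first access</code></pre>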
<h2><code>cacher</code> Summary</h2>
<ul>
<li>The <code>cacher</code> package can be used by authors to create cache packages from data analyses for distribution</li>
<li>Readers can use the <code>cacher</code> package to inspect others' data analyses by examining cached computations</li>
<li><code>cacher</code> is mindful of readers' resources and efficiently loads only those data objects that are needed.</li>
</ul>
<h1>Case Study: Air Pollution</h1>
<p>Particulate Matter -- PM</p>
<p>When doing air pollution studies, you're often looking at particulate matter pollution. The dust is not just one monolithic piece of dirt or soot; it is actually composed of many different chemical constituents.</p>
<p>It includes metals, inert things like salts, and other kinds of components, so there's a possibility that a subset of those constituents are the really harmful elements.</p>
<p>PM is composed of many different chemical constituents and it's important to understand that the Environmental Protection Agency (EPA) monitors the chemical constituents of particulate matter and has been doing so since 1999 or 2000 on a national basis.</p>
<h2>What causes PM to be Toxic?</h2>
<ul>
<li>PM is composed of many different chemical elements</li>
<li>Some components of PM may be more harmful than others</li>
<li>Some sources of PM may be more dangerous than others</li>
<li>Identifying harmful chemical constituents may lead us to strategies for controlling sources of PM</li>
</ul>
<h2>NMMAPS</h2>
<ul>
<li>
<p>The National Morbidity, Mortality, and Air Pollution Study (NMMAPS) was a national study of the short-term health effects of ambient air pollution</p>
</li>
<li>
<p>Focused primarily on particulate matter ($PM_{10}$) and Ozone ($O_3$) </p>
</li>
<li>
<p>Health outcomes include mortality from all causes and hospitalizations for cardiovascular and respiratory diseases</p>
</li>
<li>
<p>Key publications</p>
<ul>
<li><a href="http://www.ncbi.nlm.nih.gov/pubmed/11098531">http://www.ncbi.nlm.nih.gov/pubmed/11098531</a></li>
<li><a href="http://www.ncbi.nlm.nih.gov/pubmed/11354823">http://www.ncbi.nlm.nih.gov/pubmed/11354823</a></li>
</ul>
</li>
<li>
<p>Funded by the Health Effects Institute</p>
<ul>
<li>Roger Peng currently serves on the Health Effects Institute Health Review Committee</li>
</ul>
</li>
</ul>
<h2>NMMAPS and Reproducibility</h2>
<ul>
<li>Data made available at the Internet-based Health and Air Pollution Surveillance System (<a href="http://www.ihapss.jhsph.edu">http://www.ihapss.jhsph.edu</a>)</li>
<li>Research and software also available at iHAPSS</li>
<li>Many studies (over 67 published) have been conducted based on the public data <a href="http://www.ncbi.nlm.nih.gov/pubmed/22475833">http://www.ncbi.nlm.nih.gov/pubmed/22475833</a></li>
<li>Has served as an important test bed for methodological development</li>
</ul>
<h2>What Causes Particulate Matter to be Toxic?</h2>
<p><a href="http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1665439">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1665439</a></p>
<ul>
<li>Lippmann et al. found strong evidence that Ni (nickel) modified the short-term effect of $PM_{10}$ across 60 US communities</li>
<li>No other PM chemical constituent seemed to have the same modifying effect</li>
<li>Too simple to be true?</li>
</ul>
<h2>A Reanalysis of the Lippmann et al. Study</h2>
<p><a href="http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2137127">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2137127</a></p>
<ul>
<li>Re-examine the data from NMMAPS and link it with PM chemical constituent data</li>
<li>Are the findings sensitive to the levels of nickel in New York City?</li>
</ul>
<h2>Does Nickel Make PM Toxic?</h2>
<ul>
<li>Long-term average nickel concentrations appear correlated with PM risk</li>
<li>There appear to be some outliers on the right-hand side (New York City)</li>
</ul>
<h2>Does Nickel Make PM Toxic?</h2>
<p>One of the most important things about those three points to the right is that they are high leverage points, and the regression line can be very sensitive to high leverage points. Removing those three points from the dataset brings the regression line's slope down a little bit, which then produces a line that is no longer statistically significant (p-value about 0.31).</p>
<h2>What Have We Learned?</h2>
<ul>
<li>New York does have very high levels of nickel and vanadium, much higher than any other US community</li>
<li>There is evidence of a positive relationship between Ni concentrations and $PM_{10}$ risk</li>
<li>The strength of this relationship is highly sensitive to the observations from New York City</li>
<li>Most of the information in the data is derived from just 3 observations</li>
</ul>
<h2>Lessons Learned</h2>
<ul>
<li>Reproducibility of NMMAPS allowed for a secondary analysis (and linking with PM chemical constituent data) investigating a novel hypothesis (Lippmann et al.)</li>
<li>Reproducibility also allowed for a critique of that new analysis and some additional new analysis (Dominici et al.)</li>
<li>Original hypothesis not necessarily invalidated, but evidence not as strong as originally suggested (more work should be done)</li>
<li>Reproducibility allows for the scientific discussion to occur in a timely and informed manner</li>
<li>This is how science works.</li>
</ul>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,89 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Reproducible Research</h1>
<p>In the Winter of 2017, I took a Coursera course on Reproducible Research taught by Dr. Peng from Johns Hopkins University.</p>
<p>Below are my notes for each of the four weeks.</p>
<p><a href="index.html%3Fcourses%252FReproducibleResearch%252Fweek1.html">Week 1</a></p>
<p><a href="index.html%3Fcourses%252FReproducibleResearch%252Fweek2.html">Week 2</a></p>
<p><a href="index.html%3Fcourses%252FReproducibleResearch%252Fweek3.html">Week 3</a></p>
<p><a href="index.html%3Fcourses%252FReproducibleResearch%252Fweek4.html">Week 4</a></p>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,89 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Probability and Statistical Inference</h1>
<p>In the Fall of 2017, I took the course STAT 381 with Dr. Debra Hydorn. Below I included the interesting labs we worked on in the class.</p>
<p><em>Please note that these reports were not formatted for this site. So equations and images may not show up.</em></p>
<p><a href="index.html%3Fcourses%252Fstat381%252Frandomwalk.html">Random Walk</a></p>
<p><a href="index.html%3Fcourses%252Fstat381%252Frandomnumber.html">Random Number Generation</a></p>
<p><a href="index.html%3Fcourses%252Fstat381%252Fcentrallimit.html">Central Limit Theorum</a></p>
<p><a href="index.html%3Fcourses%252Fstat381%252Fconfint.html">Confidence Interval</a></p>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,453 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Central Limit Theorem Lab</h1>
<p><strong>Brandon Rozek</strong></p>
<h2>Introduction</h2>
<p>The Central Limit Theorem tells us that if the sample size is large, then the distribution of sample means approaches the Normal distribution. For distributions that are more skewed, a larger sample size is needed, since that lowers the impact of extreme values on the sample mean.</p>
<h3>Skewness</h3>
<p>Skewness can be determined by the following formula
$$
Sk = E((\frac{X - \mu}{\sigma})^3) = \frac{E((X - \mu)^3)}{\sigma^3}
$$
Uniform distributions have a skewness of zero. Poisson distributions however have a skewness of $\lambda^{-\frac{1}{2}}$. </p>
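<p>As a quick check of the Poisson value, a Poisson random variable has third central moment $E((X - \lambda)^3) = \lambda$ and standard deviation $\sigma = \sqrt{\lambda}$, so the skewness formula gives
$$
Sk = \frac{E((X - \lambda)^3)}{\sigma^3} = \frac{\lambda}{\lambda^{\frac{3}{2}}} = \lambda^{-\frac{1}{2}}
$$</p>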
<p>In this lab, we are interested in the sample size needed to obtain a distribution of sample means that is approximately normal.</p>
<h3>Shapiro-Wilk Test</h3>
<p>In this lab, we will test for normality using the Shapiro-Wilk test. The null hypothesis of this test is that the data is normally distributed. The alternative hypothesis is that the data is not normally distributed. This test is known to favor the alternative hypothesis for a large number of sample means. To circumvent this, we will test normality starting with a small sample size $n$ and steadily increase it until we obtain a distribution of sample means that has a p-value greater than 0.05 in the Shapiro-Wilk test.</p>
<p>This tells us that with a false positive rate of 5%, there is no evidence to suggest that the distribution of sample means does not follow the normal distribution.</p>
<p>We will use this test to look at the distribution of sample means of both the Uniform and Poisson distribution in this lab.</p>
<h3>Properties of the distribution of sample means</h3>
<p>The Uniform distribution has a mean of $0.5$ and a standard deviation of $\frac{1}{\sqrt{12n}}$ and the Poisson distribution has a mean of $\lambda$ and a standard deviation of $\sqrt{\frac{\lambda}{n}}$.</p>
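<p>Both standard deviations follow from the general fact that the standard deviation of a sample mean is $\frac{\sigma}{\sqrt{n}}$: the Uniform(0, 1) distribution has $\sigma = \frac{1}{\sqrt{12}}$, giving $\frac{1}{\sqrt{12n}}$, and the Poisson distribution has $\sigma = \sqrt{\lambda}$, giving $\sqrt{\frac{\lambda}{n}}$.</p>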
<h2>Methods</h2>
<p>For the first part of the lab, we will sample means from a Uniform distribution and a Poisson distribution of $\lambda = 1$ both with a sample size $n = 5$.</p>
<p>Doing so shows us how the Uniform distribution is roughly symmetric while the Poisson distribution is highly skewed. This begs the question: what sample size $(n)$ do I need for the Poisson distribution to be approximately normal?</p>
<h3>Sampling the means</h3>
<p>The maximum number of mean observations that the Shapiro-Wilk test allows is 5000. Therefore, we will obtain <code>n</code> observations from either the Uniform or the Poisson distribution and calculate the mean from them, repeating that process 5000 times.</p>
<p>The mean is calculated in the following way
$$
Mean = \frac{\sum x_i}{n}
$$
Where $x_i$ is the observation obtained from the Uniform or Poisson distribution</p>
<h3>Iterating with the Shapiro-Wilk Test</h3>
<p>A given sample size doesn't always guarantee that the resulting sample means will fail to reject the Shapiro-Wilk test. Therefore, it is useful to run the test multiple times so that we can create a 95th percentile of sample sizes that fails to reject the Shapiro-Wilk test.</p>
<p>The issue with this is that lower lambda values result in higher skewness, as described by the skewness formula. If a distribution has a high degree of skewness, then it will take a larger sample size $n$ to make the sample mean distribution approximately normal.</p>
<p>Finding large values of n results in longer computational time. Therefore, the code takes this into account by starting at a larger value of n and/or incrementing by a larger value of n each iteration. Incrementing by a larger value of n decreases the precision, though that is the compromise I'm willing to take in order to achieve faster results.</p>
<p>Finding just the first value of $n$ that generates the sample means that fails to reject the Shapiro-Wilk test doesn't tell us much in terms of the sample size needed for the distribution of sample means to be approximately normal. Instead, it is better to run this process many times, finding the values of n that satisfy this condition multiple times. That way we can look at the distribution of sample sizes required and return back the 95th percentile.</p>
<p>Returning the 95th percentile tells us that 95% of the time, it was the sample size returned or lower that first failed to reject the Shapiro-Wilk test. One must be careful, because it can be wrongly interpreted as the sample size needed to fail to reject the Shapiro-Wilk test 95% of the time. Using that logic requires additional mathematics outside the scope of this paper. Returning the 95th percentile of the first sample size that failed to reject the Shapiro-Wilk test, however, will give us a good enough estimate for a sample size needed.</p>
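<p>In code, this amounts to the following one-liner (taken from the full program in the appendix), where <code>ns</code> holds the sample sizes collected across runs:</p>
<pre><code class="language-R">ceiling(quantile(ns, .95))  # 95th percentile of the first sample sizes that failed to reject normality</code></pre>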
<h3>Plots</h3>
<p>Once a value for <code>n</code> is determined, we sample the means of the particular distribution (Uniform/Poisson) and create histograms and Q-Q plots for each of the parameters we're interested in. We're looking to verify that the histogram looks symmetric and that the points on the Q-Q Plot fit closely to the Q-Q Line with one end of the scattering of points on the opposite side of the line as the other.</p>
<h2>Results</h2>
<h3>Part I</h3>
<p>Sampling the mean of the uniform distribution with $n = 5$ results in a mean of $\bar{x} = 0.498$ and standard deviation of $sd = 0.1288$. The histogram and Q-Q Plot can be seen in Figure I and Figure II respectively. </p>
<p>$\bar{x}$ isn't far from the theoretical 0.5 and the standard deviation is also close to
$$
\frac{1}{\sqrt{12(5)}} \approx 0.129
$$
Looking at the histogram and Q-Q plot, it suggests that the data are approximately normal. Therefore, we can conclude that a sample size of 5 is sufficient for the distribution of sample means coming from the uniform distribution to be approximately normal.</p>
<p>Sampling the mean of the Poisson distribution with $n = 5$ and $\lambda = 1$ results in a mean of $\bar{x} = 0.9918$ and a standard deviation of $sd = 0.443$. The histogram and Q-Q Plot can be seen in Figures III and IV respectively.</p>
<p>$\bar{x}$ is not too far from the theoretical $\lambda = 1$, while the standard deviation is a bit farther from the theoretical
$$
\sqrt{\frac{\lambda}{n}} = \sqrt{\frac{1}{5}} = 0.447
$$
Looking at the Figures, however, shows us that the data does not appear normal. Therefore, we cannot conclude that 5 is a big enough sample size for the Poisson Distribution of $\lambda = 1$ to be approximately normal.</p>
<h3>Part II</h3>
<p>Running the algorithm I described, I produced the following table</p>
<table>
<thead>
<tr>
<th>$\lambda$</th>
<th>Skewness</th>
<th>Sample Size Needed</th>
<th>Shapiro-Wilk P-Value</th>
<th>Average of Sample Means</th>
<th>Standard Deviation of Sample Means</th>
<th>Theoretical Standard Deviation of Sample Means</th>
</tr>
</thead>
<tbody>
<tr>
<td>0.1</td>
<td>3.16228</td>
<td>2710</td>
<td>0.05778</td>
<td>0.099</td>
<td>0.0060</td>
<td>0.0061</td>
</tr>
<tr>
<td>0.5</td>
<td>1.41421</td>
<td>802</td>
<td>0.16840</td>
<td>0.499</td>
<td>0.0250</td>
<td>0.0249</td>
</tr>
<tr>
<td>1</td>
<td>1.00000</td>
<td>215</td>
<td>0.06479</td>
<td>1.000</td>
<td>0.0675</td>
<td>0.0682</td>
</tr>
<tr>
<td>5</td>
<td>0.44721</td>
<td>53</td>
<td>0.12550</td>
<td>4.997</td>
<td>0.3060</td>
<td>0.3071</td>
</tr>
<tr>
<td>10</td>
<td>0.31622</td>
<td>31</td>
<td>0.14120</td>
<td>9.999</td>
<td>0.5617</td>
<td>0.5679</td>
</tr>
<tr>
<td>50</td>
<td>0.14142</td>
<td>10</td>
<td>0.48440</td>
<td>50.03</td>
<td>2.2461</td>
<td>2.2361</td>
</tr>
<tr>
<td>100</td>
<td>0.10000</td>
<td>6</td>
<td>0.47230</td>
<td>100.0027</td>
<td>4.1245</td>
<td>4.0824</td>
</tr>
</tbody>
</table>
<p>The skewness was derived from the formula in the first section, while the sample size was obtained by looking at the .95 blue quantile line in Figures XVIII-XXIV. The rest of the columns are obtained from the output of the R Code function <code>show_results</code>.</p>
<p>Looking at the histograms and Q-Q Plots produced by the algorithm, the distributions of sample means are all roughly symmetric. The sample means are also tightly clustered around the Q-Q line, showing that the normal distribution is a good fit. This allows us to be confident that using these values of <code>n</code> as the sample size would result in the distribution of sample means of the Uniform or Poisson (with a given lambda) being approximately normal.</p>
<p>All the values of the average sample means are within 0.001 of the theoretical average of sample means. The standard deviation of sample means increases slightly as the value of $\lambda$ increases, but it is still quite low.</p>
<h2>Conclusion</h2>
<p>The table in the results section clearly shows that as the skewness increases, so does the sample size needed to make the distribution of sample means approximately normal. This shows the Central Limit Theorem in action: no matter the skewness, if you obtain a large enough sample, the distribution of sample means will be approximately normal.</p>
<p>These conclusions pave the way for more interesting applications such as hypothesis testing and confidence intervals.</p>
<h2>Appendix</h2>
<h3>Figures</h3>
<h4>Figure I, Histogram of Sample Means coming from a Uniform Distribution with sample size of 5</h4>
<p><img src="http://cs.umw.edu/home/rozek/Pictures/stat381lab3/part1histunif.png" alt="part1histunif" /></p>
<h4>Figure II, Q-Q Plot of Sample Means coming from a Uniform Distribution with sample size of 5</h4>
<p><img src="http://cs.umw.edu/home/rozek/Pictures/stat381lab3/part1qunif.png" alt="part1qunif" /></p>
<h4>Figure III, Histogram of Sample Means coming from a Poisson Distribution with $\lambda = 1$ and sample size of 5</h4>
<p><img src="http://cs.umw.edu/home/rozek/Pictures/stat381lab3/part1histpoisson.png" alt="part1histpoisson" /></p>
<h4>Figure IV, Q-Q Plot of Sample Means coming from Poisson Distribution with $\lambda = 1$ and sample size of 5</h4>
<p><img src="http://cs.umw.edu/home/rozek/Pictures/stat381lab3/part1qpoisson.png" alt="part1qpoisson" /></p>
<h4>Figure V, Histogram of Sample Means coming from Poisson Distribution with $\lambda = 0.1$ and sample size of 2710</h4>
<p><img src="http://cs.umw.edu/home/rozek/Pictures/stat381lab3/histpoisson01.png" alt="histpoisson01" /></p>
<h4>Figure VI, Q-Q Plot of Sample Means coming from Poisson Distribution with $\lambda = 0.1$ and sample size of 2710</h4>
<p><img src="http://cs.umw.edu/home/rozek/Pictures/stat381lab3/qpoisson01.png" alt="qpoisson01" /></p>
<h4>Figure VII, Histogram of Sample Means coming from Poisson Distribution with $\lambda = 0.5$ and sample size of 516</h4>
<p><img src="http://cs.umw.edu/home/rozek/Pictures/stat381lab3/histpoisson05.png" alt="histpoisson05" /></p>
<h4>Figure VII, Q-Q Plot of Sample Means coming from Poisson Distribution with $\lambda = 0.5$ and sample size of 516</h4>
<p><img src="http://cs.umw.edu/home/rozek/Pictures/stat381lab3/qpoisson05.png" alt="qpoisson05" /></p>
<h4>Figure VIII, Histogram of Sample Means coming from Poisson Distribution with $\lambda = 1$ and sample size of 215</h4>
<p><img src="http://cs.umw.edu/home/rozek/Pictures/stat381lab3/histpoisson1.png" alt="histpoisson1" /></p>
<h4>Figure IX, Q-Q Plot of Sample Means coming from Poisson Distribution with $\lambda = 1$ and sample size of 215</h4>
<p><img src="http://cs.umw.edu/home/rozek/Pictures/stat381lab3/qpoisson1.png" alt="qpoisson1" /></p>
<h4>Figure X, Histogram of Sample Means coming from Poisson Distribution of $\lambda = 5$ and sample size of 53</h4>
<p><img src="http://cs.umw.edu/home/rozek/Pictures/stat381lab3/histpoisson5.png" alt="histpoisson5" /></p>
<h4>Figure XI, Q-Q Plot of Sample Means coming from Poisson Distribution of $\lambda = 5$ and sample size of 53</h4>
<p><img src="http://cs.umw.edu/home/rozek/Pictures/stat381lab3/qpoisson5.png" alt="qpoisson5" /></p>
<h4>Figure XII, Histogram of Sample Means coming from Poisson Distribution of $\lambda = 10$ and sample size of 31</h4>
<p><img src="http://cs.umw.edu/home/rozek/Pictures/stat381lab3/histpoisson10.png" alt="histpoisson10" /></p>
<h4>Figure XIII, Q-Q Plot of Sample Means coming from Poisson Distribution of $\lambda = 10$ and sample size of 31</h4>
<p><img src="http://cs.umw.edu/home/rozek/Pictures/stat381lab3/qpoisson10.png" alt="qpoisson10" /></p>
<h4>Figure XIV, Histogram of Sample Means coming from Poisson Distribution of $\lambda = 50$ and sample size of 10</h4>
<p><img src="http://cs.umw.edu/home/rozek/Pictures/stat381lab3/histpoisson50.png" alt="histpoisson50" /></p>
<h4>Figure XV, Q-Q Plot of Sample Means coming from Poisson Distribution of $\lambda = 50$ and sample size of 10</h4>
<p><img src="http://cs.umw.edu/home/rozek/Pictures/stat381lab3/qpoisson50.png" alt="qpoisson50" /></p>
<h4>Figure XVI, Histogram of Sample Means coming from Poisson Distribution of $\lambda = 100$ and sample size of 6</h4>
<p><img src="http://cs.umw.edu/home/rozek/Pictures/stat381lab3/histpoisson100.png" alt="histpoisson100" /></p>
<h4>Figure XVII, Q-Q Plot of Sample Means coming from Poisson Distribution of $\lambda = 100$ and sample size of 6</h4>
<p><img src="http://cs.umw.edu/home/rozek/Pictures/stat381lab3/qpoisson100.png" alt="qpoisson100" /></p>
<h4>Figure XVIII, Histogram of sample size needed to fail to reject the normality test for Poisson Distribution of $\lambda = 0.1$</h4>
<p><img src="http://cs.umw.edu/home/rozek/Pictures/stat381lab3/histl01.png" alt="histl01" /></p>
<h4>Figure XIX, Histogram of sample size needed to fail to reject the normality test for Poisson Distribution of $\lambda = 0.5$</h4>
<p><img src="http://cs.umw.edu/home/rozek/Pictures/stat381lab3/histl05.png" alt="histl05" /></p>
<h4>Figure XX, Histogram of sample size needed to fail to reject the normality test for Poisson Distribution of $\lambda = 1$</h4>
<p><img src="http://cs.umw.edu/home/rozek/Pictures/stat381lab3/histl1.0.png" alt="histl1.0" /></p>
<h4>Figure XXI, Histogram of sample size needed to fail to reject the normality test for Poisson Distribution of $\lambda = 5$</h4>
<p><img src="http://cs.umw.edu/home/rozek/Pictures/stat381lab3/histl5.png" alt="histl5" /></p>
<h4>Figure XXII, Histogram of sample size needed to fail to reject the normality test for Poisson Distribution of $\lambda = 10$</h4>
<p><img src="http://cs.umw.edu/home/rozek/Pictures/stat381lab3/histl10.png" alt="histl10" /></p>
<h4>Figure XXIII, Histogram of sample size needed to fail to reject the normality test for Poisson Distribution of $\lambda = 50$</h4>
<p><img src="http://cs.umw.edu/home/rozek/Pictures/stat381lab3/histl50.png" alt="histl50" /></p>
<h4>Figure XXIV, Histogram of sample size needed to fail to reject the normality test for Poisson Distribution of $\lambda = 100$</h4>
<p><img src="http://cs.umw.edu/home/rozek/Pictures/stat381lab3/histl100.png" alt="histl100" /></p>
<h3>R Code</h3>
<pre><code class="language-R">rm(list=ls())
library(ggplot2)
sample_mean_uniform = function(n) {
xbarsunif = numeric(5000)
for (i in 1:5000) {
sumunif = 0
for (j in 1:n) {
sumunif = sumunif + runif(1, 0, 1)
}
xbarsunif[i] = sumunif / n
}
xbarsunif
}
sample_mean_poisson = function(n, lambda) {
xbarspois = numeric(5000)
for (i in 1:5000) {
sumpois = 0
for (j in 1:n) {
sumpois = sumpois + rpois(1, lambda)
}
xbarspois[i] = sumpois / n
}
xbarspois
}
poisson_n_to_approx_normal = function(lambda) {
print(paste("Looking at Lambda =", lambda))
ns = c()
# Speed up computation of lower lambda values by starting at a different sample size
# and/or lowering the precision by increasing the delta sample size
# and/or lowering the number of sample sizes we obtain from the shapiro loop
increaseBy = 1;
iter = 3;
startingValue = 2
if (lambda == 0.1) {
startingValue = 2000;
iter = 3;
increaseBy = 50;
} else if (lambda == 0.5) {
startingValue = 200;
iter = 5;
increaseBy = 10;
} else if (lambda == 1) {
startingValue = 150;
iter = 25;
} else if (lambda == 5) {
startingValue = 20;
iter = 50;
startingValue = 10;
} else if (lambda == 10) {
iter = 100;
} else {
iter = 500;
}
progressIter = 1
for (i in 1:iter) {
# Include a progress indicator for personal sanity
if (i / iter &gt; .1 * progressIter) {
print(paste("Progress", i / iter * 100, "% complete"))
progressIter = progressIter + 1
}
n = startingValue
dist = sample_mean_poisson(n, lambda)
p.value = shapiro.test(dist)$p.value
while (p.value &lt; 0.05) {
n = n + increaseBy
dist = sample_mean_poisson(n, lambda)
p.value = shapiro.test(dist)$p.value
# More sanity checks
if (n %% 10 == 0) {
print(paste("N =", n, " p.value =", p.value))
}
}
ns = c(ns, n)
}
print(ggplot(data.frame(ns), aes(x = ns)) +
geom_histogram(fill = 'white', color = 'black', bins = 10) +
geom_vline(xintercept = ceiling(quantile(ns, .95)), col = '#0000AA') +
ggtitle(paste("Histogram of N needed for Poisson distribution of lambda =", lambda)) +
xlab("N") +
ylab("Count") +
theme_bw())
ceiling(quantile(ns, .95)) #95% of the time, this value of n will give you a sampling distribution that is approximately normal
}
uniform_n_to_approx_normal = function() {
ns = c()
progressIter = 1
for (i in 1:500) {
# Include a progress indicator for personal sanity
if (i / 500 &gt; .1 * progressIter) {
print(paste("Progress", i / 5, "% complete"))
progressIter = progressIter + 1
}
n = 2
dist = sample_mean_uniform(n)
p.value = shapiro.test(dist)$p.value
while (p.value &lt; 0.05) {
n = n + 1
dist = sample_mean_uniform(n)
p.value = shapiro.test(dist)$p.value
if (n %% 10 == 0) {
print(paste("N =", n, " p.value =", p.value))
}
}
ns = c(ns, n)
}
print(ggplot(data.frame(ns), aes(x = ns)) +
geom_histogram(fill = 'white', color = 'black', bins = 10) +
geom_vline(xintercept = ceiling(quantile(ns, .95)), col = '#0000AA') +
ggtitle("Histogram of N needed for Uniform Distribution") +
xlab("N") +
ylab("Count") +
theme_bw())
ceiling(quantile(ns, .95)) #95% of the time, this value of n will give you a sampling distribution that is approximately normal
}
show_results = function(dist) {
print(paste("The mean of the sample mean distribution is:", mean(dist)))
print(paste("The standard deviation of the sample mean distribution is:", sd(dist)))
print(shapiro.test(dist))
print(ggplot(data.frame(dist), aes(x = dist)) +
geom_histogram(fill = 'white', color = 'black', bins = 20) +
ggtitle("Histogram of Sample Means") +
xlab("Mean") +
ylab("Count") +
theme_bw())
qqnorm(dist, pch = 1, col = '#001155', main = "QQ Plot", xlab = "Sample Data", ylab = "Theoretical Data")
qqline(dist, col="#AA0000", lty=2)
}
## PART I
uniform_mean_dist = sample_mean_uniform(n = 5)
poisson_mean_dist = sample_mean_poisson(n = 5, lambda = 1)
show_results(uniform_mean_dist)
show_results(poisson_mean_dist)
## PART II
print("Starting Algorithm to Find Sample Size Needed for the Poisson Distribution of Lambda = 0.1");
n.01 = poisson_n_to_approx_normal(0.1)
show_results(sample_mean_poisson(n.01, 0.1))
print("Starting Algorithm to Find Sample Size Needed for the Poisson Distribution of Lambda = 0.5");
n.05 = poisson_n_to_approx_normal(0.5)
show_results(sample_mean_poisson(n.05, 0.5))
print("Starting Algorithm to Find Sample Size Needed for the Poisson Distribution of Lambda = 1");
n.1 = poisson_n_to_approx_normal(1)
show_results(sample_mean_poisson(n.1, 1))
print("Starting Algorithm to Find Sample Size Needed for the Poisson Distribution of Lambda = 5");
n.5 = poisson_n_to_approx_normal(5)
show_results(sample_mean_poisson(n.5, 5))
print("Starting Algorithm to Find Sample Size Needed for the Poisson Distribution of Lambda = 10");
n.10 = poisson_n_to_approx_normal(10)
show_results(sample_mean_poisson(n.10, 10))
print("Starting Algorithm to Find Sample Size Needed for the Poisson Distribution of Lambda = 50");
n.50 = poisson_n_to_approx_normal(50)
show_results(sample_mean_poisson(n.50, 50))
print("Starting Algorithm to Find Sample Size Needed for the Poisson Distribution of Lambda = 100");
n.100 = poisson_n_to_approx_normal(100)
show_results(sample_mean_poisson(n.100, 100))
print("Starting Algorithm to Find Sample Size Needed for the Uniform Distribution")
n.uniform = uniform_n_to_approx_normal()
show_results(sample_mean_uniform(n.uniform))</code></pre>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,331 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Confidence Interval Lab</h1>
<p><strong>Written by Brandon Rozek</strong></p>
<h2>Introduction</h2>
<p>Confidence intervals expand the concept of a point estimate by attaching a margin of error, so that a stated percentage of the time the resulting interval captures the true parameter.</p>
<p>In this lab, we will look at confidence intervals for a mean. This lab focuses on a certain method of confidence intervals that depends on the distribution of sample means being Normal. We will show how the violation of this assumption impacts the probability that the true parameter falls within the interval.</p>
<h2>Methods</h2>
<p>The observed level of confidence tells us the proportion of times the true mean falls within a confidence interval. To show how violating the Normality assumption affects this, we will sample from a Normal, a T, and an exponential distribution at several sample sizes.</p>
<p>The Normal and T distributions are sampled with a mean of 5 and a standard deviation of 2. The exponential distribution is sampled with a lambda of 2, that is, a mean of 0.5.</p>
<p>From the samples, we obtain the mean and the upper/lower bounds of the confidence interval. This is performed 100,000 times. That way we obtain a distribution of these statistics.</p>
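<p>For reference, the interval used throughout this lab takes the standard form for a mean (shown here as a sketch; the lab's code uses the multiplier 1.96 for the Normal case and a t quantile otherwise):
$$
\bar{x} \pm t_{\alpha/2,\, n-1} \frac{s}{\sqrt{n}}
$$</p>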
<p>A confidence interval is valid if the lower bound is no more than the true mean and the upper bound is no less than the true mean. From this definition, we can compute the observed proportion of valid confidence intervals across the simulations.</p>
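<p>As a minimal sketch of that computation (assuming vectors <code>lower_bound</code> and <code>upper_bound</code> from the simulation and a true mean <code>mu</code>; the names are illustrative):</p>
<pre><code class="language-R"># Proportion of simulated intervals that actually contain the true mean
mu = 5
observed_confidence = mean(lower_bound &lt; mu &amp; upper_bound &gt; mu)
observed_confidence  # compare against the nominal 0.95</code></pre>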
<h3>Visualizations</h3>
<p>From the distributions of statistics, we can create visualizations to support the understanding of confidence intervals.</p>
<p>The first one is a scatterplot of lower bounds vs upper bounds. This plot demonstrates the valid confidence intervals in blue and the invalid ones in red. It demonstrates how confidence intervals that are invalid are not located inside the box.</p>
<p>The second visualization is a histogram of all the sample means collected. The sample means that didn't belong to a valid confidence interval are shaded in red. This graphic helps demonstrate the type I errors on both sides of the distribution. </p>
<p>In this lab, we're interested in seeing how our observed level of confidence differs from our theoretical level of confidence (95%) when different sample sizes and distributions are applied.</p>
<h2>Results</h2>
<p>We can see from the table section in the Appendix that sampling from a Normal or t distribution does not adversely affect our observed level of confidence. The observed level of confidence varies slightly from the theoretical level of confidence of 0.95.</p>
<p>When sampling from the exponential distribution, however, the observed level of confidence highly depends upon the sample size.</p>
<p>Looking at Table III, we can see that for a sample size of 10, the observed level of confidence is at a meager 90%. This is 5% off from our theoretical level of confidence. This shows how the normality assumption is vital to the precision of our estimate. </p>
<p>This comes from the fact that using this type of confidence interval on a mean from a non-normal distribution requires a large sample size for the central limit theorem to take effect.</p>
<p>The central limit theorem states that if the sample size is &quot;large&quot;, the distribution of sample means approaches the normal distribution. You can see how in Figure XVIII the distribution of sample means is skewed, though as the sample size increases, the distribution of sample means becomes more symmetric (e.g., Figure XXIV).</p>
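<p>A small illustration of this effect (not part of the lab code; the population and counts here are arbitrary): draw many sample means from an exponential population at two sample sizes and compare their Shapiro-Wilk p-values.</p>
<pre><code class="language-R">set.seed(42)
sample_means = function(n) replicate(1000, mean(rexp(n, rate = 2)))
# Small n: the means are still right-skewed, so normality is usually rejected
shapiro.test(sample_means(10))$p.value
# Larger n: the distribution of sample means is much closer to normal
shapiro.test(sample_means(100))$p.value</code></pre>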
<h2>Conclusion</h2>
<p>From this, we can conclude that violating the underlying assumption of normality decreases the observed level of confidence. We can mitigate the decrease of the observed level of confidence when sampling means from a non-normal distribution by having a larger sample size. This is due to the central limit theorem.</p>
<h2>Appendix</h2>
<h3>Tables</h3>
<h4>Table I. Sampling from Normal</h4>
<table>
<thead>
<tr>
<th>Sample Size</th>
<th>Proportion of Means Within CI</th>
</tr>
</thead>
<tbody>
<tr>
<td>10</td>
<td>0.94849</td>
</tr>
<tr>
<td>20</td>
<td>0.94913</td>
</tr>
<tr>
<td>50</td>
<td>0.95045</td>
</tr>
<tr>
<td>100</td>
<td>0.94955</td>
</tr>
</tbody>
</table>
<h4>Table II. Sampling from T Distribution</h4>
<table>
<thead>
<tr>
<th>Sample Size</th>
<th>Proportion of Means Within CI</th>
</tr>
</thead>
<tbody>
<tr>
<td>10</td>
<td>0.94966</td>
</tr>
<tr>
<td>20</td>
<td>0.94983</td>
</tr>
<tr>
<td>50</td>
<td>0.94932</td>
</tr>
<tr>
<td>100</td>
<td>0.94999</td>
</tr>
</tbody>
</table>
<h4>Table III. Sampling from Exponential Distribution</h4>
<table>
<thead>
<tr>
<th>Sample Size</th>
<th>Proportion of Means Within CI</th>
</tr>
</thead>
<tbody>
<tr>
<td>10</td>
<td>0.89934</td>
</tr>
<tr>
<td>20</td>
<td>0.91829</td>
</tr>
<tr>
<td>50</td>
<td>0.93505</td>
</tr>
<tr>
<td>100</td>
<td>0.94172</td>
</tr>
</tbody>
</table>
<h3>Figures</h3>
<h4>Normal Distribution</h4>
<h5>Figure I. Scatterplot of Bounds for Normal Distribution of Sample Size 10</h5>
<p><img src="http://cs.umw.edu/home/rozek/Pictures/statlab4/normal10scatter.png" alt="normal10scatter" /></p>
<h5>Figure II. Histogram of Sample Means for Normal Distribution of Sample Size 10</h5>
<p><img src="http://cs.umw.edu/home/rozek/Pictures/statlab4/normal10hist.png" alt="normal10hist" /></p>
<h5>Figure III. Scatterplot of Bounds for Normal Distribution of Sample Size 20</h5>
<p><img src="http://cs.umw.edu/home/rozek/Pictures/statlab4/normal20scatterplot.png" alt="normal20scatterplot" /></p>
<h5>Figure IV. Histogram of Sample Means for Normal Distribution of Sample Size 20</h5>
<p><img src="http://cs.umw.edu/home/rozek/Pictures/statlab4/normal20hist.png" alt="normal20hist" /></p>
<h5>Figure V. Scatterplot of Bounds for Normal Distribution of Sample Size 50</h5>
<p><img src="http://cs.umw.edu/home/rozek/Pictures/statlab4/normal50scatterplot.png" alt="normal50scatterplot" /></p>
<h5>Figure VI. Histogram of Sample Means for Normal Distribution of Sample Size 50</h5>
<p><img src="http://cs.umw.edu/home/rozek/Pictures/statlab4/normal50hist.png" alt="normal50hist" /></p>
<h5>Figure VII. Scatterplot of Bounds for Normal Distribution of Sample Size 100</h5>
<p><img src="http://cs.umw.edu/home/rozek/Pictures/statlab4/normal100scatterplot.png" alt="normal100scatterplot" /></p>
<h5>Figure VIII. Histogram of Sample Means for Normal Distribution of Sample Size 100</h5>
<p><img src="http://cs.umw.edu/home/rozek/Pictures/statlab4/normal100hist.png" alt="normal100hist" /></p>
<h4>T Distribution</h4>
<h5>Figure IX. Scatterplot of Bounds for T Distribution of Sample Size 10</h5>
<p><img src="http://cs.umw.edu/home/rozek/Pictures/statlab4/t10scatterplot.png" alt="t10scatterplot" /></p>
<h5>Figure X. Histogram of Sample Means for T Distribution of Sample Size 10</h5>
<p><img src="http://cs.umw.edu/home/rozek/Pictures/statlab4/t10hist.png" alt="t10hist" /></p>
<h5>Figure XI. Scatterplot of Bounds for T Distribution of Sample Size 20</h5>
<p><img src="http://cs.umw.edu/home/rozek/Pictures/statlab4/t20scatterplot.png" alt="t20scatterplot" /></p>
<h5>Figure XII. Histogram of Sample Means for T Distribution of Sample Size 20</h5>
<p><img src="http://cs.umw.edu/home/rozek/Pictures/statlab4/t20hist.png" alt="t20hist" /></p>
<h5>Figure XIII. Scatterplot of Bounds for T Distribution of Sample Size 50</h5>
<p><img src="http://cs.umw.edu/home/rozek/Pictures/statlab4/t50scatter.png" alt="t50scatter" /></p>
<h5>Figure XIV. Histogram of Sample Means for T Distribution of Sample Size 50</h5>
<p><img src="http://cs.umw.edu/home/rozek/Pictures/statlab4/t50hist.png" alt="t50hist" /></p>
<h5>Figure XV. Scatterplot of Bounds for T Distribution of Sample Size 100</h5>
<p><img src="http://cs.umw.edu/home/rozek/Pictures/statlab4/t100scatter.png" alt="t100scatter" /></p>
<h5>Figure XVI. Histogram of Sample Means for T Distribution of Sample Size 100</h5>
<p><img src="http://cs.umw.edu/home/rozek/Pictures/statlab4/t100hist.png" alt="t100hist" /></p>
<h4>Exponential Distribution</h4>
<h5>Figure XVII. Scatterplot of Bounds for Exponential Distribution of Sample Size 10</h5>
<p><img src="http://cs.umw.edu/home/rozek/Pictures/statlab4/exp10scatter.png" alt="exp10scatter" /></p>
<h5>Figure XVIII. Histogram of Sample Means for Exponential Distribution of Sample Size 10</h5>
<p><img src="http://cs.umw.edu/home/rozek/Pictures/statlab4/exp10hist.png" alt="exp10hist" /></p>
<h5>Figure XIX. Scatterplot of Bounds for Exponential Distribution of Sample Size 20</h5>
<p><img src="http://cs.umw.edu/home/rozek/Pictures/statlab4/exp20scatter.png" alt="exp20scatter" /></p>
<h5>Figure XX. Histogram of Sample Means for Exponential Distribution of Sample Size 20</h5>
<p><img src="http://cs.umw.edu/home/rozek/Pictures/statlab4/exp20hist.png" alt="exp20hist" /></p>
<h5>Figure XXI. Scatterplot of Bounds for Exponential Distribution of Sample Size 50</h5>
<p><img src="http://cs.umw.edu/home/rozek/Pictures/statlab4/exp50scatter.png" alt="exp50scatter" /></p>
<h5>Figure XXII. Histogram of Sample Means for Exponential Distribution of Sample Size 50</h5>
<p><img src="http://cs.umw.edu/home/rozek/Pictures/statlab4/exp50hist.png" alt="exp50hist" /></p>
<h5>Figure XXIII. Scatterplot of Bounds for Exponential Distribution of Sample Size 100</h5>
<p><img src="http://cs.umw.edu/home/rozek/Pictures/statlab4/exp100scatter.png" alt="exp100scatter" /></p>
<h5>Figure XXIV. Histogram of Sample Means for Exponential Distribution of Sample Size 100</h5>
<p><img src="http://cs.umw.edu/home/rozek/Pictures/statlab4/exp100hist.png" alt="exp100hist" /></p>
<h3>R Code</h3>
<pre><code class="language-R">rm(list=ls())
library(ggplot2)
library(functional) # For function currying
proportion_in_CI = function(n, mu, dist) {
# Preallocate vectors
lower_bound = numeric(100000)
upper_bound = numeric(100000)
means = numeric(100000)
number_within_CI = 0
ME = 1.96 * 2 / sqrt(n) ## Normal Margin of Error
for (i in 1:100000) {
x = numeric(n)
# Sample from distribution
if (dist == "Normal" | dist == "t") {
x = rnorm(n,mu,2)
} else if (dist == "Exponential") {
x = rexp(n, 1 / mu)
}
## Correct ME if non-normal
if (dist != "Normal") {
ME = qt(0.975,n-1)*sd(x)/sqrt(n)
}
## Store statistics
means[i] = mean(x)
lower_bound[i] = mean(x) - ME
upper_bound[i] = mean(x) + ME
# Is Confidence Interval Valid?
if (lower_bound[i] &lt; mu &amp; upper_bound[i] &gt; mu) {
number_within_CI = number_within_CI + 1
}
}
# Prepare for plotting
lbub = data.frame(lower_bound, upper_bound, means)
lbub$col = ifelse(lbub$lower_bound &lt; mu &amp; lbub$upper_bound &gt; mu, 'Within CI', 'Outside CI')
print(ggplot(lbub, aes(x = lower_bound, y = upper_bound, col = col)) +
geom_point(pch = 1) +
geom_hline(yintercept = mu, col = '#000055') +
geom_vline(xintercept = mu, col = '#000055') +
ggtitle(paste("Plot of Lower Bounds vs Upper Bounds with Sample Size of ", n)) +
xlab("Lower Bound") +
ylab("Upper Bounds") +
theme_bw()
)
print(ggplot(lbub, aes(x = means, fill = col)) +
geom_histogram(color = 'black') +
ggtitle(paste("Histogram of Sample Means with Sample Size of ", n)) +
xlab("Sample Mean") +
ylab("Count") +
theme_bw()
)
# Return proportion within CI
number_within_CI / 100000
}
sample_sizes = c(10, 20, 50, 100)
### PART I
proportion_in_CI_Normal = Curry(proportion_in_CI, dist = "Normal", mu = 5)
p_norm = sapply(sample_sizes, proportion_in_CI_Normal)
sapply(p_norm, function(x) {
cat("The observed proportion of intervals containing mu is", x, "\n")
invisible(x)
})
### PART II
proportion_in_CI_T = Curry(proportion_in_CI, dist = "t", mu = 5)
p_t = sapply(sample_sizes, proportion_in_CI_T)
sapply(p_t, function(x) {
cat("The observed proportion of intervals containing mu is", x, "\n")
invisible(x)
})
### PART III
proportion_in_CI_Exp = Curry(proportion_in_CI, dist = "Exponential", mu = 0.5)
p_exp = sapply(sample_sizes, proportion_in_CI_Exp)
sapply(p_exp, function(x) {
cat("The observed proportion of intervals containing mu is", x, "\n")
invisible(x)
})</code></pre>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,465 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Random Number Generation</h1>
<h2>Introduction</h2>
<p>The generation of random numbers has a variety of applications, including but not limited to the modeling of stochastic processes. It is important, therefore, to be able to generate random numbers following any given distribution. One of the many ways to achieve this is to transform numbers sampled from a random uniform distribution. </p>
<p>In this lab, we will compare the effectiveness of the linear congruential method (LCM), <code>runif</code>, and <code>rexp</code> for generating random numbers. The programming language R is used, and different plots and summary statistics are drawn upon to analyze the effectiveness of the methods.</p>
<h2>Methods</h2>
<h3>The Linear Congruential Method</h3>
<p>The linear congruential method (LCM) is an algorithm that produces a sequence of pseudo-random numbers using the following recursive function
$$
X_{n + 1} = (aX_n + c) \mod m
$$
The R code we use has a <code>c</code> value of 0, which is a special case known as the multiplicative congruential generator (MCG).</p>
<p>There are several conditions for using a MCG. First, the seed value must be co-prime to <code>m</code>; in other words, the greatest common divisor of the two must be 1. A function was written in R to check this. Secondly, <code>m</code> must be a prime number or a power of a prime number. </p>
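<p>A quick way to verify the co-prime condition for a candidate seed (a small sketch using the Euclidean algorithm; the full version appears in the R code below):</p>
<pre><code class="language-R">gcd = function(x, y) if (y == 0) x else gcd(y, x %% y)
gcd(55555, 2^31) == 1  # TRUE, so 55555 is an acceptable seed for m = 2^31</code></pre>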
<h4>Periodicity</h4>
<p>An element of periodicity comes into play when dealing with pseudo-random number generators. Once the generator has produced a certain number of terms, it is shown to loop back to the first term of the sequence. It is advantageous, therefore, to keep the period high. </p>
<p>The period in a MCG is at most <code>m - 1</code>. In this lab, <code>m</code> has a value of $2^{31}$ and only 100 numbers were sampled from the LCM. This allows us to avoid the issue of periodicity entirely in our analysis.</p>
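<p>To see periodicity on a toy scale (hypothetical parameters chosen only so the cycle is short enough to print):</p>
<pre><code class="language-R"># MCG with a = 3, m = 7, seed = 1: the sequence repeats after m - 1 = 6 terms
x = numeric(8)
x[1] = 1
for (i in 2:8) x[i] = (3 * x[i - 1]) %% 7
x  # 1 3 2 6 4 5 1 3 ... so the period is 6</code></pre>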
<h3>Mersenne-Twister Method</h3>
<p>The default pseudo-random number generator (pseudo RNG) in R is the Mersenne-Twister, with the default seed value derived from the current time and the process id of the current R instance. Mersenne-Twister is part of a class of pseudo RNGs called generalized feedback shift registers. This class of pseudo RNGs provides a very long period (VLP), or in other words, a long cycle before the values start repeating. VLPs are often useful when executing large simulations in R.</p>
<p>Since this method is already coded in the <code>base</code> R-package, we will leave the implementation for the curious to research.</p>
<h3>The Uniform Distribution</h3>
<h4>Probability Density Function</h4>
<p>The uniform distribution $Unif(\theta_1, \theta_2)$ has a probability density function (PDF) of the following.
$$
f(x) = \frac{1}{\theta_2 - \theta_1}
$$
Where x is valid between [$\theta_1$, $\theta_2$].</p>
<p>In our case, we are producing numbers on [0, 1]. We can therefore reduce the probability density function to the following
$$
f(x) = 1
$$</p>
<h4>Expected Value</h4>
<p>The expected value can be defined as
$$
E(X) = \int_a^b xf(x)dx
$$
For the uniform distribution we're using that becomes
$$
E(X) = \int_0^1 xdx = \frac{1}{2}[x^2]_0^1 = \frac{1}{2}
$$
We should, therefore, expect the mean of the generated random number sequence to roughly equal $\frac{1}{2}$. </p>
<h4>Median</h4>
<p>To find the median of the distribution, we need to find the point at which the cumulative distribution function (CDF) is equal to 0.5.
$$
\int_0^{Median(X)} f(x)dx = \frac{1}{2}
$$
$$
\int_0^{Median(X)} dx = \frac{1}{2}
$$</p>
<p>$$
[X]_0^{Median(X)} = \frac{1}{2}
$$</p>
<p>$$
Median(X) = \frac{1}{2}
$$</p>
<p>This shows us that the median of the distribution should be roughly equal to $\frac{1}{2}$ as well.</p>
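<p>Both claims are easy to sanity check empirically (a quick sketch; the sample size is arbitrary):</p>
<pre><code class="language-R">u = runif(100000)
mean(u)    # roughly 0.5
median(u)  # roughly 0.5</code></pre>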
<h4>Analysis of Uniform Distribution Fit</h4>
<p>The histogram of a uniform distribution of data should look rectangular in shape. This means that the counts of all the sub-intervals should be about the same.</p>
<p>Another way to test whether or not the distribution is a good fit is to use what is called a Quantile-Quantile plot (Q-Q plot). This is a probability plot that compares the quantiles of the theoretical distribution, here the uniform distribution, to those of the sample.</p>
<p>For a precise Q-Q plot, we need many quantiles to compare. In this lab, we will compare 100 different quantiles. The quantiles of the theoretical distribution can be easily derived from the fact that all value ranges are equally probable. The theoretical quantiles in this case are 0.01, 0.02, ..., 0.98, 0.99, 1. The sample quantiles are obtained from 100 observations of <code>runif</code> or the LCM. </p>
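<p>As a rough sketch of how such a plot can be assembled (mirroring the description above with 100 observations):</p>
<pre><code class="language-R">theoretical = (1:100) / 100      # quantiles of Unif(0, 1)
sample_data = runif(100)         # or 100 values produced by the LCM
qqplot(sample_data, theoretical,
       main = "QQ Plot", xlab = "Sample Data", ylab = "Theoretical Data")
qqline(theoretical, distribution = qunif, col = "red", lty = 2)</code></pre>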
<p>In the Q-Q plot, a good fit is shown when the points hug closely to the Q-Q line. It is also important for the points to be symmetric about the line, falling on both sides of it rather than drifting to one side.</p>
<p>For the sake of exploration, we will use four different seed values for the linear congruential method (while making sure that the seed values are co-prime with <code>m</code>). We will then use the results of these and compare how the LCM and the standard <code>runif</code> method generate random numbers.</p>
<h3>Exponential Distribution</h3>
<h4>Probability Density Function and the Cumulative Distribution Function</h4>
<p>The exponential distribution is a type of continuous distribution that is defined by the following PDF
$$
f(x) = \lambda e^{-\lambda x}
$$
We can find the CDF by taking the integral of the PDF.
$$
F(x) = \int f(x)dx = \lambda \int e^{-\lambda x} dx = \lambda * (\frac{-1}{\lambda}) e^{-\lambda x} + C = -e^{-\lambda x} + C
$$
One of the conditions for the cumulative distribution function is that as $x \to \infty$, $F(x) \to 1$.
$$
1 = \lim_{x \to \infty} (-e^{-\lambda x} + C) = \lim_{x \to \infty} (-e^{-\lambda x}) + \lim_{x \to \infty} C = \lim_{x \to \infty}(-e^{-\lambda x}) + C
$$
This shows that $C = 1$, since $\lim_{x \to \infty} (-e^{-\lambda x})$ is equal to 0. </p>
<p>Substituting $C$ gives us the following.
$$
F(x) = 1 - e^{-\lambda x}
$$</p>
<h4>Expected Value</h4>
<p>We can find the expected value using the formula described in the previous Expected Value section. Reflected in the bounds of integration is the domain of the exponential distribution $[0, \infty)$.
$$
E(X) = \lim_{a \to \infty}\int_0^a x \lambda e^{-\lambda x} dx
$$
The integral above features a product of two functions. So as an aside, we will derive a formula so we can take the integral above.</p>
<p>The total derivative is defined as
$$
d(uv) = udv + vdu
$$</p>
<p>After taking the integral of both sides, we can solve for a formula that gives the following
$$
\int d(uv) = \int udv + \int vdu
$$</p>
<p>$$
\int udv = uv - \int vdu
$$</p>
<p>The formula above is called <em>Integration by Parts</em>. We will make use of this formula by defining $u = \lambda x$ and $dv = e^{-\lambda x} dx$</p>
<p>This implies that $du = \lambda dx$ and $v= -\frac{1}{\lambda}e^{-\lambda x}$<br />
$$
E(X) = -\lim_{a \to \infty}[\lambda x \frac{1}{\lambda}e^{-\lambda x}]_0^a - \lim_{b \to \infty}\int_0^b -\frac{1}{\lambda}e^{-\lambda x}\lambda dx
$$</p>
<p>$$
E(X) = -\lim_{a \to \infty} [xe^{-\lambda x}]_0^a - \lim_{b \to \infty}\int_0^b -e^{-\lambda x}dx
$$</p>
<p>$$
E(X) = -\lim_{a \to \infty}(ae^{-\lambda a}) - \lim_{b \to \infty}[\frac{1}{\lambda}e^{-\lambda x}]_0^b
$$</p>
<p>$$
E(X) = 0 - \frac{1}{\lambda}[\lim_{b \to \infty}(e^{-\lambda b}) - e^{-0\lambda}]
$$</p>
<p>$$
E(X) = -\frac{1}{\lambda}(-1) = \frac{1}{\lambda}
$$</p>
<p>For the purposes of this lab, we will make the rate ($\lambda$) equal to 3. Which means we should expect our mean to be roughly equal to $\frac{1}{3}$.</p>
<h4>Median</h4>
<p>The median can be found by setting the CDF equal to $\frac{1}{2}$
$$
1- e^{-\lambda Median(X)} = \frac{1}{2}
$$</p>
<p>$$
e^{-\lambda Median(X)} = \frac{1}{2}
$$</p>
<p>$$
-\lambda Median(X) = \ln(\frac{1}{2})
$$</p>
<p>$$
Median(X) = \frac{\ln(2)}{\lambda}
$$</p>
<p>This shows that we should expect our median to be around $\frac{\ln(2)}{3} \approx 0.231$.</p>
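<p>Again, a quick empirical check (a sketch using R's built-in exponential sampler):</p>
<pre><code class="language-R">x = rexp(100000, rate = 3)
mean(x)    # roughly 1/3
median(x)  # roughly log(2)/3, about 0.231</code></pre>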
<h3>Inverse Transform Sampling</h3>
<p>Once we have a uniform distribution of values, we can then use these values to map to an exponential distribution. This is done through a technique called inverse transform sampling; the technique works as follows:</p>
<ol>
<li>Generate a random number u from the standard uniform distribution</li>
<li>Compute the value of X such that F(X) = u</li>
<li>The value of X belongs to the distribution of F</li>
</ol>
<p>Using these steps, we'll derive the exponential transform below.
$$
F(X) = 1 - e^{-\lambda x} = u
$$
Then proceeding to solve for X, we obtain the following.
$$
e^{-\lambda X} = 1 - u
$$</p>
<p>$$
-\lambda X = \ln(1 - u)
$$</p>
<p>$$
X = \frac{-\ln(1 - u)}{\lambda}
$$</p>
<p>With this formula, we can now find values for X which is a random variable that follows an exponential distribution given random uniform data $u$.</p>
<p>In this lab, we are looking at the exponential distribution with a rate of 3. Therefore, the probability density function, the cumulative distribution function, and the exponential transform can be redefined as follows, respectively.
$$
f(x) = 3e^{-3x}
$$</p>
<p>$$
F(x) = 1 - e^{-3x}
$$</p>
<p>$$
X = \frac{-\ln(1 - u)}{3}
$$</p>
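<p>Put together, the transform is only a couple of lines of R (a sketch; the same idea appears in the full code below):</p>
<pre><code class="language-R">u = runif(100)        # standard uniform draws
x = -log(1 - u) / 3   # inverse-CDF transform to an Exponential with rate 3
summary(x)</code></pre>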
<h3>R Code</h3>
<p>The R code makes use of the concepts above. The purpose of this code is to output the summary statistics, histograms, and Q-Q plots used to compare the different methods.</p>
<p>First, the LCM is executed with four different seed values. The LCM is used to create a uniform distribution of data that is then compared to the standard <code>runif</code> function in the R language.</p>
<p>Then, transformations of the LCM and <code>runif</code> output are computed and compared with the standard <code>rexp</code> to create an exponential distribution of data with $\lambda = 3$. </p>
<h2>Results</h2>
<h3>Uniform Distribution</h3>
<p>For a small sample of 100 values, the data seems evenly distributed for all seeds and methods used. The peaks and troughs are more pronounced in the LCM methods suggesting that the <code>runif</code> command creates more evenly distributed data than the LCM. </p>
<p>Deviations from the mean and median are lowest for the LCM rather than the standard <code>runif</code> command with the exception of the LCM with the seed of 93463.</p>
<p>The Q-Q plots for all of the methods follow the Q-Q Line tightly and appear symmetric.</p>
<h3>Exponential Distribution</h3>
<p>A bin size of 20 is used to make the histograms easily comparable to each other. One interesting thing to note is that in Figure XI, there is an observation located far from the rest of the data. Since the purpose of a histogram is to show the shape of the distribution, this single far-out observation skews what we can see. Therefore the next figure, Figure XII, has that outlier removed and shows the histogram of the rest of the data.</p>
<p>All of the histograms appear exponential in shape. Looking at the Q-Q plots, the LCM transformation and the <code>rexp</code> distribution of data fit closely to the QQ-Line. All of the Q-Q Plots have the points getting further away from the line as you get higher up in the percentiles. The <code>runif</code> transformation quantiles diverge from the line after about the 50th percentile.</p>
<p>Deviations from the mean and median were about the same for both transformed techniques (<code>runif</code> and LCM). The <code>rexp</code> command did better when it came to the deviation from the mean, but performed worse when it came to deviation from the median.</p>
<h2>Conclusion</h2>
<p>The linear congruential method performed better at simulating the distributions than the standard R functions. It more accurately captured the median for both the standard uniform data and the exponential data. Against the <code>runif</code> command, it also performed better at simulating the mean.</p>
<p>This can especially be seen when comparing the two transform techniques. The transformed LCM distribution of data followed the Q-Q line more tightly than the transformed <code>runif</code>.</p>
<p>I speculate that this is due to the seed value used. The Mersenne-Twister method performs better when the seed value doesn't have many zeros in it. Since the seed value is determined by the system time and process id, we don't know for sure whether it's a proper input for the Mersenne-Twister. For the LCM, seeds were properly tested to make sure that they were co-prime with one of its parameters. This condition allowed proper seeds to work well with the LCM. </p>
<p>Further research can be done on standardizing the seed values used across all the different pseudo random number generators and on looking at the six other pseudo RNGs that come built into R. Changing the seed and random number generator can be achieved through the <code>set.seed</code> function.</p>
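<p>For example (a brief sketch of those built-in facilities; the generator name is one of those documented for base R):</p>
<pre><code class="language-R">RNGkind("Mersenne-Twister")  # choose the generator
set.seed(12345)              # fix the seed for reproducibility
runif(3)
set.seed(12345)
runif(3)                     # identical to the previous three values</code></pre>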
<h2>Appendix</h2>
<h3>Figures</h3>
<h4>Figure I, Histogram of LCM Random Data with seed 55555</h4>
<p><img src="file:///home/rozek/Pictures/stat381lab2/hist55555.png" alt="" /></p>
<h4>Figure II, Q-Q Plot of LCM Random Data with seed 55555</h4>
<p><img src="http://cs.umw.edu/home/rozek/Pictures/stat381lab2/qqplot55555.png" alt="qqplot55555" /></p>
<h4>Figure III, Histogram of LCM Random Data with seed 93463</h4>
<p><img src="http://cs.umw.edu/home/rozek/Pictures/stat381lab2/hist93463.png" alt="hist93463" /></p>
<h4>Figure IV, Q-Q Plot of LCM Random Data with seed 93463</h4>
<p><img src="http://cs.umw.edu/home/rozek/Pictures/stat381lab2/q93463.png" alt="q93463" /></p>
<h4>Figure V, Histogram of LCM Random Data with seed 29345</h4>
<p><img src="http://cs.umw.edu/home/rozek/Pictures/stat381lab2/hist29345.png" alt="hist29345" /></p>
<h4>Figure VI, Q-Q Plot of LCM Random Data with seed 29345</h4>
<p><img src="http://cs.umw.edu/home/rozek/Pictures/stat381lab2/q29345.png" alt="q29345" /></p>
<h4>Figure VII, Histogram of LCM Random Data with seed 68237</h4>
<p><img src="http://cs.umw.edu/home/rozek/Pictures/stat381lab2/hist68237.png" alt="hist68237" /></p>
<h4>Figure VIII, Q-Q Plot of LCM Random Data with seed 68237</h4>
<p><img src="http://cs.umw.edu/home/rozek/Pictures/stat381lab2/q68237.png" alt="q68237" /></p>
<h4>Figure IX, Histogram of R Random Uniform Data</h4>
<p><img src="http://cs.umw.edu/home/rozek/Pictures/stat381lab2/histunif.png" alt="histunif" /></p>
<h4>Figure X, Q-Q Plot of R Random Uniform Data</h4>
<p><img src="http://cs.umw.edu/home/rozek/Pictures/stat381lab2/qunif.png" alt="qunif" /></p>
<h4>Figure XI, Histogram of Exponential Data from LCM Random</h4>
<p><img src="http://cs.umw.edu/home/rozek/Pictures/stat381lab2/histexplcm.png" alt="histexplcm" /></p>
<h4>Figure XII, Histogram of Exponential Data from LCM Random without Outlier Above 2</h4>
<p><img src="http://cs.umw.edu/home/rozek/Pictures/stat381lab2/histexplcm_nooutlier.png" alt="histexplcm_nooutlier" /></p>
<h4>Figure XIII, Q-Q Plot of Exponential Data from LCM Random</h4>
<p><img src="http://cs.umw.edu/home/rozek/Pictures/stat381lab2/qexplcm.png" alt="qexplcm" /></p>
<h4>Figure XIV, Histogram of Exponential Data from R Random Uniform</h4>
<p><img src="http://cs.umw.edu/home/rozek/Pictures/stat381lab2/histexpunif.png" alt="histexpunif" /></p>
<h4>Figure XV, Q-Q Plot of Exponential Data from R Random Uniform</h4>
<p><img src="http://cs.umw.edu/home/rozek/Pictures/stat381lab2/qexpunif.png" alt="qexpunif" /></p>
<h4>Figure XVI, Histogram of R Generated Exponential Data</h4>
<p><img src="http://cs.umw.edu/home/rozek/Pictures/stat381lab2/histexp.png" alt="histexp" /></p>
<h4>Figure XVII, Q-Q Plot of R Generated Exponential Data</h4>
<p><img src="http://cs.umw.edu/home/rozek/Pictures/stat381lab2/qexp.png" alt="qexp" /></p>
<h3>Tables</h3>
<h4>Table I, Uniform Distribution Sample Data</h4>
<table>
<thead>
<tr>
<th>Method</th>
<th>Mean ($\bar{x}$)</th>
<th>Median ($\tilde{x}$)</th>
<th>$\mu - \bar{x}$</th>
<th>$m - \tilde{x}$</th>
</tr>
</thead>
<tbody>
<tr>
<td>LCM with seed 55555</td>
<td>0.505</td>
<td>0.521</td>
<td>-0.005</td>
<td>-0.021</td>
</tr>
<tr>
<td>LCM with seed 93463</td>
<td>0.452</td>
<td>0.402</td>
<td>0.048</td>
<td>0.098</td>
</tr>
<tr>
<td>LCM with seed 29345</td>
<td>0.520</td>
<td>0.502</td>
<td>-0.020</td>
<td>-0.002</td>
</tr>
<tr>
<td>LCM with seed 68237</td>
<td>0.489</td>
<td>0.517</td>
<td>0.011</td>
<td>-0.017</td>
</tr>
<tr>
<td>R Random Uniform</td>
<td>0.480</td>
<td>0.471</td>
<td>0.020</td>
<td>0.029</td>
</tr>
</tbody>
</table>
<h4>Table II, Exponential Distribution Sample Data</h4>
<table>
<thead>
<tr>
<th>Method</th>
<th>Mean</th>
<th>Median</th>
<th>$\mu - \bar{x}$</th>
<th>$m - \tilde{x}$</th>
</tr>
</thead>
<tbody>
<tr>
<td>LCM Transform</td>
<td>0.380</td>
<td>0.246</td>
<td>-0.047</td>
<td>-0.015</td>
</tr>
<tr>
<td>RUnif Transform</td>
<td>0.283</td>
<td>0.218</td>
<td>0.050</td>
<td>0.013</td>
</tr>
<tr>
<td>R Exponential</td>
<td>0.363</td>
<td>0.278</td>
<td>-0.030</td>
<td>-0.047</td>
</tr>
</tbody>
</table>
<h3>R Code</h3>
<pre><code class="language-R">library(ggplot2)
linear_congruential = function(seed) {
a = 69069
c = 0
m = 2^31
x = numeric(100)
x[1] = seed
for (i in 2:100) {
x[i] = (a * x[i-1] + c) %% m
}
xnew = x / (max(x) + 1)
xnew
}
gcd = function(x,y) {
r = x %% y;
return(ifelse(r, gcd(y, r), y))
}
check_seed = function(seed) {
if (gcd(seed, 2^31) == 1) {
print(paste("The seed value of", seed, "is co-prime"))
} else {
print(paste("The seed value of", seed, "is NOT co-prime"))
}
}
check_data = function(data, distribution, title) {
print(paste("Currently looking at", title))
print(summary(data))
print(ggplot(data.frame(data), aes(x = data)) +
geom_histogram(fill = 'white', color = 'black', bins = 10) +
xlab("Date") +
ylab("Frequency") +
ggtitle(paste("Histogram of", title)) +
theme_bw())
uqs = (1:100) / 100
if (distribution == "Uniform") {
qqplot(data, uqs, pch = 1, col = '#001155', main = paste("QQ Plot of", title), xlab = "Sample Data", ylab = "Theoretical Data")
qqline(uqs, distribution = qunif, col="red", lty=2)
} else if (distribution == "Exponential") {
eqs = -log(1-uqs) / 3
qqplot(data, eqs, pch = 1, col = '#001155', main = paste("QQ Plot of", title), xlab = "Sample Data", ylab = "Theoretical Data")
qqline(eqs, distribution=function(p) { qexp(p, rate = 3)}, col="#AA0000", lty=2)
}
}
seed1 = 55555
seed2 = 93463
seed3 = 29345
seed4 = 68237
check_seed(seed1)
lin_cong = linear_congruential(seed1)
check_data(lin_cong, "Uniform", paste("LCM Random Data with seed", seed1))
check_seed(seed2)
lin_cong2 = linear_congruential(seed2)
check_data(lin_cong2, "Uniform", paste("LCM Random Data with seed", seed2))
check_seed(seed3)
lin_cong3 = linear_congruential(seed3)
check_data(lin_cong3, "Uniform", paste("LCM Random Data with seed", seed3))
check_seed(seed4)
lin_cong4 = linear_congruential(seed4)
check_data(lin_cong4, "Uniform", paste("LCM Random Data with seed", seed4))
# Using the standard built-in R function
unif = runif(100, 0, 1)
check_data(unif, "Uniform", "R Random Uniform Data")
# Building exponential from linear congruence method
exp1 = -log(1 - lin_cong) / 3
check_data(exp1, "Exponential", "Exponential Data from LCM Random")
# Building exponential from runif
exp2 = -log(1 - unif) / 3
check_data(exp2, "Exponential", "Exponential Data from R Random Uniform")
# Building exponential from rexp
exp3 = rexp(100, rate = 3)
check_data(exp3, "Exponential", "R Generated Exponential Data")</code></pre>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

File diff suppressed because one or more lines are too long

View file

@ -0,0 +1,90 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<meta name="description" content="Documents pertaining to courses I've taken">
<title>Courses | Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem active">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Courses</h1>
<p>Below are the courses that I have documents up online for.</p>
<p><a href="index.html%3Fcourses%252FStat381.html">Probability and Statistical Inference I</a></p>
<h2>Coursera Courses</h2>
<p>Occasionally during breaks, I like to take some Coursera courses. I will post my notes from the lectures here as well.</p>
<p><a href="index.html%3Fcourses%252FBayesianStatistics.html">Bayesian Statistics</a></p>
<p><a href="index.html%3Fcourses%252FReproducibleResearch.html">Reproducible Research</a></p>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,89 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<meta name="description" content="This is my University Tilda space">
<title>Home | Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem active">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Brandon Rozek's Tilda Space</h1>
<h2>Welcome</h2>
<p>My main website is <a href="https://brandonrozek.com">https://brandonrozek.com</a>. Though I like the customization that Wordpress allows, it does come with additional overhead.</p>
<p>Therefore, I decided to use the Tilda space offered by the Computer Science department to host content pertaining to my Academic Life.</p>
<p>As such, my research and other class documents will appear here.</p>
<p>For the curious, this website was built with <a href="http://picocms.org">Pico CMS</a> using the <a href="https://github.com/lostkeys/Bits-and-Pieces-Theme-for-Pico">BitsAndPieces</a> theme.</p>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,162 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Final Review December 6</h1>
<h2>Classes</h2>
<p>Here is how you can create a class called &quot;Employee&quot; with a non-default constructor (a constructor that takes parameters) and a getter and setter</p>
<pre><code class="language-java">public class Employee {
// Our private variables
private String name;
private double salary;
// Non-default constructor
public Employee(String name, double salary) {
this.name = name;
this.salary = salary;
}
// This is a getter
public String getName() {
return name;
}
public void setSalary(double salary) {
this.salary = salary;
}
}</code></pre>
<h2>For Loops + Arrays</h2>
<p>For loops are constructed in the following way</p>
<p><code>for (initialization; condition to stop; increment to get closer to condition to stop)</code></p>
<pre><code class="language-java">//Assume an array with variable name array is declared before
for (int i = 0; i &lt; array.length; i++) {
// This code will loop through every entry in the array
}</code></pre>
<p>Note that you don't always have to start from zero, you can start anywhere from the array.</p>
<h2>For Loops + Arrays + Methods</h2>
<p>This is an example of how you can take in an array in a method</p>
<pre><code class="language-java">public static boolean isEven(int[] array) { // &lt;-- Note the int[] array
for (int i = 0; i &lt; array.length; i++) { // Iterate through the entire array
// If you find an odd number, return false
if (array[i] % 2 == 1) {
return false;
}
}
// If you didn't find any odd numbers, return true
return true;
}</code></pre>
<h2>File I/O</h2>
<p>Let's say that you have the following file</p>
<pre><code>4
chicken
3
salad</code></pre>
<p>And you want to make it so that you take the number, and print the word after it a certain number of times. For this example we would want to see the following</p>
<pre><code class="language-java">chicken chicken chicken chicken
salad salad salad</code></pre>
<p>Here is the code to write it</p>
<pre><code class="language-java">public static void printStrings() {
FileInputStream file = new FileInputStream("stuff.txt"); // File contents are in stuff.txt
Scanner scnr = new Scanner(file); // Scanner takes in a file to read
while (scnr.hasNext()) { // While there are still stuff in the file to read
int number = scnr.nextInt(); // Grab the number
String word = scnr.next(); // Grab the word after the number
// Print the word number times
for (int i = 0; i &lt; number; i++) {
System.out.print(word);
}
// Put a new line here
System.out.println();
}
}</code></pre>
<h2>Recursion</h2>
<p>Look at handout and carefully trace recursion problems</p>
<h2>2D Arrays</h2>
<p>Declare a 2D int array with 4 rows and 7 columns</p>
<pre><code class="language-java">int[][] dataVals = new int[4][7];</code></pre>
<p>A 2D array with 4 rows and 7 columns has 7 * 4 = 28 entries.</p>
<p>If you want to sum up all the numbers in a 2 dimensional array, do the following</p>
<pre><code class="language-java">// Assume numbers is declared beforehand
int sum = 0;
for (int i = 0; i &lt; numbers.length; i++) { // Loop through every row in the 2D array
for (int j = 0; j &lt; numbers[i].length; j++) { // Loop through every column in a row
// This code now looks at one entry in a 2D array
sum += numbers[i][j];
}
}</code></pre>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,124 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Exam Review Partial Answer Sheet</h1>
<p>Write a Java method to sum values of an array</p>
<pre><code class="language-java">// Assume variable numbers is an array of 10 integers
int sum = 0;
for (int i = 0; i &lt; numbers.length; i++) {
sum += numbers[i];
}</code></pre>
<p>Write a Java method to test if an array contains a specific value</p>
<pre><code class="language-java">// Assume variable numbers is an array of 10 integers
int numWanted = 4;
boolean found = false;
for (int i = 0; i &lt; numbers.length; i++) {
if (numbers[i] == numWanted) {
System.out.println("The number " + numWanted + " was found!");
found = true;
}
}
if (!found) {
System.out.println("The number " + numWanted + " was not found!");
}</code></pre>
<p>Write a Java method to find the index of an array element</p>
<pre><code class="language-java">// Assume variable numbers is an array of 10 integers
int numWanted = 4;
boolean found = false;
for (int i = 0; i &lt; numbers.length; i++) {
if (numbers[i] == numWanted) {
System.out.println("The index of number " + numWanted + " is " + i);
}
}</code></pre>
<p>How many lines will the following loop print when it is run?</p>
<pre><code class="language-java">int i = 0;
while (i &lt;= 6) {
System.out.println("i is now " + (i));
i++;
}</code></pre>
<pre><code>i is now 0
i is now 1
i is now 2
i is now 3
i is now 4
i is now 5
i is now 6</code></pre>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,222 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Lecture for November 13</h1>
<h2>File IO (Cont.)</h2>
<p>Last class we talked about reading from files; we can also write to files.</p>
<h3>Import necessary libraries</h3>
<p>First you must import all of the necessary libraries</p>
<pre><code class="language-java">// To read
import java.util.Scanner;
import java.io.FileOutputStream;
// To Write
import java.io.FileReader;
import java.io.PrintWriter;
// For Exception Handling
import java.io.IOException;</code></pre>
<p>Then in your main, declare a <code>FileOutputStream</code> and <code>PrintWriter</code></p>
<pre><code class="language-java">FileOutputStream file;
PrintWriter print;</code></pre>
<h3>Try-Catch-Finally</h3>
<p>Create a try block to open a file for writing</p>
<pre><code class="language-java">try {
// If the file doesn't exist, it'll create it
file = new FileOutputStream("output.txt");
print = new PrintWriter(file);
} catch (IOException except) {
// Prints out the error message
System.out.println("File error " + except.getMessage());
} </code></pre>
<p>Adding a finally block allows the program to clean up before it closes</p>
<pre><code class="language-java"> try {
file = new FileOutputStream("output.txt");
print = new PrintWriter(file);
} catch (IOException except) {
System.out.println("File error " + except.getMessage());
} finally { // It starts here!
delete file;
delete print;
file.close();
return;
}</code></pre>
<h3>Write to the file :)</h3>
<p>Then you can write the to file!</p>
<pre><code class="language-java">// Do you notice the following methods?
print.println("Your number is");
print.print("My name is..\n");
print.printf("%s %d", "Hello ", 5);
print.flush(); // Pushes any buffered output out to the file
file.close(); //Closes the file
</code></pre>
<p>Extra Note: Disk defragmentation is a way of reorganizing files on disk so that their pieces are stored contiguously, which can speed up file access. </p>
<h2>Swing Graphics</h2>
<h3>Importing Necessary Libraries</h3>
<p>You need to import all the necessary libraries first</p>
<pre><code class="language-java">import javax.swing.*;
import java.awt.FlowLayout;
import java.awt.event.ActionListener;</code></pre>
<h3>Changing the class header</h3>
<p>Your class needs to extend <code>JFrame</code> so that it can reuse the existing code written for Swing applications.</p>
<pre><code class="language-java">public class firstGUI extends JFrame {
//....</code></pre>
<h3>Swing Components</h3>
<p>Java Swing makes use of what is called Swing Components. These are basic building blocks of GUI items you can add to the screen. For example, a checkbox, a radio button, text field. These are all swing components.</p>
<p>I wrote a blog post back in the summer which is an overview of them. You can check it out here: <a href="https://brandonrozek.com/2017/06/java-swing-components/">https://brandonrozek.com/2017/06/java-swing-components/</a></p>
<p>Inside your <code>firstGUI</code> class, declare some Swing components you would like to use in the program</p>
<pre><code class="language-java">public class firstGUI extends JFrame {
JButton button1;
JTextArea area;
JTextField text;
// ....
</code></pre>
<h3>Constructor</h3>
<p>You need to create a constructor for this class that way you can initiate all of the swing component values.</p>
<pre><code class="language-java">// ...
JButton button1;
JTextArea area;
JTextField text;
// Begin Constructor
firstGUI() {
// Define the components
JLabel name = new JLabel("Enter in your name:");
text = new JTextField("Jennifer", 20); // 20 characters long, default value: Jennifer
area = new JTextArea(10, 10); //Width and Height is 10 characters big
JScrollPane sp = new JScrollPane(area); //Adds a scroll bar for the text area
button1 = new JButton("Press Me");
// Set the Layout
// FlowLayout lays the components out left to right, wrapping to a new row as needed
setLayout(new FlowLayout());
// Add the components to the screen
add(name);
add(text);
add(sp); // never add the textarea when surrounded by a ScrollPane
add(button1);
}</code></pre>
<h3>New Main Method</h3>
<p>Finally, you need to create the Main method which will initiate it all</p>
<pre><code class="language-java">public static void main(String[] args) {
firstGUI myFrame = new firstGUI();
// End the program when the 'x' button (not keyboard) is pressed
myFrame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
myFrame.setTitle("My title"); // Titles the window
myFrame.pack(); // Packs it all into the frame
myFrame.setVisible(true); // Makes the frame appear on the screen
}</code></pre>
<h3>Making it interactive</h3>
<p>You need to change your class header to the following</p>
<pre><code class="language-java">public class firstGUI extends JFrame implements ActionListener {
// ...</code></pre>
<p>Then in your class, add the following method</p>
<pre><code class="language-java">@Override public void actionPerformed(ActionEvent event) {
// Do stuff as a result of an event here
area.append("You Pressed the Button");
}</code></pre>
<p>To make it actually activate as a result of an event. You need to attach it to a swing component.</p>
<p>For example, I want the code in <code>actionPerformed</code> to activate in result of a button press.</p>
<p>Add the following line to your code in the constructor.</p>
<pre><code class="language-java">//...
button1 = new JButton("Press Me");
button1.addActionListener(this); // New Code
//....</code></pre>
<h3>Identifying Button Pressed</h3>
<p>How do you know which button was pressed in the <code>actionPerformed</code> method?</p>
<p>You can use <code>event.getSource()</code> to find out.</p>
<p>Example:</p>
<pre><code class="language-java">@Override public void actionPerformed(ActionEvent event) {
if (event.getSource() == button1) { // Replace button1 with appropriate variable name
// Do things as a result of a specific button being pressed
}
}</code></pre>
<h3>Summary</h3>
<p>To use Swing, do the following steps</p>
<ol>
<li>Import Libraries</li>
<li>Declare J___ variables</li>
<li>New the J___ variables</li>
<li>Add the J___ variables to the frame</li>
<li>Add the <code>ActionListener</code> to the components you wish to monitor</li>
</ol>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,175 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Lecture Notes for November 15th</h1>
<p>Import the necessary libraries</p>
<pre><code class="language-java">import java.awt.*;
import java.awt.event.*;
import javax.swing.*;</code></pre>
<p>This time we'll be using the null layout. This requires you to set the x and y positions of each of the elements on the screen yourself.</p>
<p>First we'll make a class called <code>GUILayout</code> which will extend <code>JPanel</code> and contain our layout and components</p>
<pre><code class="language-java">import java.awt.*;
import java.awt.event.*;
import javax.swing.*;
public class GUILayout extends JPanel {
private JButton one, two;
private JTextField text;
private JTextArea area;
private int count; // Used to keep track of button pushes
private int number = 0;
GUILayout() {
// Set the layout to the null layout
setLayout(null);
// Tells the computer that you wish to have the size of your program to be 500x500
setPreferredSize(new Dimension(500, 500));
// Create the buttons
one = new JButton("Change color");
two = new JButton("Count Pushes");
// Set the x, y, width, height inside parenthesis
// This places the buttons on the specified locations on the screen
one.setBounds(10, 10, 100, 24);
two.setBounds(120, 10, 100, 24);
// Sets the color of the text of button 1 to blue
one.setForeground(Color.BLUE);
// Add the components to the screen
add(one);
add(two);
text = new JTextField("Today is Wednesday, a photographer is here.");
text.setBounds(10, 40 ,480, 24);
add(text);
area = new JTextArea(20, 20);
// Wrap the text area into the scroll pane
// This adds the scroll bars (vertical/horizontal)
JScrollPane sp = new JScrollPane(area);
sp.setBounds(10, 75, 490, 400);
add(sp);
// Bind the Listener class to the button one
one.addActionListener(new Listener());
// Bind it to two and text
two.addActionListener(new Listener());
text.addActionListener(new Listener());
}
// Create the class for the listener
private class Listener implements ActionListener {
// Define what you want to occur after an event
public void actionPerformed(ActionEvent event) {
if (event.getSource() == one) {
// Code to change a color would go here
} else if (event.getSource() == two) {
count++;
// \n is needed so each message appears on its own line instead of scrolling sideways
area.append("You pressed the button " + count + " times!\n");
} else if (event.getSource() == text) {
// Grab the number inside the text area and set it to the variable number
number = Integer.parseInt(text.getText());
number += 10;
// add "" to number so that it converts number to a String
text.setText(number + "");
// Convert the number to a String
String temp = Integer.toString(number);
area.setText(temp);
}
}
}
}</code></pre>
<p>The main code is really short: you just need to extend <code>JFrame</code> and reuse the short setup code seen last class.</p>
<pre><code class="language-java">public class mainTester extends JFrame {
public static void main(String[] args) {
JFrame frame = new JFrame("Sample GUI"); // Sets the title to "Sample GUI"
frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
// Instantiate the GUILayout panel you made before
frame.getContentPane().add(new GUILayout());
frame.pack(); // Size the frame around the panel
frame.setVisible(true); // Make the frame appear on the screen
}
}</code></pre>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,174 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Lecture for November 20</h1>
<h2>Adding a drawing panel to the GUI</h2>
<p>You can't put Swing components and custom graphics on the same panel, therefore you need to make separate JPanels: one for the Swing components and one for the graphics.</p>
<p>Add necessary libraries</p>
<pre><code class="language-java">import java.awt.*;
import java.awt.Graphics;
import java.awt.event.*;
import javax.swing.*;</code></pre>
<pre><code class="language-java">public class drawingWindow extends JPanel {
JTextField field;
JButton draw;
DrawingPanel drawingPanel;
public drawingWindow() {
// Each new component would be vertically stacked upon each other
setLayout(new BoxLayout(this, BoxLayout.Y_AXIS));
JPanel swingSTuff = new JPanel();
// Add things to the screen
draw = new JButton("Draw");
field = new JTextField();
swingStuff.add(field);
swingStuff.add(draw)
// Add the drawing panel onto the screen
drawingPanel = new DrawingPanel(200, 400);
add(drawingPanel);
// Activate the listener if the button was pressed
draw.addActionListener(new Listener());
}
// Add the listener to respond to events
private class listener implements ActionListener {
public void actionPerformed(ActionEvent event) {
if (event.getSource() == draw) {
drawingPanel.setFlag(1);
// Repaints the screen so the oval can appear
drawingPanel.repaint();
}
}
}
// Create the draw panel so we can add it to the screen
private class DrawingPanel extends JPanel {
private int width, height;
DrawingPanel(int width, int height) {
this.width = width;
this.height = height;
setPreferredSize(new Dimension(width, height));
}
public void setFlag(int flag) {
this.flag = flag;
}
public void paintComponent(Graphics g) {
super.paintComponent(g);
// Every time the flag is set, draw an oval at a random location and color
if (flag == 1) {
Random rand = new Random();
int x = rand.nextInt(width);
int y = rand.nextInt(height);
g.setColor(Color.RED);
g.fillOval(x, y, 20, 30);
}
}
}
}</code></pre>
<p>There are a myriad of different methods you can use. </p>
<pre><code class="language-java">// Assume width, height, y, x, etc are defined above
public void paintComponent(Graphics g) {
//....
g.dispose(); // Releases the system resources this graphics context is using
}</code></pre>
<p>You have the traditional fill and draw methods. Fill creates the shape shaded in with a color. Draw creates an outline of the shape.</p>
<pre><code class="language-java">
// ...
g.fillRect(x ,y, width, height);
g.drawRect(x, y, width, height);
g.fillOval(x, y, width, height);
g.drawOval(x, y, width, height);
//g.drawPolygon(parameters...);
//g.fillPolygon(parameters...);
g.drawArc(x, y, width, height, startingAngle, sweepingAngle);
g.fillArc(x, y, width, height, startingAngle, sweepingAngle);</code></pre>
<p>You can also create complex shapes like a polygon. When adding points, you need to make sure you add them Clockwise or Counterclockwise (but NOT both)</p>
<pre><code class="language-java"> Polygon tri = new Polygon();
tri.addPoint(150, 10);
tri.addPoint(175, 100);
tri.addPoint(125, 100);
// Add points clockwise or counterclockwise (NOT BOTH)</code></pre>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,257 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Lecture for November 27</h1>
<h2>Recursion</h2>
<p>When doing recursion, make sure not to use loops.</p>
<p>Recursion is when a function makes a function call to itself.</p>
<p>It consists of two parts:</p>
<ul>
<li>Base Case: This tell it when to stop</li>
<li>Smaller Caller Case: Helps you get closer to the base case</li>
</ul>
<p>You can have one or more base cases or caller cases.</p>
<p>To teach recursion, we'll start with a problem that should be written iteratively (with loops) but we'll show how to do it with recursion.</p>
<h3>Example: Counting Down</h3>
<pre><code class="language-java">void CountDown(int number) {
while (number &gt; 0) {
System.out.println(number + " ");
number = number - 1;
}
System.out.println("Blast Off")
}</code></pre>
<ol>
<li>How does this loop stop? -- Base Case</li>
<li>How does this loop change each time through? -- Smaller caller case</li>
</ol>
<p>Base Case: It stops when the number equals 0</p>
<pre><code class="language-java">// Base Case
if (number == 0) {
System.out.println("Blast Off");
return;
}</code></pre>
<p>Smaller Caller Case: It decreases number by 1</p>
<pre><code class="language-java">// Smaller Caller Case
System.out.print(number + " ");
countDownRecursive(number - 1);</code></pre>
<p>Put it together...</p>
<pre><code class="language-java">void countDownRecursive(int number) {
if (number == 0) {
System.out.println("Blast Off");
} else {
System.out.print(number + " ");
countDownRecursive(number - 1);
}
}</code></pre>
<p>Calling <code>countDownRecursive(10)</code> prints <code>10 9 8 7 6 5 4 3 2 1 Blast Off</code></p>
<h3>Stack Snapshots</h3>
<p>Every time you make a recursive call, the program keeps track of all the local variables of the current call and pushes them onto the stack.</p>
<p>That means if you make 10 recursive calls, you'll have 10 frames added onto the stack. To get back to the beginning, you would need to return 10 times.</p>
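<p>As a rough sketch of this idea (my own illustration, not from the lecture), you can make the stack visible by passing along an extra <code>depth</code> parameter and printing it on every call:</p>
<pre><code class="language-java">// Hypothetical variant of countDownRecursive that prints how deep the recursion is
void countDownTrace(int number, int depth) {
System.out.println("Depth " + depth + ", number = " + number);
if (number == 0) {
System.out.println("Blast Off");
} else {
countDownTrace(number - 1, depth + 1); // one more frame pushed onto the stack
}
// By the time this line runs, the call above has already returned (its frame was popped)
}</code></pre>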
<h3>Order Matters</h3>
<p>Whether you do the recursive step first or some other action, it completely changes the output. Look at the following example that's similar to <code>countDownRecursive</code>.</p>
<pre><code class="language-java">void countUpRecursive(int number) {
if (number == 0) {
System.out.println("Blast Off");
} else {
// Notice the swapping of the next two lines
countUpRecursive(number - 1);
System.out.print(number + " ");
}
}</code></pre>
<p>This would print out <code>Blast Off 1 2 3 4 5 6 7 8 9 10</code></p>
<h3>Example: Summing up to a number</h3>
<p>This would be our iterative solution</p>
<pre><code class="language-java">int sum(int number) {
int sum = 0;
while (number &gt; 0) {
sum += number;
number--;
}
return sum;
}</code></pre>
<p>How does this loop stop?</p>
<p> Same as before. Think about it, if the number you pass in is zero, what should be the result of sum? Zero. Since adding up from 0 to 0 is just 0.</p>
<pre><code class="language-java">if (number == 0) {
return 0;
}</code></pre>
<p>How does this loop change each time through?</p>
<p> You want to update your sum, so return the sum of the next call to the current sum.</p>
<pre><code class="language-java">return (number + sum(number - 1));</code></pre>
<p>Putting it all together</p>
<pre><code class="language-java">int sumRecursive(int number) {
if (number == 0) {
return 0;
} else {
return number + sumRecursive(number - 1);
}
}</code></pre>
<h3>Example: Linear Search</h3>
<p>How to do it iteratively.</p>
<pre><code class="language-java">void linearSearch(int[] array, int number) {
int i = 0;
while (i &lt; array.length &amp;&amp; number != array[i]) {
i++;
}
if (i == array.length) {
System.out.println("Not Found");
} else {
System.out.println("Found");
}
}</code></pre>
<p>Base Case: There are two base cases, when it reaches the end of the array and when it finds the number</p>
<pre><code class="language-java">if (array.length == i) {
System.out.println("Not Found");
} else if (array[i] == number) {
System.out.println(number + " found at index " + i);
}</code></pre>
<p>Smaller Caller Case: Check the next element</p>
<pre><code class="language-java">linearSearchRecursion(number, array, i + 1);</code></pre>
<p>Putting it all together...</p>
<pre><code class="language-java">void linearSearchRecursion(int[] array, int number) {
if (array.length == i) {
System.out.println("Not Found");
} else if (array[i] == number) {
System.out.println(number + " found at index " + index);
} else {
linearSearchRecursion(number, array, i + 1);
}
}</code></pre>
<h2>Binary Search</h2>
<p>This is a much more efficient search than the linear search we have been doing. The only condition is that your array must be sorted beforehand.</p>
<p>A regular linear search is <code>O(n)</code> -- it checks at most all of the elements of the array.</p>
<p>Binary search is <code>O(log(n))</code> -- it checks at most <code>ceil(log2(n))</code> elements of the array.</p>
<p>To demonstrate this with real numbers, let's take an array of 100 elements</p>
<ul>
<li>With linear search this will take at most 100 iterations</li>
<li>With binary search this will take at most <strong>7</strong>.</li>
</ul>
<p>Crazy right?</p>
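<p>To sanity-check that number, here is a quick sketch (my addition, not from the lecture) that counts how many times a 100-element search space can be cut in half before nothing is left:</p>
<pre><code class="language-java">int size = 100;
int comparisons = 0;
while (size &gt;= 1) {
comparisons++; // one comparison per remaining "half"
size = size / 2; // each comparison cuts the search space in half
}
System.out.println(comparisons); // prints 7 for a 100-element array</code></pre>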
<h3>Implementation</h3>
<p><strong>Iterative approach</strong></p>
<pre><code class="language-java">void binarySearch(int number, int[] array) {
int lower = 0;
int upper = array.length - 1;
int mid = (lower + upper) / 2;
while (lower &lt;= upper &amp;&amp; array[mid] != number) {
mid = (lower + upper) / 2;
if (array[mid] &lt; number) {
lower = mid + 1;
} else if (array[mid] &gt; number) {
upper = mid -1;
}
}
if (lower &gt; upper) {
System.out.println("Not Found");
} else {
System.out.println(number + " found at index " + mid);
}
}</code></pre>
<p><strong>Recursive Approach</strong></p>
<p>Base Case: There are two cases, you either found it, or you made it to the end of the array without finding it</p>
<pre><code class="language-java">if (lower &gt; upper) {
System.out.println("Not Found");
} else if (array[mid] == number) {
System.out.println(number + " found at index " + mid);
}</code></pre>
<p>Smaller Caller Case: Change the lower or upper bounds to cut the search in half</p>
<pre><code class="language-java">if (array[mid] &lt; number) {
lower = mid + 1;
binarySearchRecursive(number, array, lower, upper);
} else if (array[mid] &gt; number) {
upper = mid - 1;
binarySearchRecursive(number, array, lower, upper);
}</code></pre>
<p>Putting it together....</p>
<pre><code class="language-java">binarySearch(int number, int[] array, int lower, int upper) {
if (lower &gt; upper) {
System.out.println("Not Found");
} else if (array[mid] == number) {
System.out.println(number + " found at index " + mid);
}
else if (array[mid] &lt; number) {
lower = mid + 1;
binarySearchRecursive(number, array, lower, upper);
} else if (array[mid] &gt; number) {
upper = mid - 1;
binarySearchRecursive(number, array, lower, upper);
}
}</code></pre>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,182 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>November 6th Lecture</h1>
<h2>Compare cont.</h2>
<p>Continuing from last week's topic, we are writing a method called <code>compare</code> in your custom class.</p>
<p>For example, for your project:</p>
<pre><code class="language-java">public bool compare(ResearchProject right, int type) {
if (type == 0) { // Is last lastname greater than right lastname alphabeticallly
if (lastname.compareTo(right.lastname) &gt; 0) {
return true;
} else {
return false;
}
} else if (type == 1) { // Implement a different type of comparison
}
}</code></pre>
<p>You can then use the sorting algorithms discussed previously to sort the ResearchProjects using the <code>compare</code> method.</p>
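<p>As a rough sketch of what that could look like (assuming an already-filled <code>ResearchProject[]</code> array named <code>projects</code>; the variable names here are my own, not from the lecture):</p>
<pre><code class="language-java">// Bubble sort the projects by last name using compare with type 0
for (int j = 0; j &lt; projects.length - 1; j++) {
for (int i = 0; i &lt; projects.length - 1 - j; i++) {
if (projects[i].compare(projects[i + 1], 0)) { // true when projects[i] belongs after projects[i + 1]
ResearchProject temp = projects[i + 1];
projects[i + 1] = projects[i];
projects[i] = temp;
}
}
}</code></pre>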
<h2>File IO</h2>
<p>First start by importing the required libraries. </p>
<pre><code class="language-java">import java.io.FileInputStream;
import java.io.IOException; // For error handling
import java.io.FileNotFoundException; // For error handling
import java.util.Scanner;</code></pre>
<p>Then inside your main method which throws an IOException (see example &quot;Reading the filename from the keyboard&quot;)</p>
<pre><code class="language-java">FileStream file;
Scanner in;
try { // Try something
file = new FileInputStream("test.txt");
in = new Scanner(file);
} catch (FileNotFoundException e) { // Catch IF it fails
System.out.println("File not found.");
in = new Scanner(System.in); // Read from the keyboard instead
}
// Read the file now....
String name = in.nextLine();</code></pre>
<p>Before we had linked the Scanner to the keyboard, now we're doing it to a file.</p>
<h3>Try-Catch</h3>
<p>In the code above you see what's called a try-catch block. If something goes wrong while the code executes, then instead of just crashing as we have seen so far, we can deal with the error. In the example above, the program falls back to reading from the keyboard instead of the file.</p>
<h3>Reading the filename from the keyboard</h3>
<pre><code class="language-java">public static void main(String[] args) throws IOException {
FileInputStream file;
Scanner in, scnr;
// Connect one scanner to the keyboard to get the filename
scnr = new Scanner(System.in);
System.out.print("Enter file name: ");
String filename = in.nextLine();
// Repeat code from previous example
FileStream file;
Scanner in;
try {
file = new FileInputStream(filename); // Only this line changed
in = new Scanner(file);
} catch (FileNotFoundException e) {
System.out.println("File not found.");
in = new Scanner(System.in);
}
String name = in.nextLine();
// ....
}</code></pre>
<p>The main throws the IOException since we don't deal with it in any of the catch blocks.</p>
<h3>Reading names of people from a file into an array</h3>
<p>Let's say we have a file with the following in it</p>
<pre><code>3
Paul
Peter
Mary</code></pre>
<p>The 3 indicates the number of names in the file.</p>
<p>Let's write code in order to read these names into an array!</p>
<pre><code class="language-java">import java.io.FileInputStream;
import java.io.IOException;
import java.io.FileNotFoundException;
import java.util.Scanner;
public static void main(String[] args) throws IOException {
FileInputStream file;
Scanner in, scnr;
scnr = new Scanner(System.in);
System.out.print("Enter file name: ");
String filename = in.nextLine();
FileStream file;
Scanner in;
try {
file = new FileInputStream(filename); // Only this line changed
in = new Scanner(file);
} catch (FileNotFoundException e) {
System.out.println("File not found.");
in = new Scanner(System.in);
}
// For the size of the array, get the first number in the file
int size = in.nextInt();
String[] nameArray = new String[size];
for (int i = 0; i &lt; size; i++) {
namesArray[i] = in.nextLine();
}
// Now all the names in the file is in the namesArray
}</code></pre>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,157 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Lecture Notes October 11</h1>
<h2>Global Variables</h2>
<p>A global variable is one that is not declared inside a method. Using them is generally not recommended; declaring variables inside methods makes it easier to reuse code.</p>
<pre><code class="language-java">public class mainDriver {
public static int globalVariable = 5; // I am a global variable
public static void main(String[] args) {
int localVariable = 4; // I am a local variable
}
}</code></pre>
<h2>String Formatting</h2>
<p>You can format strings in Java by using the <code>printf</code> method in <code>System.out</code>.</p>
<p>Format strings work by using placeholders starting with <code>%</code> to specify where to place the value of a variable.</p>
<p>General format of command</p>
<pre><code class="language-java">//System.out.printf(FormatString, variable1, variable2, ....)
String s = "The number is";
int x = 5;
System.out.printf("%s %d", s, x); // Prints "The number is 5"</code></pre>
<p>If you want to print out money, you can do it through the following</p>
<pre><code class="language-java">float tax = 0.45698;
System.out.printf("The tax is %.2f"); //prints "The tax is 0.46"</code></pre>
<h2>Floating point precision</h2>
<p>Due to how computers store non-integers, math can be non-precise after some mathematical operations. It is therefore advantageous to do operations on integers to the level of precision you want.</p>
<p>For example, instead of working in dollars, work in pennies since it's the smallest division we care about.</p>
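<p>A small sketch of the idea (my example, not from the lecture): keep money as a whole number of pennies and only convert to dollars when printing.</p>
<pre><code class="language-java">// Repeatedly adding 10 cents as a double accumulates rounding error
double dollars = 0.0;
for (int i = 0; i &lt; 3; i++) {
dollars += 0.10;
}
System.out.println(dollars); // prints 0.30000000000000004

// Doing the same arithmetic in whole pennies stays exact
int pennies = 0;
for (int i = 0; i &lt; 3; i++) {
pennies += 10;
}
System.out.printf("%.2f%n", pennies / 100.0); // prints 0.30</code></pre>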
<h3>ArrayList</h3>
<p>Standard arrays are static, meaning they have no ability to grow. Instead of performing the operations we described last class just to add something to an array, we can use something called an <code>ArrayList</code>.</p>
<p>The methods in the <code>ArrayList</code> library are useful abstractions to make the life of a programmer easier.</p>
<p>ArrayLists can also hold only one type. The type cannot be a primitive like a standard array. Instead it must be a class representing the desired type.</p>
<p>int -&gt; Integer</p>
<p>double -&gt; Double</p>
<p>char -&gt; Character</p>
<p>float -&gt; Float</p>
<p>To declare and initialize an <code>ArrayList</code></p>
<pre><code class="language-java">import java.util.ArrayList;
public class ArrayListTest {
public static void main(String[] args) {
ArrayList&lt;Integer&gt; numbers = new ArrayList&lt;Integer&gt;();
ArrayList&lt;String&gt; names = new ArrayList&lt;String&gt;();
}
}</code></pre>
<p><code>ArrayList</code>s has a variety of different methods that can be used to access/manipulate it</p>
<ul>
<li>get</li>
<li>set</li>
<li>add</li>
<li>clone</li>
<li>clear</li>
<li>size</li>
</ul>
<p>If you want to add the numbers 1 through 10 into the <code>ArrayList</code> of numbers</p>
<pre><code class="language-java">for (int i = 1; i &lt; 11; i++) {
numbers.add(i);
}</code></pre>
<p>To print out the entire contents of the <code>ArrayList</code></p>
<pre><code class="language-java">for (int i = 0; i &lt; numbers.size(); i++) {
System.out.println(numbers.get(i));
}</code></pre>
<p>How can you replace a value?</p>
<pre><code class="language-java">int n = 5; // Make this the number you wish to replace
int i = 0;
while (i &lt; numbers.size() &amp;&amp; numbers.get(i) != n) {
i++;
}
if (i == numbers.size()) {
System.out.println(n + " not found.");
} else {
int r = 10; // Make this the value you want to replace n with
numbers.set(i, r);
}</code></pre>
<p>The <code>remove</code> method removes an item given an index. This shifts all the elements above the removed index down.</p>
<pre><code class="language-java">numbers.remove(3); // Removes the element in the third index</code></pre>
<p>The <code>add</code> method can also take an index. It pushes all the elements at the index specified up.</p>
<pre><code class="language-java">numbers.add(3, 5); // Adds the number 5 to the third index</code></pre>
<p>You can clone an <code>ArrayList</code> using the <code>clone</code> method, which is called on the list you want to copy</p>
<pre><code class="language-java">ArrayList&lt;Integer&gt; numbers2 = (ArrayList&lt;Integer&gt;) numbers.clone(); // Copies numbers into numbers2</code></pre>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,137 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>October 18th</h1>
<h2>ArrayLists in classes</h2>
<pre><code class="language-java">public class Numbers {
private ArrayList&lt;Integer&gt; used;
private ArrayList&lt;Integer&gt; unused;
Numbers() {
// Debug: System.out.println("1. Constructor Entry Point");
used = new ArrayList&lt;Integer&gt;();
unused = new ArrayList&lt;Integer&gt;();
// Debug: System.out.println("2, Constructor Size of ArrayLists" + used.size() + " " + unused.size())
}
// Adds the numbers 1-5 into the used ArrayList
public void fillUsedArrayList() {
for (int i = 0; i &lt; 5; i++) {
used.add(i + 1);
}
}
// Move an item from the used ArrayList to the unused ArrayList
public int moveIt(int index) {
int temp = used.get(index);
unused.add(temp);
// Debug: System.out.println("Adding to used array:" + (i + 1));
used.remove(index);
return temp;
}
// The method below is created for debugging purposes
public void printBothArrayLists() {
// Print the used arraylist
System.out.println("Used ArrayList");
for (int i = 0; i &lt; used.size(); i++) {
System.out.println(used.get(i));
}
// Print the unused arraylist
System.out.println("Unused ArrayList");
for (int i = 0; i &lt; unused.size(); i ++) {
System.out.println(unused.get(i));
}
}
}</code></pre>
<p>Recall that you can compile the code above but you cannot run it. To run code, you must have a main method.</p>
<h2>NumberTester</h2>
<pre><code class="language-java">public class NumberTester {
public static void main(String[] args) {
Numbers list;
list = new Numbers();
list.fillUsedArrayList();
list.printBothArrayLists();
}
}</code></pre>
<h2>Difference between Array and ArrayList</h2>
<p>An Array is a <strong>static</strong> structure of contiguous memory of a single type.</p>
<p>An ArrayList is a <strong>dynamic</strong> structure of contiguous memory of a single type </p>
<p>To get the size of an array you use <code>.length</code></p>
<p>To get the size of an ArrayList you use <code>.size()</code></p>
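<p>A quick side-by-side sketch of the two (my example, assuming <code>java.util.ArrayList</code> is imported):</p>
<pre><code class="language-java">int[] numbersArray = new int[5]; // static: always holds exactly 5 ints
ArrayList&lt;Integer&gt; numbersList = new ArrayList&lt;Integer&gt;(); // dynamic: grows as you add
numbersList.add(42);
System.out.println(numbersArray.length); // 5 -- .length property for arrays
System.out.println(numbersList.size()); // 1 -- .size() method for ArrayLists</code></pre>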
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,170 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Lecture Notes Oct 2nd</h1>
<h2>Array</h2>
<p><code>array</code> is not a reserved word, it's a concept. Arrays are able to hold multiple values under one name of the same type.</p>
<p>For instance, you can have an array of integers.</p>
<p>Properties of an array</p>
<ul>
<li>Size (n)</li>
<li>index [0, n - 1]</li>
</ul>
<p>You can declare arrays by saying the type '[]' and the name of the array</p>
<pre><code class="language-java">int[] numbers;
double[] gpas;
float[] grades;</code></pre>
<p>Before you can use the array, you must <code>new</code> it</p>
<pre><code class="language-java">numbers = new int[10];</code></pre>
<p>Where 10 is the size of the array.</p>
<p>You can combine both the declaration and initialization</p>
<pre><code class="language-java">double[] points = new double[7];</code></pre>
<p>You can access individual elements of the array by using its index. Indexes start from zero</p>
<pre><code class="language-java">points[0] = 5.4; // The first element in the point array is 5.4</code></pre>
<p>The <code>.length</code> property of an array gives the size of the array</p>
<h2>For-Loops + Arrays</h2>
<p>You can print out each element in the array using a for loop</p>
<pre><code class="language-java">for (int i = 0; i &lt; numbers.length; i++) {
System.out.println(numbers[i]);
}</code></pre>
<p>You can ask a user to input a value to each element in the array</p>
<pre><code class="language-java">for (int i = 0; i &lt; points.length; i++) {
System.out.print("Enter a number: ");
points[i] = scnr.nextInt();
}</code></pre>
<h2>While-Loops + Arrays</h2>
<p>You can use a while loop to search the array</p>
<pre><code class="language-java">int i = 0;
int number = 5;
// While the index is within the array size and the number isn't found
while (i != number.length &amp;&amp; number != numbers[i]) {
i++
}
if (i == numbers.length) {
System.out.println(number + " was not found.")
} else {
System.out.println(number + " was found at index " + i)
}</code></pre>
<p>If you don't include the <code>i != numbers.length</code> check, you will get an <code>ArrayIndexOutOfBoundsException</code>.</p>
<p>The example above is called a <em>Linear Search</em>. </p>
<p>Linear searches work on an unsorted and sorted arrays.</p>
<h2>Methods + Arrays</h2>
<p>You can pass an array into a method</p>
<pre><code class="language-java">public static void exampleMethod(int[] sample) {
// Do something
}
public static void main(String[] args) {
int[] s = new int[30];
exampleMethod(s);
}</code></pre>
<h2>Do-While Loops</h2>
<p>For-loops can run 0 or more times. If you want something to execute at least once. Use a do-while loop.</p>
<pre><code class="language-java">do {
// Code
} while (condition);</code></pre>
<p>For example, to search at least once and asking whether the user wants to search again</p>
<pre><code class="language-java">// Assume linearSearch and array are defined
char answer;
Scanner input = new Scanner(System.in);
do {
linearSearch(array, input);
System.out.print("Do you want to search again? (Y/N) ");
input.nextLine();
answer = input.next().charAt(0);
} while( answer != 'N');</code></pre>
<p>You can create any type of loop just by using a while loop.</p>
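<p>For example, the counting for-loop from earlier can be rewritten as a while loop (a sketch of the idea):</p>
<pre><code class="language-java">// for (int i = 0; i &lt; points.length; i++) { ... } rewritten as a while loop
int i = 0; // initialization
while (i &lt; points.length) { // condition
System.out.println(points[i]); // body
i++; // update
}</code></pre>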
<h2>Example: Finding the Max</h2>
<p>You can find the max of an array using the following method</p>
<pre><code class="language-java">double max = arrayName[0];
for (int i = 1; i &lt; arrayName.length; i++) {
if (max &lt; arrayName[i]) {
max = arrayName[i];
}
}
System.out.println("The max is " + max);</code></pre>
<h2>Example: Summing up an array</h2>
<p>You can sum the array using the following method</p>
<pre><code class="language-java">double sum = 0;
for (int i = 0; i &lt; arrayName.length; i++) {
sum += arrayName[i];
}
System.out.println("The sum is " + sum);</code></pre>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,125 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Lecture for October 23</h1>
<h2>Two-Dimensional Arrays</h2>
<p>You can think of a two dimensional array as a grid that is organized by rows then columns.</p>
<p>To declare a two dimensional array do the following</p>
<pre><code class="language-java">int[][] grid = new int[5][5]; // Declares a 2D array of 5 rows and 5 columns</code></pre>
<p>You can have as many dimensions as you want. For graphics, a 3-dimensional array could represent points in 3D space.</p>
<p>It doesn't have to be inherently visual though, you can have the n-dimensional array look at the interaction between n different variables. For example, the relationships to different questions in a survey.</p>
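<p>For instance, a three-dimensional array is declared just like the two-dimensional case with one more pair of brackets (a quick sketch, not from the lecture):</p>
<pre><code class="language-java">int[][][] cube = new int[10][10][10]; // a 10 x 10 x 10 grid of ints
cube[0][0][0] = 5; // indexed by [depth][row][column]</code></pre>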
<p>Strings are essentially a char array with extra methods attached. We can imitate an array of strings with a 2D char array.</p>
<pre><code class="language-java">char[][] helloWorld = new char[5][5];
hello[0][0] = 'h';
hello[0][1] = 'e';
hello[0][2] = 'l';
hello[0][3] = 'l';
hello[0][4] = 'o';
hello[1][0] = 'w';
hello[1][1] = 'o';
hello[1][2] = 'r';
hello[1][3] = 'l';
hello[1][4] = 'd';
</code></pre>
<h2>Nested-For Loops</h2>
<p>To access the elements in a 2D array, you need to use a nested for-loop. </p>
<p>Here is how you print out the hello world example above</p>
<pre><code class="language-java">for (int row = 0; row &lt; helloWorld.length; row++) {
for (int col = 0; col &lt; helloWorld[row].length; col++) {
System.out.print(helloWorld[row][col]);
}
System.out.print(" ")
}</code></pre>
<p>The code above prints out &quot;hello world&quot;</p>
<h2>2D Arrays in methods</h2>
<p>You can write a get like method in the following way</p>
<pre><code class="language-java">public static void get(int[][] array, int row, int col) {
return array[row][col];
}</code></pre>
<p>Arrays in Java are pass by reference not pass by value. Meaning that if you change the array within the method then it will change outside the method.</p>
<pre><code class="language-java">public static void insert(int[][] array, int row, int col, int numToInsert) {
array[row][col] = numToInsert;
}</code></pre>
<p>Java reserves the keyword <code>const</code> but does not actually let you use it, so the code below will throw a compiler error. (The keyword Java does support for this purpose is <code>final</code>; see the October 4th lecture.)</p>
<pre><code class="language-java">public static void insert(const int[][] array, int row, int col, int numToInsert) {
array[row][col] = numToInsert; // This line will throw an errror
}</code></pre>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,93 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Lecture on Oct 25</h1>
<h2>2 Dimension Array of Objects</h2>
<p>You can not only do a two dimensional array of primitive types, but you can also do two dimensional arrays of objects/classes.</p>
<pre><code class="language-java">animalLocation[][] map;
map = new animalLocation[5][4];</code></pre>
<p>Since we are dealing with classes, you cannot use this array right away. The code above creates the space in memory to store the objects. To have the animalLocation objects in the array, you must <code>new</code> each instance of the object.</p>
<pre><code class="language-java">for (int i = 0; i &lt; map.length; i++) {
for (int j = 0; j &lt; map[i].length; j++) {
map[i][j] = new animalLocation();
}
}</code></pre>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,153 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Oct 30 Lecture</h1>
<h2>Sorting</h2>
<h3>Bubble Sort</h3>
<p>These instructions sort in descending order; to sort in ascending order, just negate the condition.</p>
<p>This sort is a series of iterations. For each iteration you:</p>
<ol>
<li>Go to the bottom of the array</li>
<li>Compare the value to the one before it
<ol>
<li>If it's greater than the element before it -&gt; swap</li>
</ol></li>
<li>Move on to the value before it and repeat step 2.</li>
</ol>
<p>Once you go through an iteration, the last thing being compared is the greatest value of the entire array. That means you don't have to check it every time anymore. </p>
<p>Keep going through all the iterations until n, where n is the size of the array, iterations have been completed.</p>
<h3>Swapping Values</h3>
<p>If you try to swap variables by saying</p>
<pre><code class="language-java">y = x;
x = y;</code></pre>
<p>then you'll end up overwriting y's value with x's, and both variables would have x's value.</p>
<p>If you want to actually swap variables, you must create a temporary variable that saves y's value so that it can be properly assigned to x.</p>
<pre><code class="language-java">int temp;
temp = y;
y = x;
x = temp;</code></pre>
<h3>Implementation (Not Complete)</h3>
<pre><code class="language-java">// Each iteration
for (int j = 0; j &lt; array.length - 1; j++) {
// Each element in the list
for (int i = 0; i &lt; array.length - 1; i++) {
// Is the element greater than the one after it?
if (array[i] &gt; array[i + 1]) {
// Swap
int temp = array[i + 1];
array[i + 1] = array[i];
array[i] = temp;
}
}
}</code></pre>
<p>This version compares every pair of values on every iteration. Recall from before that you don't have to re-compare the values that have already bubbled to the top.</p>
<h3>Implementation</h3>
<p>To change this, just change the second loop condition</p>
<pre><code class="language-java">// Each iteration
for (int j = 0; j &lt; array.length - 1; j++) {
// Each element in the list
for (int i = 0; i &lt; array.length - 1 - j; i++) { // Note this line
// Is the element greater than the one after it?
if (array[i] &gt; array[i + 1]) {
// Swap
int temp = array[i + 1];
array[i + 1] = array[i];
array[i] = temp;
}
}
}</code></pre>
<h2>Compare</h2>
<p>In Java, you can compare numbers, strings, and even your own customized objects. To compare your own customize object, you must write a method called <code>compare</code> in your class.</p>
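<p>A minimal sketch of such a method (my example; your class name and fields will differ) could look like this:</p>
<pre><code class="language-java">public class Student {
private String lastName;
// Returns true when this student's last name comes after the other student's alphabetically
public boolean compare(Student right) {
return lastName.compareTo(right.lastName) &gt; 0;
}
}</code></pre>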
<h3>To use your compare method in the sorting algorithm</h3>
<pre><code class="language-java">// Each iteration
for (int j = 0; j &lt; array.length - 1; j++) {
// Each element in the list
for (int i = 0; i &lt; array.length - 1 - j; i++) {
// Is the element greater than the one after it?
if (array[i].compare(array[i + 1])) { // Note this line
// Swap
int temp = array[i + 1];
array[i + 1] = array[i];
array[i] = temp;
}
}
}</code></pre>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,98 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Lecture on October 4th</h1>
<h2>Pass by Copy vs Pass by Reference</h2>
<h3>Pass by Copy</h3>
<p>When you pass a primitive type into a method (int, char, double, float, etc), it makes a copy of the value of the variable and brings it into the method</p>
<h3>Pass by Reference</h3>
<p>When you pass an array into a method (int[], char[], double[], etc[]), it passes in the reference of the variable into the method. In other words, you give the actual array into the method and allows the method to change it.</p>
<h3>What's the Implication?</h3>
<p>If you change the primitive in a method, it doesn't actually change the value of the variable.</p>
<p>If you pass in an array and change it in the method, it has been permanently changed outside the method as well.</p>
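<p>A small demonstration of both behaviors (my example, not from the lecture):</p>
<pre><code class="language-java">public static void tryToChange(int number, int[] numbers) {
number = 99; // changes only the local copy
numbers[0] = 99; // changes the caller's array
}
public static void main(String[] args) {
int x = 5;
int[] values = {1, 2, 3};
tryToChange(x, values);
System.out.println(x); // still 5
System.out.println(values[0]); // now 99
}</code></pre>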
<h3>How do I make it so I can't change my array by accident?</h3>
<p>Use the <code>final</code> keyword in the method header. (Strictly speaking, <code>final</code> prevents the method from pointing the parameter at a different array; individual elements can still be changed.)</p>
<pre><code class="language-java">public static void printAll(final int[] array) {
for (int i = 0; i &lt; array.length; i++) {
System.out.println("Number " + (i + 1) + " is " + array[i])
}
}</code></pre>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,107 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Lecture Notes October 9th</h1>
<h2>Arrays (Cont.)</h2>
<h3>Another way of Array Initialization</h3>
<pre><code class="language-java">String[] names = {"Jennifer", "Noodle", "Fluffy", "Rosie", "Cinnamon", "Brianne", "Oliver"}</code></pre>
<p>The values between the <code>{}</code> are the initial contents of the <code>names</code> array, in the order that they are written.</p>
<p>Recall that arrays are of a fixed size. The <code>names</code> array above has 7 elements.</p>
<h3>What can I do if I want to add something to the names array?</h3>
<p>Do the following steps:</p>
<ol>
<li>Create a temporary array with the same size as <code>names</code></li>
<li>Copy all the contents of <code>names</code> into the temporary array</li>
<li>Set <code>names</code> equal to a new array of a bigger size</li>
<li>Copy the contents of the temporary array back into <code>names</code></li>
<li>Add the new element to the array by index</li>
</ol>
<pre><code class="language-java">// (1)
String[] temp = new String[7];
// (2)
temp.clone(names);
// (3)
names = new String[20]; // Now it can hold up to 20 names
// (4)
names.clone(temp);
// (5)
names[7] = "New name!";</code></pre>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>


@ -0,0 +1,239 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>CPSC 220 Lecture 4</h1>
<h2>Practice Problem</h2>
<ol>
<li>Create a class called Car</li>
<li>
<ul>
<li>Create a private variable of int type called year</li>
<li>Create a private variable of String type called make</li>
</ul>
</li>
<li>Create accessor methods for all data members.</li>
<li>Create mutator methods for all data methods.</li>
</ol>
<pre><code class="language-java">public class car { // begin car
private int year;
private String make;
    public int getYear() {
return year;
}
public String getMake() {
return make;
}
public void setYear(int y) {
if (y &gt; 1890) {
year = y;
} else {
System.out.println(y + " is not a valid year.");
}
}
public void setMake(String m) {
make = m;
}
}</code></pre>
<p>Local variables only exist within the curly braces in which they are defined.</p>
<h2>If Statements and Boolean Expressions</h2>
<p>Boolean expressions return a boolean</p>
<pre><code class="language-java">1 &lt; 4; // 1 is less than 4: TRUE
3 &gt; 5; // 3 is greater than 5: FALSE
5 == 5; // 5 is equal to 5: TRUE
5 != 5; // 5 is not equal to 5: FALSE
1 &gt;= 1; // 1 is greater than or equal to 1: TRUE
5 &lt;= 1; // 5 is less than or equal to 1: FALSE</code></pre>
<p>The body of an <code>if</code> statement only runs when the boolean expression is true; otherwise the <code>else</code> block is executed.</p>
<pre><code class="language-java">if (true) {
System.out.println("I am always printed");
} else {
System.out.println("I am never printed");
}</code></pre>
<p>You can only have one <code>else</code> per <code>if</code>. If you have an <code>if</code> you don't necessarily need an <code>else</code></p>
<h2>Local vs Class Variables</h2>
<p>If you have a local variable and the class variable sharing the same name, then the local variable is always used first.</p>
<pre><code class="language-java">public class car { // begin car
private int year;
public void setYear(int year) {
year = year;
}
}</code></pre>
<p>This is a redundant statement: it sets the parameter equal to itself, so the class variable is never updated.</p>
<p>To avoid this situation, use the keyword <code>this</code> to access the class variable</p>
<pre><code class="language-java">public class car {
private int year;  
public void setYear(int year) {    
this.year = year;
}
}</code></pre>
<p>The code above runs as expected.</p>
<p>Rewriting our class with <code>this</code></p>
<pre><code class="language-java">public class car { // begin car
private int year;
private String make;
    public int getYear() {
return year;
}
public String getMake() {
return make;
}
public void setYear(int year) {
        if (year &gt; 1890) {
            this.year = year;
        } else {
            System.out.println(year + " is not a valid year.");
}
}
public void setMake(String make) {
this.make = make;
}
}</code></pre>
<h2>Unreachable Code</h2>
<p>When the code hits a <code>return</code> statement, it stops executing the rest of the method. Any code placed after the <code>return</code> triggers an unreachable code compile error.</p>
<pre><code class="language-java">public int add(int x, int y) {
return x + y;
System.out.println("x + y = " + x + y);
}
add(3, 4);
System.out.println("Hello");</code></pre>
<p>The code above will not compile; if the unreachable-code error did not exist, it would only print out &quot;Hello&quot;</p>
<h2>Constructors</h2>
<p>Constructors here are declared <code>public</code>; a <code>private</code> or <code>protected</code> constructor would keep other classes from creating objects.</p>
<p>Constructors are used to initialize your objects.</p>
<p>You want to have the class variables to the left of the assignment statement.</p>
<pre><code class="language-java">public class car {
private int year;
private String make;
car() {
year = 1890;
make = "Ford";
}
car(int year, String make) {
this.year = year;
this.make = make;
}
}</code></pre>
<h2>Testers</h2>
<p>Testers are useful to check that the class is implemented correctly. Both the tester and the class have to be in the same folder/directory.</p>
<pre><code class="language-java">public class carTester {
public static void main(String[] args) {
Car myCar; // Declaration
        myCar = new Car(); // Initialization
Car yourCar = new Car(2009, "Hyundai"); // Declaration + Initialization
}
}</code></pre>
<h2>More about classes</h2>
<pre><code class="language-java">public class Car {
private String name;
private int odometer;
public void setOdometer(int od) {
odometer = od;
}
public void setName(String n) {
this.name = n;
}
public void changeOilRequest(String name, int od) {
        if (name.equals(this.name)) {
            int difference = od - this.odometer;
            if (difference &gt;= 3000) {
                // You can call other methods in the class
                setOdometer(od); // Equivalent to "this.setOdometer(od);"
                this.odometer = od;
                System.out.println("Ready for oil change.");
            } else {
                System.out.println(name + " not ready for oil change.");
}
} // end if
} // end changeOil request
} // end class</code></pre>
<p>To call public methods outside the class use the variable name to do so.</p>
<pre><code class="language-java">public class CarTester {
public static void main(String[] args) {
Car myCar = new Car();
myCar.setName("Honda")
myCar.changeOilRequest("Honda", 3400);
}
}</code></pre>
<h2>Math library</h2>
<p>The <code>ceil</code> method rounds up while the <code>floor</code> method rounds down.</p>
<pre><code class="language-java">Math.ceil(3.2);  // 4.0
Math.ceil(4.1);  // 5.0
Math.floor(3.2); // 3.0
Math.floor(4.1); // 4.0</code></pre>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>


@ -0,0 +1,190 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h2>Counting Loop</h2>
<p>Looking at the following example code</p>
<pre><code class="language-java">int i;
for (i = 0; i &lt; 3; i++) { //begin for
System.out.println("i = " + i); //body
} //end for
System.out.println("After loop, i = " + i);</code></pre>
<p><code>i = 0</code> is the initializing statement</p>
<p><code>i &lt; 3</code> is the conditional, that is when the loop ends</p>
<p><code>i++</code> is the increment/decrement</p>
<p><code>i++</code> is synonymous with <code>i = i + 1</code></p>
<p>The initialization statement only occurs once at the beginning of the loop. </p>
<h3>Execution Example</h3>
<p>Let us go through this for loop example</p>
<ul>
<li>Let us set <code>i = 0</code></li>
<li>Is <code>i &lt; 3</code>? Yes execute the body
<ul>
<li>The body executes an output of <code>"i = 0"</code></li>
</ul></li>
<li>Now we increment <code>i ++</code>, i is now 1</li>
<li>Is <code>i &lt; 3</code>? Yes, 1 is less than 3. Execute body
<ul>
<li>The computer prints out <code>"i = 1"</code></li>
</ul></li>
<li>Increment <code>i++</code> i is now 2</li>
<li>Is <code>i &lt; 3</code>? Yes 2 is less than 3. Execute body
<ul>
<li>The computer prints out <code>"i = 2"</code></li>
</ul></li>
<li>Increment <code>i++</code>, i is now 3</li>
<li>Is <code>i &lt; 3</code>? No 3 is not less than 3
<ul>
<li>Don't execute body of loop</li>
</ul></li>
</ul>
<p>Exit loop. Print <code>"After loop, i = 3"</code></p>
<h3>Condensing Syntax</h3>
<p>You can also do the declaration in the initialization statement</p>
<pre><code class="language-java">for (int i = 0; i &lt; 3; i++) {
System.out.println("i = " + i);
}</code></pre>
<p>This now runs like above without the <code>"After loop, i = 3"</code> print. You cannot access the variable <code>i</code> outside the for loop since in this example, it belongs to the for loop's scope.</p>
<h2>Logic Expressions</h2>
<h3>And Statements</h3>
<p>With the AND operator <code>&amp;&amp;</code> both the left and right side needs to be true for the expression to be true.</p>
<pre><code class="language-java">true &amp;&amp; true // true
true &amp;&amp; false // false
false &amp;&amp; true // false
false &amp;&amp; false // false</code></pre>
<h3>Or Statements</h3>
<p>With the OR operator <code>||</code> either the left or right side needs to be true for the expression to be true.</p>
<pre><code class="language-java">true || true // true
true || false // true
false || true // true
false || false // false</code></pre>
<h3>Examples</h3>
<p><strong>Example</strong>: Print out the number <code>n</code> if it is between 10 and 20 (inclusive)</p>
<pre><code class="language-java">if (n &gt;= 10 &amp;&amp; n &lt;= 20) {
System.out.println(n);
}</code></pre>
<p><strong>Example</strong>: Print out the <code>age</code> if it is not of young adult age. Young adult range is from 18 to 39 (inclusive)</p>
<pre><code class="language-java">if (!(age &gt;= 18 &amp;&amp; age &lt;= 39)) {
System.out.println(age);
}</code></pre>
<p>Or you can use De Morgan's Law (for the curious)</p>
<pre><code class="language-java">if (age &lt; 18 || age &gt; 39) {
System.out.println(age);
}</code></pre>
<h2>For Loops (Cont.)</h2>
<h3>Backwards counting</h3>
<p>You can use the loop to count backwards</p>
<pre><code class="language-java">for (int i = 10; i &gt; -1; i--) {
System.out.println(i);
}</code></pre>
<p>This prints the following</p>
<pre><code class="language-java">10
9
8
7
6
5
4
3
2
0</code></pre>
<h3>Rows-Columns</h3>
<p>You can make rows and columns of asterisks</p>
<pre><code class="language-java">for (int j = 0; j &lt; someNumber; j++) { // Corresponds to rows
for (int i = 0; i &lt; someNumber2; i++) { // Corresponds to columns
System.out.print("*");
}
System.out.println(""); // Goes to the next row
}</code></pre>
<p>If <code>someNumber</code> equals <code>someNumber2</code>, then we have the same amount of rows as columns.</p>
<p>Let <code>someNumber</code> equal to 2 and <code>someNumber2</code> equal to 2</p>
<p>Output:</p>
<pre><code>**
**</code></pre>
<h3>Right Triangles</h3>
<p>You can make a right triangle of tildes (~) with the following code</p>
<pre><code class="language-java">for (int i = 1; i &lt;= num; i++) { // Corresponds to the row
for (int j = 0; j &lt; i; j++) { // Corresponds to the column and stops at the current row number
System.out.print("~");
}
System.out.println(""); // Moves to next row
}</code></pre>
<h5>What are for-loops used for? <em>Reusing code</em></h5>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>


@ -0,0 +1,136 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Lecture in CPSC 220 Sept 25 2017</h1>
<h2>Constants</h2>
<p>Adding the keyword <code>final</code> in front of a variable declaration makes the variable constant. Meaning you cannot later change it in the code.</p>
<pre><code class="language-java">final int MAX = 10;</code></pre>
<p>By convention, constant names are written in all caps.</p>
<p>You CANNOT do the following</p>
<pre><code class="language-java">final int MAX = 10;
MAX = 15;</code></pre>
<h2>Using Libraries</h2>
<ol>
<li>Import the library</li>
<li>Find the method that is appropriate</li>
<li>Use it</li>
</ol>
<p>Example:</p>
<pre><code class="language-java">import java.util.Math;
public class MathTest {
public static void main(String[] args) {
double answer = Math.ceil(5.4);
System.out.println(Math.ceil(4.5));
}
}</code></pre>
<h2>Typecasting / Type Conversions</h2>
<p>A value is converted automatically only when it moves to a type that is larger than its current one; going to a smaller type (for example <code>double</code> to <code>int</code>) requires an explicit cast. The expression Polack used is that you cannot fit into super skinny jeans, but you can fit into bigger pants.</p>
<pre><code class="language-java">double dnum;
float fnum;
int inum;
dnum = (double)fnum * (double)inum;</code></pre>
<h2>Char vs String</h2>
<p><code>String</code>s are initialized in Java with double quotes while <code>char</code>s are initialized with single quotes</p>
<pre><code class="language-java">char initial = 'j';
String name = "Jennifer";</code></pre>
<p>Characters can also be treated as integers, since each character has a numeric code.</p>
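<p>For example (a small sketch, not from the lecture):</p>
<pre><code class="language-java">char initial = 'j';
int code = (int) initial;       // 106, the character's numeric value
System.out.println(code);
System.out.println((char) 106); // prints j again</code></pre>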
<h2>Random Numbers</h2>
<ol>
<li>Import <code>java.util.Random;</code></li>
<li>Declare the random number generator</li>
<li>Initialize with <code>new</code></li>
<li>Use it</li>
</ol>
<pre><code class="language-java">import java.util.Random;
public class RandTest {
public static void main(String[] args) {
Random rand;
rand = new Random();
int number = rand.nextInt(100); // Random generates number between 0-99
}
}</code></pre>
<p>How do you generate numbers in a different range? [50, 150]</p>
<pre><code class="language-java">rand.nextInt(100); // 0-99
rand.nextInt(101) // 0 - 100
rand.nextInt(101) + 50 //50-150</code></pre>
<p>In more general terms</p>
<pre><code class="language-java">rand.nextInt(end - start + 1) + start</code></pre>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>


@ -0,0 +1,297 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>CPSC 220 Lecture 3</h1>
<h2>Variables</h2>
<p>Variable -- Storage of information</p>
<p>The type cannot change in a variable.</p>
<p>Examples of types include</p>
<ul>
<li>int</li>
<li>float</li>
<li>double</li>
<li>String</li>
<li>char</li>
<li>boolean</li>
</ul>
<p>Declaration: <code>int num;</code></p>
<p>Initialization: <code>num = 5;</code></p>
<p>Declaration + Initialization: <code>int num = 5;</code></p>
<h3>Possible Errors</h3>
<p><strong>You cannot declare a variable multiple times.</strong></p>
<p>An undefined variable error occurs when you attempt to use a variable before declaring it.</p>
<h3>Casting</h3>
<p>You need to cast if you are attempting to lose data or store a larger memory type into a smaller one.</p>
<p>double -&gt; float -&gt; int <strong>(casting required)</strong></p>
<pre><code class="language-java">double gpa = 3.2;
int num1 = 10 * (int)gpa; // 30</code></pre>
<h1>Operations</h1>
<p>The basic number operations are</p>
<ul>
<li>+</li>
<li>-</li>
<li>*</li>
<li>/</li>
<li>% <em>(the remainder)</em></li>
</ul>
<p>Examples</p>
<pre><code class="language-java">0 % 2 // 0
1 % 2 // 1
2 % 2 // 0
3 % 2 // 1
4 % 2 // 0
5 % 2 // 1
3 % 5 // 3
7 % 5 // 2</code></pre>
<p>You can test if something is even using modulus %</p>
<pre><code class="language-java">// Assuming i was initiliazed to a value earlier
if (i % 2 == 0) {
System.out.println("i is even");
} else {
System.out.println("i is odd");
}</code></pre>
<h1>System input</h1>
<p>Here is sample code using a Scanner as input</p>
<pre><code class="language-java">import java.util.Scanner;
public class ScannerExample {
public static void main(String[] args) {
Scanner in;
in = new Scanner(System.in);
// Grab numerical values
int num = in.nextInt();
float gpa = in.nextFloat();
double weight = in.nextDouble();
// Grab a single character
        in.nextLine(); // Clear the leftover newline first
char initial = in.next().charAt(0);
// To get the entire line of a string
in.nextLine();
String name = in.nextLine();
}
}</code></pre>
<p>You need to use <code>in.nextLine()</code> to grab the carriage return that is left after grabbing a numeric value.</p>
<h1>Classes and Objects</h1>
<p>Classes are a new type that you can have multiple things of.</p>
<p>These classes are blueprints that are made up of primitives or more basic types.</p>
<p>First create a Pet.java file (Name of the class must match the name of the file)</p>
<pre><code class="language-java">public class Pet {
private String name;
private int years;
}</code></pre>
<p>You can then use the Pet class in your main program. The terminology here is that you can create instances or objects of the class.</p>
<p>In PetTester.java</p>
<pre><code class="language-java">public class PetTester {
public static void main(String[] args) {
Pet myPet;
myPet = new Pet();
}
}</code></pre>
<p><strong>Both Pet.java and PetTester.java must be in the same directory/folder</strong></p>
<h3>Mutators/Accessors</h3>
<p>Since the variables are private we cannot access them in the main program. To work around this, we can write what is called a mutator method.</p>
<pre><code class="language-java">public class Pet {
private String name;
private int years;
// Mutators
public void setName(String n) {
name = n;
}
public void setYears(int y) {
if (y &gt;= 0) {
years = y;
} else {
System.out.println("No one is less than 0 years old.")
}
}
}</code></pre>
<p>Now let's use these new methods</p>
<pre><code class="language-java">public class PetTester {
public static void main(String[] args) {
        Pet myPet;
        myPet = new Pet();
        myPet.setName("Fred");
        myPet.setYears(20);
}
}</code></pre>
<p>We need a method that will allow us to access the data type. Let's add accessors to our pet class.</p>
<pre><code class="language-java">public class Pet {
private String name;
private int years;
// Mutators
public void setName(String n) {
name = n;
}
public void setYears(int y) {
if (y &gt;= 0) {
years = y;
} else {
System.out.println("No one is less than 0 years old.")
}
}
// Accessors
public String getName() {
return name;
}
public int getYears() {
return years;
}
}</code></pre>
<p>Now let's get some information from the pet object, such as the age.</p>
<pre><code class="language-java">public class PetTester {
public static void main(String[] args) {
        Pet myPet;
myPet = new Pet();
myPet.setName("Fred");
myPet.setYears(20);
int year = myPet.getYears();
}
}</code></pre>
<h3>Constructors</h3>
<p>Constructors let us initialize variables in the class without having to use mutator methods.</p>
<pre><code class="language-java">public class Pet {
private String name;
private int years;
// Default Constructor
public Pet() {
name = "";
years = 0;
}
// Non-Default Constructor
public Pet(int y, String n) {
name = n;
years = y;
}
// Mutator Methods
public void setName(String n) {
name = n;
}
public void setYears(int y) {
if (y &gt;= 0) {
years = y;
} else {
System.out.println("No one is less than 0 years old.")
}
}
// Accessor Methods
public String getName() {
return name;
}
public int getYears() {
return years;
}
}</code></pre>
<p>Now let us see this in action.</p>
<pre><code class="language-java">public class PetTester {
public static void main(String[] args) {
Pet yourPet = new Pet(10, "Fluffy");
}
}</code></pre>
<p>You can have as many constructors as you want, but each must have a different parameter list (this is called overloading).</p>
<p>Example:</p>
<pre><code class="language-java">public class Pet {
...
    Pet() {
        name = "";
        years = 0;
    }
    Pet(int y, String n) {
        name = n;
        years = y;
    }
    Pet(String n) {
        years = 1;
        name = n;
    }
...
}</code></pre>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>


@ -0,0 +1,102 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>CPSC 220 Computer Programming and Problem Solving Fall 2017</h1>
<p><a href="index.html%3Flabaide%252Ffall2017%252Fcpsc220%252Fsept6.html">Lecture 3 -- September 6</a></p>
<p><a href="index.html%3Flabaide%252Ffall2017%252Fcpsc220%252Fsept11.html">Lecture 4 -- September 11</a></p>
<p><a href="index.html%3Flabaide%252Ffall2017%252Fcpsc220%252Fsept20.html">Lecture 7 -- September 20</a></p>
<p><a href="index.html%3Flabaide%252Ffall2017%252Fcpsc220%252Fsept25.html">Lecture 9 -- September 25</a></p>
<p><a href="index.html%3Flabaide%252Ffall2017%252Fcpsc220%252Foct2.html">Lecture 10 -- October 2</a></p>
<p><a href="index.html%3Flabaide%252Ffall2017%252Fcpsc220%252Foct4.html">Lecture 11 -- October 4</a></p>
<p><a href="index.html%3Flabaide%252Ffall2017%252Fcpsc220%252Foct9.html">Lecture 12 -- October 9</a></p>
<p><a href="index.html%3Flabaide%252Ffall2017%252Fcpsc220%252Foct11.html">Lecture 13 -- October 11</a></p>
<p><a href="index.html%3Flabaide%252Ffall2017%252Fcpsc220%252Foct18.html">Lecture 15 -- October 18</a></p>
<p><a href="index.html%3Flabaide%252Ffall2017%252Fcpsc220%252Foct23.html">Lecture 16 -- October 23</a></p>
<p><a href="index.html%3Flabaide%252Ffall2017%252Fcpsc220%252Foct25.html">Lecture 17 -- October 25</a></p>
<p><a href="index.html%3Flabaide%252Ffall2017%252Fcpsc220%252Foct30.html">Lecture 18 -- October 30</a></p>
<p><a href="index.html%3Flabaide%252Ffall2017%252Fcpsc220%252Fexam2review.html">Exam 2 Review -- October 30</a></p>
<p><a href="index.html%3Flabaide%252Ffall2017%252Fcpsc220%252Fnov6.html">Lecture 19 -- November 6</a></p>
<p><a href="index.html%3Flabaide%252Ffall2017%252Fcpsc220%252Fnov13.html">Lecture 20 -- November 13</a></p>
<p><a href="index.html%3Flabaide%252Ffall2017%252Fcpsc220%252Fnov15.html">Lecture 21 -- November 15</a></p>
<p><a href="index.html%3Flabaide%252Ffall2017%252Fcpsc220%252Fnov20.html">Lecture 22 -- November 20</a></p>
<p><a href="index.html%3Flabaide%252Ffall2017%252Fcpsc220%252Fnov27.html">Lecture 23 -- November 27</a></p>
<p><a href="index.html%3Flabaide%252Ffall2017%252Fcpsc220%252Fdec6.html">Lecture 25 -- December 6</a></p>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>


@ -0,0 +1,105 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Lecture for April 3rd</h1>
<h2>Inheritance</h2>
<p>The <em>base class</em>, <em>super class</em>, or <em>parent class</em> is the initial class that we are working with. Let's say that you want to <em>extend</em> the class, or add additional functionality. The class that inherits from the parent class is called the <em>child class</em>, <em>subclass</em> or <em>derived class</em>.</p>
<h2>Child Class Syntax</h2>
<pre><code class="language-java">public class Truck extends Car {
// Truck Appropriate Fields
// Necessary methods for truck
}</code></pre>
<p>This code adds all the methods from Car into the Truck class. You can then add methods that are specific to a Truck into the Truck class.</p>
<p>A child class has all of the parent's fields and methods, though it can only directly access the ones that are not <code>private</code>.</p>
<h2>Visibility Modifiers</h2>
<p>Recall the words <code>public</code> and <code>private</code></p>
<p>The <code>public</code> modifier makes the field/method accessible by any class</p>
<p>The <code>private</code> modifier makes the field/method only accessible within the method itself</p>
<p>The <code>protected</code> modifier makes the field/method accessible within the same class or any subclasses.</p>
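<p>A brief sketch of how these modifiers behave (the field names here are made up for illustration):</p>
<pre><code class="language-java">// In Car.java
public class Car {
    private int vin;        // visible only inside Car
    protected int odometer; // visible inside Car, its subclasses, and the same package
    public String make;     // visible to any class
}

// In Truck.java
public class Truck extends Car {
    public void addMiles(int miles) {
        odometer = odometer + miles; // allowed: protected is visible in a subclass
        // vin = 1234;               // would not compile: private to Car
    }
}</code></pre>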
<h2>Overriding a Method</h2>
<p>You can override a parent class method by declaring a method in the child class with the same...</p>
<ul>
<li>name</li>
<li>number of parameters</li>
<li>parameter types</li>
</ul>
<p>but this method would have different behavior! (See the sketch below.)</p>
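<p>A small sketch of what overriding looks like (the <code>describe</code> method is made up for illustration, not taken from the lecture):</p>
<pre><code class="language-java">// In Car.java
public class Car {
    public void describe() {
        System.out.println("I am a car.");
    }
}

// In Truck.java
public class Truck extends Car {
    @Override // same name, same parameters, different behavior
    public void describe() {
        System.out.println("I am a truck with a cargo bed.");
    }
}</code></pre>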
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>


@ -0,0 +1,120 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Lecture for February 1st</h1>
<h2>Control Structures</h2>
<p>In this class we will talk about three types of control structures</p>
<ul>
<li>Sequential</li>
<li>Selection</li>
<li>Repetition</li>
</ul>
<p>Sequential is what is most familiar to us. Write the lines from top to bottom and it executes it in that order</p>
<h3>Selection</h3>
<p>Selection depends on the question of <code>if</code>. </p>
<p>If it is raining, wear boots</p>
<pre><code class="language-java">if (raining) {
wearingBoots = true;
}</code></pre>
<p>If you want something to happen also when it is not true, consider an <code>if-else</code> statement</p>
<p>If the light is on, turn it off.</p>
<p>Otherwise, turn it on.</p>
<pre><code class="language-java">if (lightIsOn) {
lightIsOn = false;
} else {
lightIsOn = true;
}</code></pre>
<p>Sometimes you can have multiple branches depending on a condition. Let us take a stop light as an example</p>
<pre><code class="language-java">if (light == "red") {
    car.stop();
} else if (light == "yellow") {
    car.slow();
} else {
    car.go();
}</code></pre>
<h2>String comparison</h2>
<p>There is a specific method in the <code>String</code> class when it comes to checking for string equality</p>
<pre><code class="language-java">boolean equals(String s)</code></pre>
<p>Let us look at an example</p>
<pre><code class="language-java">String word = "hello";
boolean ans = word.equals("hello"); // Returns true
boolean ans2 = word.equals("Hello"); // Returns false</code></pre>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>


@ -0,0 +1,130 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Lecture for February 13</h1>
<h2>Loops</h2>
<h3>Why Loops?</h3>
<p>While some check is true, repeat the work.</p>
<p>While the cookies aren't baked, keep baking</p>
<h3>Loop Building Process</h3>
<ol>
<li>Identify one test that must be true when the loop is finished</li>
<li>Use the <strong>opposite</strong> form of the test</li>
<li>Within loop, make <em>progress</em> towards completion of the goal.</li>
</ol>
<h3>While syntax</h3>
<pre><code class="language-java">while (expression) {
// Loop body executes if expression is true
}
// Statements execute after expression is false</code></pre>
<h3>Getting Input (Songs in a Playlist Pseudocode)</h3>
<pre><code class="language-java">// Ask user about first song
while (user says play next song) {
// play next song
// ask user about next song
}</code></pre>
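<p>A concrete version of that idea might look like the sketch below (the prompt text and input handling are assumptions for illustration, not from the lecture):</p>
<pre><code class="language-java">import java.util.Scanner;

public class Playlist {
    public static void main(String[] args) {
        Scanner scnr = new Scanner(System.in);
        // Ask user about first song
        System.out.println("Play the first song? (y/n)");
        String answer = scnr.nextLine();
        while (answer.equals("y")) {
            System.out.println("Playing the next song...");
            // Ask user about next song
            System.out.println("Play another song? (y/n)");
            answer = scnr.nextLine();
        }
        System.out.println("Playlist stopped.");
    }
}</code></pre>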
<h3>Nested Loops</h3>
<p>You can have loops inside loops</p>
<pre><code class="language-java">int outer = 1;
while (outer &lt; 4) {
int inner = 1;
while (inner &lt; 4) {
System.out.println(outer + ":" + inner);
inner++;
}
outer++;
}</code></pre>
<p>This code does the following</p>
<pre><code class="language-reStructuredText">1:1
1:2
1:3
2:1
2:2
2:3
3:1
3:2
3:3</code></pre>
<h3>Break Down the Problem</h3>
<p>Never write the entire program at once! This makes it incredibly hard to debug. Instead break it into small parts.</p>
<p>Write one part -&gt; debug until it works</p>
<p>Write second part -&gt; debug until it works</p>
<p>This way you know which part of your code failed, instead of having everything fail.</p>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>


@ -0,0 +1,131 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Lecture for February 20th</h1>
<h2>Reading a File</h2>
<p>You can get input from a file instead of from the terminal</p>
<pre><code class="language-java">FileInputStream fileIn = new FileInputStream("myFile.txt");
// Our familiar Scanner
Scanner scnr = new Scanner(fileIn);
// We can use our usual Scanner methods
String line = scnr.nextLine();
fileIn.close(); // Remember to close the file when you're finished with it!</code></pre>
<h3>Reviewing Scanner Methods</h3>
<p>To understand some of the Scanner methods we need to be aware of the &quot;newline&quot; character. This character is equivalent to the <code>Enter</code> button on the keyboard.</p>
<p><code>scnr.nextLine()</code> This gets all the characters in the buffer up to the newline character (and consumes the newline itself).</p>
<p><code>scnr.next()</code> Grabs the characters in the next &quot;token&quot;. Tokens are usually separated by any whitespace type character (spaces, enters, tabs, etc.)</p>
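<p>A small sketch of the difference (the input line is made up for illustration):</p>
<pre><code class="language-java">import java.util.Scanner;

public class TokenDemo {
    public static void main(String[] args) {
        // Suppose the user types:  hello world
        Scanner scnr = new Scanner(System.in);
        String first = scnr.next();    // "hello"  (stops at the space)
        String rest = scnr.nextLine(); // " world" (the rest of the line)
        System.out.println(first);
        System.out.println(rest);
    }
}</code></pre>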
<h2>Writing to a File</h2>
<p>Prints information to a file instead of to the screen</p>
<pre><code class="language-java">FileOutputStream fileOut = new FileOutputStream("myOutfile.txt");
PrintWriter out = new PrintWriter(fileOut);
out.println("Print this as the first line.");
out.flush(); // Pushes the file changes to the file
fileOut.close(); // If you forget this then it won't remember your changes</code></pre>
<h2>Arrays</h2>
<p>Arrays are containers of fixed size. An array holds a fixed number of values of the <strong>same type</strong>. (Ex: 10 integers, 2 strings, 5 booleans)</p>
<p>Declaration</p>
<pre><code class="language-java">int[] array; // This declares an integer array</code></pre>
<p>Initialization</p>
<pre><code class="language-java">array = new int[7]; // This states that this array can hold up to 7 integers</code></pre>
<p>Storing a value in an array</p>
<ul>
<li>Square bracket notation is used</li>
</ul>
<pre><code class="language-java">int[] array = new int[7];
array[0] = 5; // Stores 5 into the first slot</code></pre>
<p>Now let us attempt to retrieve the value</p>
<pre><code class="language-java">int temp = array[0];
System.out.println(temp); // Prints "5"</code></pre>
<h3>Traversing an Array</h3>
<p>Let's say we have the following array</p>
<pre><code class="language-java">int[] numbers = {3, 5, 2, 7, 9};</code></pre>
<p>Let's print out each of the values in the array</p>
<pre><code class="language-java">for (int i = 0; i &lt; numbers.length; i++) {
System.out.print("value in " + i " is " + numbers[i]);
}</code></pre>
<h3>Finding the maximum value in an Array</h3>
<pre><code class="language-java">int highest = numbers[0];
for (int i = 0; i &lt; numbers.length; i++) {
if (numbers[i] &gt; highest) {
        highest = numbers[i];
}
}</code></pre>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>


@ -0,0 +1,130 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Lecture for February 27th</h1>
<h2>Review for midterm</h2>
<p>Chapter 1 -- Code Style, API</p>
<p>Chapter 2 -- Variables &amp; Assignments, strings</p>
<p>Chapter 3 -- input &amp; output</p>
<p>Chapter 4 -- branches (if, if/else, switch)</p>
<p>Chapter 5 -- loops (while, for), scope</p>
<p>Chapter 6 -- File Reading and Writing</p>
<h2>Separated vs Connected Branches</h2>
<p>What is the output of this code?</p>
<pre><code class="language-java">String preferredLanguage = "Spanish";
if (preferredLanguage.equals("Chinese")) {
System.out.println("Ni hao!");
}
if (preferredLanguage.equals("Spanish")) {
System.out.println("Hola!");
}
if (preferredLanguage.equals("French")) {
System.out.println("Bonjour!");
}
if (preferredLanguage.equals("German")) {
System.out.println("Gutentag!")
} else {
System.out.println("Hello!")
}</code></pre>
<p>The output is</p>
<pre><code class="language-reStructuredText">Hola!
Hello!</code></pre>
<p>This is because each of the if statements is independent of the others. Whether or not an if statement gets checked is not affected by the if statements around it.</p>
<p>Since the preferred language equals Spanish it outputs <code>Hola!</code> But since the language is also <em>not German</em> it prints out <code>Hello!</code> as well.</p>
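<p>For contrast, a sketch (not from the lecture) of the same logic written with connected branches using <code>else if</code>; at most one branch runs, so the output would be just <code>Hola!</code>:</p>
<pre><code class="language-java">String preferredLanguage = "Spanish";
if (preferredLanguage.equals("Chinese")) {
    System.out.println("Ni hao!");
} else if (preferredLanguage.equals("Spanish")) {
    System.out.println("Hola!");
} else if (preferredLanguage.equals("French")) {
    System.out.println("Bonjour!");
} else if (preferredLanguage.equals("German")) {
    System.out.println("Gutentag!");
} else {
    System.out.println("Hello!");
}</code></pre>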
<h2>Using an Array</h2>
<p>Square bracket notation is used to access elements; array slots can be used like ordinary variables</p>
<pre><code class="language-java">int[] array = new int[7]; // Creates an integer array of size 7
array[0] = 5;</code></pre>
<h2>Swapping Elements</h2>
<p>You can swap <code>x</code> and <code>y</code> in the following way with a <em>temporary</em> variable</p>
<pre><code class="language-java">int x = 6;
int y = 1;
int temp = x;
x = y;
y = temp;</code></pre>
<h2>Two-Dimensional Arrays</h2>
<pre><code class="language-java">// Creates a 2D array of two rows and three columns
int[][] a = new int[2][3]</code></pre>
<p>You can access an element of this 2D array using the conventional square bracket notation</p>
<pre><code class="language-java">a[0][0] = 5;</code></pre>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>


@ -0,0 +1,475 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Lecture for February 6th</h1>
<h2>If Statements -- Cont.</h2>
<p>Inside the parenthesis of the <code>if</code> statement must be a boolean expression. This is an expression that evaluates to either <code>true</code> or <code>false</code>. We can do more complex boolean expressions through logical operators.</p>
<h2>Logical Operators</h2>
<p>NOT <code>!a</code> this is true when <code>a</code> is false</p>
<p>AND <code>a &amp;&amp; b</code> this is true when both operands are true</p>
<p>OR <code>a || b</code> this is true when either a is true OR b is true</p>
<h2>Truth Tables</h2>
<ul>
<li>Show all possible outcomes</li>
<li>It breaks the expression down into parts</li>
</ul>
<h3>Not</h3>
<p>Let's look at the most simplest case. Not.</p>
<table>
<thead>
<tr>
<th>a</th>
<th>!a</th>
</tr>
</thead>
<tbody>
<tr>
<td>true</td>
<td>false</td>
</tr>
<tr>
<td>false</td>
<td>true</td>
</tr>
</tbody>
</table>
<h3>AND</h3>
<table>
<thead>
<tr>
<th>a</th>
<th>b</th>
<th>a &amp;&amp; b</th>
</tr>
</thead>
<tbody>
<tr>
<td>true</td>
<td>true</td>
<td>true</td>
</tr>
<tr>
<td>true</td>
<td>false</td>
<td>false</td>
</tr>
<tr>
<td>false</td>
<td>true</td>
<td>false</td>
</tr>
<tr>
<td>false</td>
<td>false</td>
<td>false</td>
</tr>
</tbody>
</table>
<p>Notice here that <code>a &amp;&amp; b</code> is only true when both <code>a</code> and <code>b</code> are true.</p>
<h3>OR</h3>
<table>
<thead>
<tr>
<th>a</th>
<th>b</th>
<th>a || b</th>
</tr>
</thead>
<tbody>
<tr>
<td>true</td>
<td>true</td>
<td>true</td>
</tr>
<tr>
<td>true</td>
<td>false</td>
<td>true</td>
</tr>
<tr>
<td>false</td>
<td>true</td>
<td>true</td>
</tr>
<tr>
<td>false</td>
<td>false</td>
<td>false</td>
</tr>
</tbody>
</table>
<p>Notice here that <code>a || b</code> is only false when both <code>a</code> and <code>b</code> are false.</p>
<h2>Precedence (Order of Operations)</h2>
<table>
<thead>
<tr>
<th>Operation</th>
<th>Operators</th>
</tr>
</thead>
<tbody>
<tr>
<td>Parenthesis</td>
<td><code>()</code></td>
</tr>
<tr>
<td>Logical Not</td>
<td><code>!</code></td>
</tr>
<tr>
<td>Arithmetic Operators</td>
<td><code>*</code> <code>/</code> <code>%</code> <code>+</code> <code>-</code></td>
</tr>
<tr>
<td>Relational Operators</td>
<td><code>&lt;</code> <code>&lt;=</code> <code>&gt;</code> <code>&gt;=</code></td>
</tr>
<tr>
<td>Equality and Inequality operators</td>
<td><code>==</code> <code>!=</code></td>
</tr>
<tr>
<td>Logical AND</td>
<td><code>&amp;&amp;</code></td>
</tr>
<tr>
<td>Logical OR</td>
<td><code>||</code></td>
</tr>
</tbody>
</table>
<h2>Playing with Truth Tables Example</h2>
<h3>a &amp;&amp; !b</h3>
<table>
<thead>
<tr>
<th>a</th>
<th>b</th>
<th>!b</th>
<th>a &amp;&amp; !b</th>
</tr>
</thead>
<tbody>
<tr>
<td>true</td>
<td>true</td>
<td>false</td>
<td>false</td>
</tr>
<tr>
<td>true</td>
<td>false</td>
<td>true</td>
<td>true</td>
</tr>
<tr>
<td>false</td>
<td>true</td>
<td>false</td>
<td>false</td>
</tr>
<tr>
<td>false</td>
<td>false</td>
<td>true</td>
<td>false</td>
</tr>
</tbody>
</table>
<h3>!a || b</h3>
<table>
<thead>
<tr>
<th>a</th>
<th>b</th>
<th>!a</th>
<th>!a || b</th>
</tr>
</thead>
<tbody>
<tr>
<td>true</td>
<td>true</td>
<td>false</td>
<td>true</td>
</tr>
<tr>
<td>true</td>
<td>false</td>
<td>false</td>
<td>false</td>
</tr>
<tr>
<td>false</td>
<td>true</td>
<td>true</td>
<td>true</td>
</tr>
<tr>
<td>false</td>
<td>false</td>
<td>true</td>
<td>true</td>
</tr>
</tbody>
</table>
<h3>!(a || b &amp;&amp; c)</h3>
<table>
<thead>
<tr>
<th>a</th>
<th>b</th>
<th>c</th>
<th>b &amp;&amp; c</th>
<th>a || (b &amp;&amp; c)</th>
<th>!(a || b &amp;&amp; c)</th>
</tr>
</thead>
<tbody>
<tr>
<td>true</td>
<td>true</td>
<td>true</td>
<td>true</td>
<td>true</td>
<td>false</td>
</tr>
<tr>
<td>true</td>
<td>true</td>
<td>false</td>
<td>false</td>
<td>true</td>
<td>false</td>
</tr>
<tr>
<td>true</td>
<td>false</td>
<td>true</td>
<td>false</td>
<td>true</td>
<td>false</td>
</tr>
<tr>
<td>true</td>
<td>false</td>
<td>false</td>
<td>false</td>
<td>true</td>
<td>false</td>
</tr>
<tr>
<td>false</td>
<td>true</td>
<td>true</td>
<td>true</td>
<td>true</td>
<td>false</td>
</tr>
<tr>
<td>false</td>
<td>true</td>
<td>false</td>
<td>false</td>
<td>false</td>
<td>true</td>
</tr>
<tr>
<td>false</td>
<td>false</td>
<td>true</td>
<td>false</td>
<td>false</td>
<td>true</td>
</tr>
<tr>
<td>false</td>
<td>false</td>
<td>false</td>
<td>false</td>
<td>false</td>
<td>true</td>
</tr>
</tbody>
</table>
<h3>!a || b &amp;&amp; c</h3>
<table>
<thead>
<tr>
<th>a</th>
<th>b</th>
<th>c</th>
<th>!a</th>
<th>b &amp;&amp; c</th>
<th>!a || b &amp;&amp; c</th>
</tr>
</thead>
<tbody>
<tr>
<td>true</td>
<td>true</td>
<td>true</td>
<td>false</td>
<td>true</td>
<td>true</td>
</tr>
<tr>
<td>true</td>
<td>true</td>
<td>false</td>
<td>false</td>
<td>false</td>
<td>false</td>
</tr>
<tr>
<td>true</td>
<td>false</td>
<td>true</td>
<td>false</td>
<td>false</td>
<td>false</td>
</tr>
<tr>
<td>false</td>
<td>true</td>
<td>true</td>
<td>true</td>
<td>true</td>
<td>true</td>
</tr>
<tr>
<td>true</td>
<td>false</td>
<td>false</td>
<td>false</td>
<td>false</td>
<td>false</td>
</tr>
<tr>
<td>false</td>
<td>true</td>
<td>false</td>
<td>true</td>
<td>false</td>
<td>true</td>
</tr>
<tr>
<td>false</td>
<td>false</td>
<td>true</td>
<td>true</td>
<td>false</td>
<td>true</td>
</tr>
<tr>
<td>false</td>
<td>false</td>
<td>false</td>
<td>true</td>
<td>false</td>
<td>true</td>
</tr>
</tbody>
</table>
<h2>Distributive Property of Logical Operators</h2>
<p>The following statements are equivalent.</p>
<p><code>!(a &amp;&amp; b)</code> is equivalent to <code>!a || !b</code></p>
<p>Notice how when you distribute the <code>!</code> you have to flip the operator as well: <code>&amp;&amp;</code> becomes <code>||</code>.</p>
<p>The same is true for the following examples:</p>
<p><code>!(a || b)</code> is equivalent to <code>!a &amp;&amp; !b</code></p>
<p><code>!(a || b &amp;&amp; c)</code> is equivalent to <code>!a &amp;&amp; (!b || !c)</code></p>
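<p>A quick way to convince yourself is to print both sides of a pair for some sample values and see that they always agree; for instance:</p>
<pre><code class="language-java">boolean a = true, b = false, c = true;
System.out.println(!(a &amp;&amp; b));         // true
System.out.println(!a || !b);           // true
System.out.println(!(a || b &amp;&amp; c));    // false
System.out.println(!a &amp;&amp; (!b || !c));  // false</code></pre>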
<h2>Short Circuit Evaluation</h2>
<p>In an <code>&amp;&amp;</code> (AND) statement, if the left side is <code>false</code>, there is no need to evaluate the right side, since the whole expression will be false anyway.</p>
<pre><code class="language-java">false &amp;&amp; true; // FALSE no matter what the right side is</code></pre>
<p>In an <code>||</code> (OR) statement, if the left side is <code>true</code>, there is no need to evaluate the right side, since the whole expression will be true anyway.</p>
<pre><code class="language-java">true || false; // TRUE no matter what the right side is</code></pre>
<p>Java takes this shortcut by default for efficiency reasons.</p>
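<p>This matters in practice: you can put a guard on the left side and rely on Java never touching the right side when the guard fails. A small sketch:</p>
<pre><code class="language-java">int count = 0;
// The right side is never evaluated when count == 0,
// so this avoids dividing by zero
if (count != 0 &amp;&amp; 100 / count &gt; 5) {
    System.out.println("big average");
}

String name = null;
// Same idea: the length() call is skipped when name is null
if (name != null &amp;&amp; name.length() &gt; 3) {
    System.out.println(name);
}</code></pre>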
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,149 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Lecture for February 8th</h1>
<h2>Switch Statements</h2>
<p>Another way to perform multiway branching. Comparing a variable and constant values (<code>String</code>, <code>int</code>, <code>char</code>)</p>
<p>Switch statements cannot be used with <code>boolean</code>, <code>double</code>, or <code>float</code>s</p>
<h3>Syntax</h3>
<pre><code class="language-java">switch (variable) {
case value1:
// Do something
break;
case value2:
// Do something else
break;
//...
default:
// If all else fails do this
break;
}</code></pre>
<p><code>case</code> is a reserved word that means &quot;when our variable in consideration is equal to...&quot;</p>
<p>If you forget the <code>break</code> keyword, then execution &quot;falls through&quot;: the program will keep doing the work of the following cases' statements until it hits a <code>break</code> keyword (or the end of the switch).</p>
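<p>For example, here is a small sketch of what that fall-through looks like:</p>
<pre><code class="language-java">int day = 6;
switch (day) {
    case 6:
        System.out.println("Saturday");
        // no break here, so execution falls through into case 7
    case 7:
        System.out.println("Sunday");
        break;
    default:
        System.out.println("Weekday");
        break;
}
// Prints "Saturday" and then "Sunday"</code></pre>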
<h3>Example Switch Syntax</h3>
<pre><code class="language-java">switch (birthday) {
case 1:
birthstone = "garnet";
break;
case 2:
birthstone = "amethyst";
break;
// ....
default:
System.out.println("Not valid");
break;
}</code></pre>
<h2>Comparing Strings Relationally</h2>
<p>Comparing strings is based on the ASCII values of the characters</p>
<p>Sorting strings will result in strings being in alphabetical or reverse alphabetical order. The strings are compared character by character from the left, using the ASCII value of each character.</p>
<p>To compare strings use the <code>compareTo()</code> method. Here is the format of the call</p>
<pre><code class="language-java">str1.compareTo(str2)</code></pre>
<p>This returns a <em>negative number</em> when <code>str1</code> is less than <code>str2</code></p>
<p>This returns <code>0</code> when <code>str1</code> is equal to <code>str2</code></p>
<p>This returns a <em>positive number</em> when <code>str1</code> is greater than <code>str2</code></p>
<h3>Example</h3>
<pre><code class="language-java">String a = "apple";
String b = "banana";
int x = a.compareTo(b); // x = -1
int y = b.compareTo(a); // y = 1</code></pre>
<h2>Ternary Operator</h2>
<p>With a ternary operator, you can shorten statements where a value is determined by an if statement</p>
<pre><code class="language-java">String output = "";
if (movieRating &gt; 4) {
output = "Fan favorite";
} else {
output = "Alright";
}</code></pre>
<p>Is equivalent to</p>
<pre><code class="language-java">String output = "";
output = (movieRating &gt; 4)? "Fan favorite": "Alright";</code></pre>
<h3>Another Example</h3>
<pre><code class="language-java">double shipping;
if (isPrimeMember) {
shipping = 0;
} else {
shipping = 3.99;
}</code></pre>
<p>Is equivalent to</p>
<pre><code class="language-java">double shipping = (isPrimeMember)? 0: 3.99;</code></pre>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,111 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Lecture for January 16 2018</h1>
<h2>Comments</h2>
<p>You can use multi-line comments or single line comments to note about a piece of code. This helps you so that you don't forget what was happening in the code</p>
<pre><code class="language-java">/* Multi Line Comment
I am multiple lines woo!
*/
System.out.println("Hello"); // I am an inline comment</code></pre>
<h3>Javadocs</h3>
<p>This is a standardized method in Java to describe your functions and classes</p>
<pre><code class="language-java">/**
* This is a Javadoc comment. A description of your class should appear here
* @author Brandon
* @version 2018
*/
public class Example{
/** Another Javadoc comment.
* A description of your program should go here
*/
public static void main(String[] args) {
}
}</code></pre>
<h2>Variables</h2>
<p>Convention in Java is for all of your function/method names to start with a lowercase letter.</p>
<p>Class names should be title cased, with each word capitalized, e.g. <code>IntroExample</code></p>
<h2>Java API</h2>
<p>The Java API is publicly accessible and is extremely useful for finding methods in existing classes.</p>
<p>The API documentation for the <code>String</code> class is located here: <a href="https://docs.oracle.com/javase/8/docs/api/index.html?java/lang/String.html">https://docs.oracle.com/javase/8/docs/api/index.html?java/lang/String.html</a></p>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,222 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Lecture for January 18</h1>
<h2>Variables and Assignment</h2>
<p>Think about variables as buckets that hold information. Once the bucket is created, only one type of item can go in the bucket.</p>
<pre><code class="language-java">sand bucket1;</code></pre>
<p>We can say that bucket1 is of type <code>sand</code>; that means the only thing that can go in the bucket is sand.</p>
<pre><code class="language-java">int bucket1;
double bucket2;</code></pre>
<p>From the two lines above, we have <em>declared</em> the variable.</p>
<p>Variables store state; they are a name for a location in memory.</p>
<p>Always remember to initialize your variables. Otherwise there's nothing in the bucket!</p>
<pre><code class="language-java">bucket1 = 5;</code></pre>
<p>You can combine both the declaration and initialization</p>
<pre><code class="language-java">int count = 15;</code></pre>
<p>Remember when dealing with variables to stay true with the type, don't mix a bucket of water with a bucket of sand.</p>
<p>We can update <code>count</code> to contain a new value</p>
<pre><code class="language-java">count = 55;</code></pre>
<p><code>count</code> no longer has the value of <code>15</code> in it. There's no record of it! It has been overwritten with the value <code>55</code></p>
<h3>Primitive Types</h3>
<p>There are 8 primitive types in Java</p>
<ul>
<li>boolean</li>
<li>char</li>
<li>byte</li>
<li>short</li>
<li>int</li>
<li>long</li>
<li>float</li>
<li>double</li>
</ul>
<p>byte through double are all <em>numeric</em> types</p>
<h4>Boolean</h4>
<p><code>boolean</code> can only be equal to <code>true</code> or <code>false</code></p>
<pre><code class="language-java">boolean student = true;</code></pre>
<h4>Char</h4>
<p>Stores a single character from the Unicode set</p>
<p>There are 16 bits per character which adds up to 65,536 characters</p>
<p>ASCII is the US subset of the characters. You can look this up online when needing to deal with ASCII values</p>
<pre><code class="language-java">char firstLetter = 'A';</code></pre>
<h3>Numeric types</h3>
<p>The different numeric types determine the precision of your number. Since numbers are not represented the same in the computer as they are in real life, there are some approximations.</p>
<p>The default types to use in your code are <code>int</code> for integers and <code>double</code> for numbers with a decimal point</p>
<p>There are certain operations you can perform on numeric types</p>
<table>
<thead>
<tr>
<th>Symbol</th>
<th>Meaning</th>
<th>Example</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>+</td>
<td>addition</td>
<td>43 + 8</td>
<td>51</td>
</tr>
<tr>
<td>-</td>
<td>subtraction</td>
<td>43.0-8.0</td>
<td>35.0</td>
</tr>
<tr>
<td>*</td>
<td>multiplication</td>
<td>43 * 8</td>
<td>344</td>
</tr>
<tr>
<td>/</td>
<td>division</td>
<td>43.0 / 8.0</td>
<td>5.375</td>
</tr>
<tr>
<td>%</td>
<td>remainder / mod</td>
<td>43 % 8</td>
<td>3</td>
</tr>
<tr>
<td>-</td>
<td>unary minus</td>
<td>-43</td>
<td>-43</td>
</tr>
</tbody>
</table>
<h4>Increment/ Decrement</h4>
<p>There are two types of increment/decrement operators: postfix and prefix</p>
<p>Postfix:</p>
<pre><code class="language-java">int x = 0;
int y = 7;
x++; // Shortcut for x = x + 1
y--; // Shortcut for y = y - 1</code></pre>
<p>Prefix</p>
<pre><code class="language-java">int x = 0, y = 7, z;
z = y * x++; // Equivalent to (y * x) + 1 = 7 * 0
z = y * ++x; // Equivalent to y * (x + 1) = 7 * 1</code></pre>
<h3>Data Conversion</h3>
<p>There are two types of data conversion, implicit and explicit</p>
<p>The compiler can perform implicit data conversion automatically.</p>
<p>Performing an explicit data conversion requires additional work on the programmer's part</p>
<p>A conversion is implicit if you do <strong>not</strong> lose any information in it</p>
<pre><code class="language-java">double price = 6.99;
int sale = 3;
double total = price - sale;</code></pre>
<p>A <em>cast</em> is an explicit data conversion. This is requested by the programmer and can lead to loss of information</p>
<pre><code class="language-java">int nextChar = 'b';
Character.isAlphabetic( (char) nextChar); // Let's you print the actual letter 'b' instead of the number corresponding to it
float price = 6.99;
int cost = (int) price; // cost is now 6</code></pre>
<h3>Printing variables</h3>
<p>You can print the values of variables using <code>System.out.println</code> and <code>System.out.print</code></p>
<p>The difference is that <code>System.out.println</code> adds a new line at the end, meaning the next printout will be on the next line.</p>
<pre><code class="language-java">int cost = 5;
double sale = .30;
System.out.print(cost);
System.out.print(sale);
// Prints out '50.3' on one line (no space or newline in between)
System.out.println(cost);
System.out.println(sale);
// Prints out '5'
// Prints out '0.3'</code></pre>
<p>To add a space between two variables in a print, add <code>" "</code> to the expression in between the two variables</p>
<pre><code class="language-java">System.out.println("The total cost is " + 5 " dollars and" + " " + 93 + " cents");
// The total cost is 5 dollars and 94 cents</code></pre>
<h3>Input from User</h3>
<p>You can get input from the user; we can do this using the <code>Scanner</code> class.</p>
<p>First import it at the top of your file</p>
<pre><code class="language-java">import java.util.Scanner;</code></pre>
<p>All you can do with <code>Scanner</code> is outlined in the Java API at this link <a href="https://docs.oracle.com/javase/8/docs/api/index.html?java/util/Scanner.html">https://docs.oracle.com/javase/8/docs/api/index.html?java/util/Scanner.html</a></p>
<p>Create a Scanner object</p>
<pre><code class="language-java">Scanner input = new Scanner(System.in);
System.out.print("Please enter an integer: ");
int price = input.nextInt(); // The integer that the user inputs is now stored in price
System.out.println("Your input: " + price);</code></pre>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,169 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Lecture for January 23</h1>
<h2>Java Class</h2>
<p>In Java, your code must live in a class.</p>
<pre><code class="language-java">public class NameOfClass {
public static void main(String[] args) {
// All program code
}
}</code></pre>
<p>It is important that <code>NameOfClass</code> is named meaningfully for the code. It is convention to use CamelCase when using classes. (Capitalize your class names!)</p>
<p>All methods have a method signature that identifies them. For <code>main</code> it is the words <code>public static void</code> and the parameter <code>String[] args</code>.</p>
<p><code>public</code> means that any other piece of code can reference it.</p>
<p><code>void</code> means that the method returns nothing</p>
<p><code>main</code> is the name of the method. It is important to have <code>main</code> since that tells the Java Interpreter where to start in your program.</p>
<p><code>String[] args</code> is the command line arguments inputted into the program. For this part of the class, we don't need to worry about it.</p>
<p>If you noticed <code>String</code> is a class, it is not a primitive type. This is denoted in Java by having it capitalized.</p>
<h2>Arithmetic Expressions</h2>
<p>There is an order of operations in programming as well. It goes like this:</p>
<ol>
<li>Parenthesis</li>
<li>Unary Operations</li>
<li>*, /, %</li>
<li>+, -</li>
</ol>
<p>And from there you read from left to right.</p>
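<p>A few sample expressions (worked out by hand) showing the order:</p>
<pre><code class="language-java">int a = 8 - 2 * 3;        // 2, because * happens before -
int b = (8 - 2) * 3;      // 18, parenthesis first
int c = 20 / 4 * 2;       // 10, same precedence so evaluate left to right</code></pre>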
<h2>Constant Variables</h2>
<p>These are variables that can never be changed</p>
<pre><code class="language-java">final int MINUTES_PER_HOUR = 60</code></pre>
<p>The keyword <code>final</code> indicates to the Java compiler that it is a constant variable.</p>
<p>By convention, constants are in all caps with underscores being separated between the words</p>
<h2>Java Math Library</h2>
<p>There are some operations we want to perform that we cannot achieve simply with the standard arithmetic operators</p>
<table>
<thead>
<tr>
<th>Method</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>Math.sqrt(x)</td>
<td>square root</td>
</tr>
<tr>
<td>Math.abs(x)</td>
<td>absolute value</td>
</tr>
<tr>
<td>Math.pow(a, b)</td>
<td>exponentiation $a^b$</td>
</tr>
<tr>
<td>Math.max(a, b)</td>
<td>returns the maximum of a or b</td>
</tr>
<tr>
<td>Math.min(a, b)</td>
<td>returns the minimum of a or b</td>
</tr>
<tr>
<td>Math.round(x)</td>
<td>rounds to the nearest integer</td>
</tr>
</tbody>
</table>
<h2>Example: Finding Areas</h2>
<pre><code class="language-java">public class MoreVariables
public static void main(String[] args) {
// Decrate a variable
int x;
// Initialize ia variable
x = 5;
// Area of a square
int squareArea = x * x;
System.out.println("Area of a square: " + squareArea);
double newSquare = Math.pow(x, 2);
System.out.println("Area of square: " + newSquare);
// Area of Circle
final double PI = 3.14159;
double radius = 3;
double circleArea = radius * radius * PI;
System.out.println("Area of circle: " + circleArea);
}</code></pre>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,161 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Lecture for January 25</h1>
<h2>Strings</h2>
<p>These are concatenated chars</p>
<pre><code class="language-java">'d' + 'o' + 'g' // equivalent to "dog"</code></pre>
<pre><code class="language-java">"straw" + "berry" // strawberry</code></pre>
<p>Strings are denoted by double quotes <code>""</code> rather than a <code>char</code>, which is denoted by single quotes <code>''</code></p>
<p>String is not a primitive type, it is a class. Hence, why it is capitalized in Java.</p>
<p>The <code>java.lang.String</code> is automatically imported in Java.</p>
<p>To declare and initialize a String</p>
<pre><code class="language-java">String name = "Henry";</code></pre>
<p>In memory it appears as</p>
<table>
<thead>
<tr>
<th>'H'</th>
<th>'e'</th>
<th>'n'</th>
<th>'r'</th>
<th>'y'</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>
<h3>String Methods</h3>
<pre><code class="language-java">int length()</code></pre>
<pre><code class="language-java">boolean equals(String another)</code></pre>
<pre><code class="language-java">boolean startsWith(String prefix)</code></pre>
<pre><code class="language-java">boolean endsWith(String suffix)</code></pre>
<pre><code class="language-java">String substring(int start, int end)</code></pre>
<pre><code class="language-java">int indexOf(int ch)</code></pre>
<pre><code class="language-java">String toLowerCase()</code></pre>
<h3>Using String Methods</h3>
<pre><code class="language-java">char first = name.charAt(0);</code></pre>
<p>Remember in Java, that it starts counting from zero! If you try to access a letter that doesn't exist, it will produce an <code>IndexOutOfBounds</code> error.</p>
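<p>Putting a few of the methods above together on the same <code>name</code> variable (return values worked out by hand):</p>
<pre><code class="language-java">String name = "Henry";
int len = name.length();              // 5
boolean same = name.equals("Henry");  // true
String part = name.substring(1, 3);   // "en"
int where = name.indexOf('r');        // 3
String lower = name.toLowerCase();    // "henry"</code></pre>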
<h2>Errors</h2>
<p>There are two types of errors, compile-type errors and run-time errors. Later we will talk about debugging skills such as making &quot;breakpoints&quot; in your code so you can analyze the different variable values.</p>
<h3>Compile Time Errors</h3>
<p>Compile time errors are generated due to syntax errors. Forgot a semicolon? Missing a brace? </p>
<h3>Run-time Errors</h3>
<p>These are logic errors, not derived from syntax errors. An example of one that was discussed earlier is the <code>IndexOutOfBounds</code> error.</p>
<h2>Tricky Thing About Input</h2>
<p>Let's talk about input right now. Let's say you have the following scenario</p>
<pre><code class="language-java">Scanner input = new Scanner(System.in);
System.out.println("Enter pet's age: ");
int age = input.nextInt();
System.out.println("Enter pet's name: ");
String name = input.nextLine();
System.out.println("Enter pet's breed: ");
String breed = input.next();</code></pre>
<p>Then when we start to run the program...</p>
<pre><code class="language-reStructuredText">Enter pet's age:
14
Enter pet's name:
Enter pet's breed:
Labradoodle</code></pre>
<p>Why did it skip pet's name? Let's run through the process again</p>
<pre><code class="language-reStructuredText">Enter pet's age:
14 [ENTER]
Enter pet's name:
Enter pet's breed:
Labradoodle</code></pre>
<p>Here the leftover [ENTER] (the newline from typing the age) gets read into <code>name</code>, which is why the prompt appears to be skipped.</p>
<p>To resolve this, just use an <code>input.nextLine()</code> to throw away that [ENTER]</p>
<pre><code class="language-java">Scanner input = new Scanner(System.in);
System.out.println("Enter pet's age: ");
int age = input.nextInt();
System.out.println("Enter pet's name: ");
input.nextLine();
String name = input.nextLine();
System.out.println("Enter pet's breed: ");
String breed = input.next();</code></pre>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,122 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Lecture for January 30</h1>
<h2>Random Number Generator</h2>
<p>One of the ways you can do a random number generator is through this method:</p>
<p>Import a class called <code>Random</code></p>
<pre><code class="language-java">import java.util.Random;</code></pre>
<p>Then you need to create a <code>Random</code> object</p>
<pre><code class="language-java">Random rand = new Random();</code></pre>
<p>After this you can call the <code>nextInt()</code> method to get a random <code>int</code>; with no argument it can be any of the $2^{32}$ possible <code>int</code> values (from $-2^{31}$ to $2^{31}-1$)</p>
<pre><code class="language-java">int randInt = rand.nextInt();</code></pre>
<p>If you instead want a random number from 0 up to (but not including) a particular maximum value, then you can call the <code>nextInt</code> method with that maximum as a parameter.</p>
<p>Random Integer from 0-10 (not including 10)</p>
<pre><code class="language-java">int randInt2 = rand.nextInt(10);</code></pre>
<h2>Output</h2>
<p>We have already encountered <code>System.out.println</code> and <code>System.out.print</code> but let us go over the differences again.</p>
<p><code>System.out.println()</code> prints the contents inside the parenthesis and appends a newline character afterwards so that the next output is on a new line</p>
<p><code>System.out.print()</code> prints the contents inside the parenthesis and does not output a newline character</p>
<h3>Formatting Output</h3>
<p>If you want more control on how your output is displayed, it is recommended that you use <code>System.out.printf</code> to format your output</p>
<p>First, you need to specify your type using the % instruction</p>
<ul>
<li>d for integer</li>
<li>f for decimal</li>
</ul>
<p>Example:</p>
<pre><code class="language-java">int sum = 50;
System.out.printf("Total = %d", sum);</code></pre>
<p>This outputs </p>
<pre><code class="language-reS">Total = 50</code></pre>
<p>Notice here that there is no concatenation required like the previous two methods; instead, you insert the variables as parameters</p>
<p>Let us deconstruct the % instruction</p>
<p><code>% _ _ . _ _</code></p>
<p>The first underline is a flag: <code>+</code>, <code>-</code>, <code>0</code>, or space (sometimes we want to pad the money with zeros)</p>
<p>The second underline is the width of the text</p>
<p>The third underline is the number of decimal places</p>
<p>The final underline is the specifier: <code>f</code> for decimal and <code>d</code> for integer</p>
<p><u>Example</u></p>
<pre><code class="language-java">double amount = 0.5;
System.out.printf("Total Due: %0.2f")</code></pre>
<p>This outputs</p>
<pre><code class="language-reStructuredText">Total Due: 0.50</code></pre>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,128 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Lecture for March 13th</h1>
<h2>Methods</h2>
<p>Methods are small blocks of statements that make it easier to solve a problem. Each method usually focuses on solving a small part of the overall problem.</p>
<p>Usually in methods you provide some sort of input and get some output out of it.</p>
<h3>Advantages</h3>
<ul>
<li>Code readability</li>
<li>Modular program development (break up the problem in chunks)</li>
<li>Incremental development</li>
<li>No redundant code!</li>
</ul>
<h3>Method definition</h3>
<p>A method definition consists of a method name, its input and output, and the block of statements.</p>
<p>Usually this is succinctly written using Javadocs, which is what you see in the Java API</p>
<h3>Method Call</h3>
<p>A method call is the execution of the method. The statements defined in the method is what will execute.</p>
<h3>Method Stubs</h3>
<p>Recall the parts of a method definition. Now look at the following method stub</p>
<pre><code class="language-java">String[] split(String s)</code></pre>
<p>The output here is <code>String[]</code></p>
<p>The method name is <code>split</code></p>
<p>The input is <code>String s</code></p>
<h2>Modular Programming</h2>
<p>Let us look at the following example:</p>
<p>The program should have a list of grocery prices. It should be able to calculate the total cost of groceries. The store gives a student discount of 5%. The program should calculate this discount and update the total, and it should calculate and add the 2.5% tax. One way to break this up into methods is sketched after the list below.</p>
<ul>
<li>First you should add it all up</li>
<li>Then compute the discount</li>
<li>Then add the tax</li>
</ul>
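<p>A sketch of one possible breakdown into methods (the method names here are made up for illustration):</p>
<pre><code class="language-java">public static double total(double[] prices) {
    double sum = 0;
    for (int i = 0; i &lt; prices.length; i++) {
        sum += prices[i];
    }
    return sum;
}

public static double applyStudentDiscount(double total) {
    return total - (total * 0.05);  // 5% student discount
}

public static double addTax(double total) {
    return total + (total * 0.025); // 2.5% tax
}</code></pre>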
<h2>Parts of a method definition</h2>
<pre><code class="language-java">public static int timesTwo(int num) {
int two = num * 2;
return two;
}</code></pre>
<p>It first starts off by declaring the visibility <code>public</code></p>
<p>The return type is <code>int</code></p>
<p>The method name is <code>timesTwo</code></p>
<p>The input parameter is <code>int num</code></p>
<p>Between the curly braces is the <em>body</em> of the method</p>
<h2>Calling a Method</h2>
<pre><code class="language-java">int a = 5;
int b = 3;
int ans = multiply(a, b);</code></pre>
<p>The method call is <code>multiply(a, b)</code> and the result is stored in the variable <code>ans</code></p>
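<p>For that call to work, a matching <code>multiply</code> method has to be defined somewhere in the class; for example:</p>
<pre><code class="language-java">public static int multiply(int a, int b) {
    return a * b;
}</code></pre>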
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,98 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Lecture for March 20th</h1>
<h2>Unit Testing</h2>
<p>With unit testing you are able to test small parts as you go. With unit testing, you can test many examples and border cases (extreme inputs) to see how your code reacts to different input.</p>
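<p>A small sketch of the idea, exercising the <code>multiply</code> method defined below with ordinary and border-case inputs and checking the output by hand:</p>
<pre><code class="language-java">public static void main(String[] args) {
    System.out.println(multiply(3, 4));   // expect 12
    System.out.println(multiply(0, 99));  // expect 0
    System.out.println(multiply(-2, 5));  // expect -10
}</code></pre>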
<h2>Variable Scope</h2>
<p>A variable declared inside a method is only accessible inside that method.</p>
<p>A good rule of thumb is that if a variable is declared within curly braces {} then it does not exist outside that curly brace.</p>
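<p>For instance (a made-up method, just to show the rule):</p>
<pre><code class="language-java">public static void scopeExample() {
    int outer = 1;
    if (outer &gt; 0) {
        int inner = 2;             // declared inside the braces of the if
        System.out.println(inner); // fine here
    }
    // System.out.println(inner);  // compile error: inner does not exist out here
    System.out.println(outer);     // still fine
}</code></pre>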
<h2>Method Overloading</h2>
<p>Sometimes we need to have methods with the same name but different input parameters. </p>
<pre><code class="language-java">public static int multiply(int a, int b) {
return a * b;
}</code></pre>
<p>This method only works with integers; what if we want to multiply two doubles?</p>
<p>We can overload the method by declaring another method with the same name.</p>
<pre><code class="language-java">public static double multiply(double a, double b) {
return a * b;
}</code></pre>
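<p>Java picks the version to run based on the argument types at the call site; for example:</p>
<pre><code class="language-java">int a = multiply(3, 4);        // calls the int version, a = 12
double b = multiply(2.5, 4.0); // calls the double version, b = 10.0</code></pre>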
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,123 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Lecture on March 22nd</h1>
<h2>Method Documentation</h2>
<p>Java has a special way that you can document your methods such that it will create documentation for you if you follow the convention.</p>
<p>The Java API actually uses this technique to produce its own documentation.</p>
<p>To create this, mark a method with special comments that begin with <code>/**</code> and end with <code>*/</code></p>
<p>It contains <em>block tags</em> that describe input and output parameters</p>
<p><code>@param</code> and <code>@return</code></p>
<h3>Example</h3>
<pre><code class="language-java">/**
* @param y an integer to sum
* @param x an integer to sum
* @return the sum of x and y
*/
public int sum(int x, int y) {
return x + y;
}</code></pre>
<h2>Passing a Scanner</h2>
<p>We only want to create one <strong>user input scanner</strong> per program, and likewise only one <strong>file input scanner</strong> per program.</p>
<p>If a method needs a scanner, you can pass the one you already created in as an input parameter.</p>
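<p>A sketch of what that looks like (<code>readAge</code> is a made-up helper for illustration):</p>
<pre><code class="language-java">// assumes import java.util.Scanner; at the top of the file
public static void main(String[] args) {
    Scanner input = new Scanner(System.in); // create it once...
    int age = readAge(input);               // ...and hand it to any method that needs it
    System.out.println("You are " + age);
}

public static int readAge(Scanner in) {
    System.out.print("Enter your age: ");
    return in.nextInt();
}</code></pre>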
<h2>Array as Input Parameter</h2>
<p>Primitive types (<code>int</code>, <code>char</code>, <code>double</code>, etc.) are passed by value. Modifications made inside a method cannot be seen outside the method.</p>
<p>Arrays on the other hand, is pass by reference. Changes made to an array inside the method can be seen outside the method.</p>
<pre><code class="language-java">public static void main(String[] args) {
int[] nums = {1, 3, 5, 7, 9};
timesTwo(nums);
}
public static void timesTwo(int[] arr) {
for (int i = 0; i &lt; arr.length; i++) {
arr[i] *= 2;
}
}</code></pre>
<p>At the end of the <code>timesTwo</code> method call, the variable <code>nums</code> would have <code>{2, 6, 10, 14, 18}</code></p>
<h2>Sizes of Arrays</h2>
<h3>Perfect Size Arrays</h3>
<p>When we declare an array, Java sets aside a slot in memory for every element up front. So if you know that you need exactly 8 slots, then you only ask for 8.</p>
<h3>Oversize Arrays</h3>
<p>This is when we don't know how many slots we need. Therefore, we ask for more than we think we'll need. That way we don't go out of bounds.</p>
<p>If we do this, then we don't know how many elements we have already inserted into the array. Since the length is the number of slots.</p>
<p>So we can create another variable, which will keep track of the index in where we can add the next element.</p>
<p>We use oversized arrays when the size of the array is unknown or may change.</p>
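<p>A sketch of an oversize array with a companion count variable (the names are just sample data):</p>
<pre><code class="language-java">String[] names = new String[100]; // oversize: more slots than we expect to need
int nameCount = 0;                // how many slots are actually filled

names[nameCount] = "Ada";
nameCount++;
names[nameCount] = "Grace";
nameCount++;

// Only loop over the part of the array that is actually in use
for (int i = 0; i &lt; nameCount; i++) {
    System.out.println(names[i]);
}</code></pre>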
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,181 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Lecture for March 27</h1>
<h2>In the Real World...</h2>
<p>Objects are known for having characteristics. A car has on average 4 wheels, 2-4 doors, a steering wheel.</p>
<p>Objects can perform actions. A car can drive, hold cargo, and honk.</p>
<h2>In the Programming World...</h2>
<p>Object-Oriented Programming</p>
<ul>
<li>Focuses on objects</li>
<li>Are not linear</li>
<li>Adds organization to a program</li>
<li>Fits with human cognition (making abstractions)</li>
</ul>
<h2>Class Structure</h2>
<pre><code class="language-java">public class Classname {
// Fields
// Constructors
// Methods
}</code></pre>
<h2>Fields</h2>
<p>Fields are instance variables: they store values, help define state, and exist in memory for the lifetime of the object.</p>
<pre><code class="language-java">public class Car {
private double price;
private double gas;
}</code></pre>
<h2>Constructor</h2>
<p>We can build an object through a constructor. It is a special kind of method that has no return type and must be named the same as the class itself.</p>
<p>Constructors help set default field values for the different properties of our class.</p>
<pre><code class="language-java">public class Car {
// Begin Constructor
public Car(double cost) {
this.price = cost;
this.gas = 0;
}
// End Constructor
private double price;
private double gas;
}</code></pre>
<p><strong>Note:</strong> The <code>this</code> keyword refers to the object's fields. This helps keep it separate from other variables you can create in the method and the input parameters you receive.</p>
<h2>Accessor Method - &quot;Getter&quot;</h2>
<p>We like to classify methods into two types, accessors and mutators.</p>
<p>Getter methods return a copy of an instance field. It does not change the state of the object.</p>
<pre><code class="language-java">public double getPrice() {
return this.price;
}</code></pre>
<h2>Mutator Method - &quot;Setter&quot;</h2>
<p>This type of method modifies an instance field. It does not return anything and changes the state of the object.</p>
<pre><code class="language-java">public void setPrice(double cost) {
this.price = cost;
}</code></pre>
<h2>Example of Car Class In All Its Glory</h2>
<pre><code class="language-java">public class Car {
// Instance Variables
private int mpg;
private double price;
// Constructors
public Car() {
this.price = 0;
this.mpg = 0;
}
public Car(double cost, int mpg) {
this.price = cost;
this.mpg = mpg;
}
// Accessors
public double getPrice() {
return this.price;
}
public int getMpg() {
return this.mpg;
}
// Mutators
public void setPrice(double cost) {
this.price = cost;
}
public void setMpg(int mpg) {
this.mpg = mpg;
}
}</code></pre>
<h2>Using Classes</h2>
<p>Just like how we used the <code>Scanner</code> class, we can also use our new <code>Car</code> class.</p>
<pre><code class="language-java">public class TestCar {
public static void main(String[] args) {
// Declare an object reference
Car c;
// Initialize the object
c = new Car();
// Update the fields of the object
c.setPrice(3000);
c.setMpg(22);
// Print object information
System.out.println("Price is " + c.getPrice() )
}
}</code></pre>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,114 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Lecture for March 29th</h1>
<h2>Enumerated Types</h2>
<p>These represent a fixed set of constants and include all possible values within them.</p>
<p>Let's look at coins. On a daily basis in the US, we use the following coins:</p>
<ul>
<li>Penny</li>
<li>Nickel</li>
<li>Dime</li>
<li>Quarter</li>
</ul>
<p>Other examples include the days of the week, clothes sizes, etc.</p>
<h2>Enum Syntax</h2>
<p>Let's define an <code>enum</code> type</p>
<pre><code class="language-java">public enum Coin { PENNY, NICKEL, DIME, QUARTER}</code></pre>
<p>Now declare and initialize a variable</p>
<pre><code class="language-java">Coin myCoin = Coin.PENNY</code></pre>
<h2>Arrays vs ArrayList</h2>
<p>Arrays require you to say upfront how many slots you need. ArrayLists are more flexible since you can change the length during runtime.</p>
<p>Arrays can store objects and primitives such as <code>int</code>, <code>char</code>, <code>boolean</code>, etc.</p>
<p>ArrayLists can only store objects.</p>
<h3>How to declare an ArrayList</h3>
<pre><code class="language-java">ArrayList&lt;objectType&gt; list = new ArrayList&lt;objectType&gt;();</code></pre>
<h3>Differences between getting the length of the array</h3>
<p><strong>Array</strong></p>
<pre><code class="language-java">int length = array.length;</code></pre>
<p><strong>ArrayList</strong></p>
<pre><code class="language-java">int length = array.size();</code></pre>
<h2>For Each Loop</h2>
<p>This is a special loop where you tell it to go through all the elements of the array without specifying an index.</p>
<pre><code class="language-java">for (String b : buildings) {
System.out.print(b);
}</code></pre>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,102 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>More Midterm Review</h1>
<p>Let us suppose that we have the following array</p>
<pre><code class="language-java">int[] b = {11, 12, 15, 16, 21}</code></pre>
<h2>Increase all elements by 2</h2>
<pre><code class="language-java">for (int i = 0; i &lt; b.length; i++) {
b[i] += 2;
}</code></pre>
<h2>Print all elements of array</h2>
<pre><code class="language-java">for (int i = 0; i &lt; b.length; i++) {
System.out.println(b[i]);
}</code></pre>
<h2>Sum all the elements of an array</h2>
<pre><code class="language-java">int sum = 0;
for (int i = 0; i &lt; b.length; i++) {
sum += b[i];
}</code></pre>
<h2>Access Last Element of Array</h2>
<pre><code class="language-java">System.out.println(b[b.length - 1]);</code></pre>
<h2>Access the middle element</h2>
<pre><code class="language-java">System.out.println(b[b.length / 2]);</code></pre>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,101 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>CPSC 220 Computer Programming and Problem Solving Spring 2018</h1>
<p><a href="index.html%3Flabaide%252Fspring2018%252Fcpsc220%252Fjan16.html">Lecture 1 -- January 16</a></p>
<p><a href="index.html%3Flabaide%252Fspring2018%252Fcpsc220%252Fjan18.html">Lecture 2 -- January 18</a></p>
<p><a href="index.html%3Flabaide%252Fspring2018%252Fcpsc220%252Fjan23.html">Lecture 3 -- January 23</a></p>
<p><a href="index.html%3Flabaide%252Fspring2018%252Fcpsc220%252Fjan25.html">Lecture 4 -- January 25</a></p>
<p><a href="index.html%3Flabaide%252Fspring2018%252Fcpsc220%252Fjan30.html">Lecture 5 -- January 30</a></p>
<p><a href="index.html%3Flabaide%252Fspring2018%252Fcpsc220%252Ffeb1.html">Lecture 6 -- February 1</a></p>
<p><a href="index.html%3Flabaide%252Fspring2018%252Fcpsc220%252Ffeb6.html">Lecture 7 -- February 6</a></p>
<p><a href="index.html%3Flabaide%252Fspring2018%252Fcpsc220%252Ffeb8.html">Lecture 8 -- February 8</a></p>
<p><a href="index.html%3Flabaide%252Fspring2018%252Fcpsc220%252Ffeb13.html">Lecture 9 -- February 13</a></p>
<p><a href="index.html%3Flabaide%252Fspring2018%252Fcpsc220%252Ffeb20.html">Lecture 10 -- February 20</a></p>
<p><a href="index.html%3Flabaide%252Fspring2018%252Fcpsc220%252Ffeb27.html">Lecture 11 -- February 27</a></p>
<p><a href="index.html%3Flabaide%252Fspring2018%252Fcpsc220%252Fmidtermreview.html">Midterm Review</a></p>
<p><a href="index.html%3Flabaide%252Fspring2018%252Fcpsc220%252Fmar13.html">Lecture 12 -- March 13</a></p>
<p><a href="index.html%3Flabaide%252Fspring2018%252Fcpsc220%252Fmar20.html">Lecture 13 -- March 20</a></p>
<p><a href="index.html%3Flabaide%252Fspring2018%252Fcpsc220%252Fmar22.html">Lecture 14 -- March 22</a></p>
<p><a href="index.html%3Flabaide%252Fspring2018%252Fcpsc220%252Fmar27.html">Lecture 15 -- March 27</a></p>
<p><a href="index.html%3Flabaide%252Fspring2018%252Fcpsc220%252Fmar29.html">Lecture 16 -- March 29</a></p>
<p><a href="index.html%3Flabaide%252Fspring2018%252Fcpsc220%252Fapr3.html">Lecture 17 -- April 3</a></p>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,89 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>CPSC 110 Introduction to Computer Science Summer 2017</h1>
<p>For the summer session, I didn't write in depth lecture notes. Instead I wrote short complementary material on my old website to help out with the labs.</p>
<p><a href="https://brandonrozek.com/2017/05/viewing-java-applets/">Viewing Java Applets</a></p>
<p><a href="https://brandonrozek.com/2017/06/using-system-themes-java-swing/">Using System Themes in Java Swing</a></p>
<p><a href="https://brandonrozek.com/2017/06/java-swing-components/">Java Swing Components</a></p>
<p><a href="https://brandonrozek.com/2017/08/escape-sequences-java/">Escape Sequences in Java</a></p>
<p><a href="https://brandonrozek.com/2017/08/obtaining-command-line-input-java/">Obtaining Command Line Input in Java</a></p>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,95 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<meta name="description" content="My work as a lab aide">
<title>Lab Aide | Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem active">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Lab Aide</h1>
<h2>Dr. Zeitz Self</h2>
<p>I am the lab aide for one of her sections of Computer Programming and Problem Solving (CPSC 220).</p>
<p>My role in this class is to help students during lab time and to create lecture notes based on the lectures given in class.</p>
<p><a href="index.html%3Flabaide%252Fspring2018%252Fcpsc220.html">Spring 2018/CPSC 220 Notes</a></p>
<h2>Dr. Jennifer Polack</h2>
<p>I've been the lab aide for two of Polack's classes so far: Introduction to Computer Science (CPSC 110) and Computer Programming and Problem Solving (CPSC 220).</p>
<p>My role involves helping students debug during the lab time. I also create lecture notes for students to look over while working on projects.</p>
<p><a href="index.html%3Flabaide%252Fsummer2017%252Fcpsc110.html">Summer 2017/CPSC 110 Notes</a></p>
<p><a href="index.html%3Flabaide%252Ffall2017%252Fcpsc220.html">Fall 2017/CPSC 220 Lecture Notes</a></p>
<h2>Dr. Ron Zacharski</h2>
<p>I was the lab aide for Zacharski's DATA 101 class. My role in his class mostly involved helping students debug during lab time and grading students' demonstrations of what they've done in a lab.</p>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,90 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<meta name="description" content="This is a page of presentations I was a part of.">
<title>Presentations | Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem active">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Presentations</h1>
<h2>Embezzlement of Parking Meter Funds: A Computational Approach to Hypothesis Testing</h2>
<p>I was invited to give a talk to the Data Science Group at the University of Mary Washington called the Data Mavens. It was a 25-minute presentation on the bootstrap resampling technique, with code on how to get started.</p>
<p><a href="/files/slides/embezzlement.pdf">Slides PDF</a></p>
<h2>Similyrics: A Music Recommendation Engine Based on Lyrics</h2>
<p>At the VCU RamHacks hackathon, Clare Arrington, Harrison Crosse, and I demoed our product Similyrics. It's a web application that takes your favorite song, grabs its lyrics, and finds a song in a database whose lyrics closely match those of the song you have chosen.</p>
<p><a href="/files/slides/similyrics.pdf">Slides PDF</a></p>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,302 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Measures of similarity</h1>
<p>To identify clusters of observations we need to know how <strong>close individuals are to each other</strong> or <strong>how far apart they are</strong>.</p>
<p>Two individuals are 'close' when their dissimilarity or distance is small and their similarity is large.</p>
<p>Special attention will be paid to proximity measures suitable for data consisting of repeated measures of the same variable, for example taken at different time points.</p>
<h2>Similarity Measures for Categorical Data</h2>
<p>Measures are generally scaled to be in the interval $[0, 1]$, although occasionally they are expressed as percentages in the range $0-100\%$.</p>
<p>A similarity value of unity indicates that both observations have identical values for all variables.</p>
<p>A similarity value of zero indicates that the two individuals differ maximally for all variables.</p>
<h3>Similarity Measures for Binary Data</h3>
<p>An extensive list of similarity measures for binary data exists; the reason for the large number of possible measures has to do with the apparent uncertainty as to how to <strong>deal with the count of zero-zero matches</strong>.</p>
<p>In some cases, zero-zero matches are equivalent to one-one matches and therefore should be included in the calculated similarity measure</p>
<p><u>Example</u>: Gender, where there is no preference as to which of the two categories should be coded as zero or one</p>
<p>In other cases the inclusion or otherwise of the matches is more problematic</p>
<p><u>Example</u>: When the zero category corresponds to the genuine absence of some property, such as wings in a study of insects</p>
<p>The question that then needs to be asked is do the co-absences contain useful information about the similarity of the two objects?</p>
<p>Attributing a high degree of similarity to a pair of individuals simply because they both lack a large number of attributes may not be sensible in many situations </p>
<p>The following table below will help when it comes to interpreting the measures</p>
<p><img src="file:///home/rozek/Documents/Spring2018/Cluster%20Analysis/BinaryOutcomeTable.png?lastModify=1516246712" alt="img" /></p>
<p>Measures that ignore the co-absences (zero-zero matches) are Jaccard's coefficient (S2) and Sneath and Sokal (S4).</p>
<p>When co-absences are considered informative, the simple matching coefficient (S1) is usually employed.</p>
<p>Measures S3 and S5 are further examples of symmetric coefficients that treat positive matches (a) and negative matches (d) in the same way. </p>
<p><img src="file:///home/rozek/Documents/Spring2018/Cluster%20Analysis/BinarySimilarityMeasures.png" alt="BinarySimilarityMeasures" /></p>
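<p>As a small illustration, base R's <code>dist</code> offers an asymmetric binary distance that equals one minus Jaccard's coefficient (S2), so co-absences are ignored; the simple matching coefficient (S1) can be computed directly (the toy vectors below are made up for illustration):</p>
<pre><code class="language-R"># two binary observations (1 = attribute present, 0 = absent)
x &lt;- rbind(a = c(1, 0, 1, 1, 0),
           b = c(1, 1, 0, 1, 0))

# Jaccard-type dissimilarity: 0-0 matches are ignored
d_jaccard &lt;- dist(x, method = "binary")
s_jaccard &lt;- 1 - d_jaccard              # Jaccard's coefficient (S2)

# simple matching coefficient (S1): 0-0 matches count as agreements
s_matching &lt;- mean(x["a", ] == x["b", ])</code></pre>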
<h3>Similarity Measures for Categorical Data with More Than Two Levels</h3>
<p>Categorical data where the variables have more than two levels (for example, eye color) could be dealt with in a similar way to binary data, with each level of a variable being regarded as a single binary variable. </p>
<p>This is not an attractive approach, however, simply because of the large number of negative matches which will inevitably be involved. </p>
<p>A superior method is to allocate a score of zero or one to each variable depending on whether the two observations are the same on that variable. These scores are then averaged over all $p$ variables to give the required similarity coefficient as
$$
s_{ij} = \frac{1}{p}\sum_{k = 1}^p{s_{ijk}}
$$</p>
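<p>In R this is simply the proportion of variables on which the two observations agree (a tiny, made-up example):</p>
<pre><code class="language-R">a &lt;- c(eyes = "blue",  hair = "brown", hand = "right")
b &lt;- c(eyes = "green", hair = "brown", hand = "right")

# two of the three variables match, so the similarity is 2/3
s_ab &lt;- mean(a == b)</code></pre>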
<h3>Dissimilarity and Distance Measures for Continuous Data</h3>
<p>A <strong>metric</strong> on a set $X$ is a distance function
$$
d : X \times X \to [0, \infty)
$$
where $[0, \infty)$ is the set of non-negative real numbers and for all $x, y, z \in X$, the following conditions are satisfied</p>
<ol>
<li>$d(x, y) \ge 0$ non-negativity or separation axiom
<ol>
<li>$d(x, y) = 0 \iff x = y$ identity of indiscernibles</li>
</ol></li>
<li>$d(x, y) = d(y, x)$ symmetry</li>
<li>$d(x, z) \le d(x, y) + d(y, z)$ subadditivity or triangle inequality</li>
</ol>
<p>Conditions 1 and 2 define a positive-definite function</p>
<p>All distance measures are formulated so as to allow for differential weighting of the quantitative variables $w_k$ denotes the nonnegative weights of $p$ variables</p>
<p><img src="file:///home/rozek/Documents/Spring2018/Cluster%20Analysis/continuousdissimilaritymeasures.png" alt="continuousdissimilaritymeasures" /></p>
<p>Proposed dissimilarity measures can be broadly divided into distance measures and correlation-type measures.</p>
<h4>Distance Measures</h4>
<h5>$L^p$ Space</h5>
<p>The Minkowski distance is a metric in normed vector space which can be considered as a generalization of both the Euclidean distance and the Manhattan distance
$$
D(X, Y) = (\sum_{i = 1}^n{w_i^p|x_i - y_i|^p})^{\frac{1}{p}}
$$
This is a metric for $p \ge 1$.</p>
<h6>Manhattan Distance</h6>
<p>This is the case in the Minkowski distance when $p = 1$
$$
d(X, Y) = \sum_{i = 1}^n{w_i|x_i - y_i|}
$$
Manhattan distance depends on the rotation of the coordinate system, but does not depend on its reflection about a coordinate axis or its translation
$$
d(x, y) = d(-x, -y)
$$</p>
<p>$$
d(x, y) = d(x + a, y + a)
$$</p>
<p>Shortest paths are not unique in this metric</p>
<h6>Euclidean Distance</h6>
<p>This is the case in the Minkowski distance when $p = 2$. The Euclidean distance between points X and Y is the length of the line segment connecting them.
$$
d(X, Y) = \sqrt{\sum_{i = 1}^n{w_i^2(x_i - y_i)^2}}
$$
The shortest path between two points is unique under this metric. This distance metric is also translation and rotation invariant.</p>
<h6>Squared Euclidean Distance</h6>
<p>The standard Euclidean distance can be squared in order to place progressively greater weight on objects that are farther apart. In this case, the equation becomes
$$
d(X, Y) = \sum_{i = 1}^n{w_i^2(x_i - y_i)^2}
$$
Squared Euclidean Distance is not a metric as it does not satisfy the <a href="https://en.wikipedia.org/wiki/Triangle_inequality">triangle inequality</a>, however, it is frequently used in optimization problems in which distances only have to be compared.</p>
<h6>Chebyshev Distance</h6>
<p>The Chebyshev distance is where the distance between two vectors is the greatest of their differences along any coordinate dimension.</p>
<p>It is also known as <strong>chessboard distance</strong>, since in the game of <a href="https://en.wikipedia.org/wiki/Chess">chess</a> the minimum number of moves needed by a <a href="https://en.wikipedia.org/wiki/King_(chess)">king</a> to go from one square on a <a href="https://en.wikipedia.org/wiki/Chessboard">chessboard</a> to another equals the Chebyshev distance
$$
d(X, Y) = \lim_{p \to \infty}{\left(\sum_{i = 1}^n{|x_i - y_i|^p}\right)^\frac{1}{p}}
$$</p>
<p>$$
= \max_i(|x_i - y_i|)
$$</p>
<p>Chebyshev distance is translation invariant</p>
<h5>Canberra Distance Measure</h5>
<p>The Canberra distance (D4) is a weighted version of the $L_1$ Manhattan distance. This measure is very sensitive to small changes close to $x_{ik} = x_{jk} = 0$.</p>
<p>It is often regarded as a generalization of the dissimilarity measure for binary data. In this context the measure can be divided by the number of variables, $p$, to ensure a dissimilarity coefficient in the interval $[0, 1]$</p>
<p>It can then be shown that this measure for binary variables is just one minus the matching coefficient.</p>
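<p>All of the distance measures above are available through base R's <code>dist</code> function; a minimal sketch (equal weights assumed, i.e. all $w_i = 1$):</p>
<pre><code class="language-R">x &lt;- matrix(rnorm(20), nrow = 4)         # 4 observations, 5 variables

dist(x, method = "euclidean")            # L2
dist(x, method = "manhattan")            # L1
dist(x, method = "maximum")              # Chebyshev
dist(x, method = "minkowski", p = 3)     # general L^p
dist(x, method = "canberra")             # Canberra</code></pre>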
<h3>Correlation Measures</h3>
<p>It has often been suggested that the correlation between two observations can be used to quantify the similarity between them. </p>
<p>Since for correlation coefficients we have that $-1 \le \phi_{ij} \le 1$, with the value 1 reflecting the strongest possible positive relationship and the value -1 the strongest possible negative relationship, these coefficients can be transformed into dissimilarities, $d_{ij}$, within the interval $[0, 1]$.</p>
<p>The use of correlation coefficients in this context is far more contentious than its noncontroversial role in assessing the linear relationship between two variables based on $n$ observations.</p>
<p>When correlations between two individuals are used to quantify their similarity the <u>rows of the data matrix are standardized</u>, not its columns.</p>
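<p>A minimal sketch in base R of turning between-observation correlations into dissimilarities in $[0, 1]$:</p>
<pre><code class="language-R">x &lt;- matrix(rnorm(40), nrow = 4)   # 4 observations (rows), 10 variables

# correlate observations, i.e. rows, so transpose before cor()
r &lt;- cor(t(x))

# map correlations from [-1, 1] to dissimilarities in [0, 1]
d &lt;- as.dist((1 - r) / 2)</code></pre>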
<p><strong>Disadvantages</strong></p>
<p>When variables are measured on different scales the notion of a difference between variable values and consequently that of a mean variable value or a variance is meaningless.</p>
<p>In addition, the correlation coefficient is unable to measure the difference in size between two observations.</p>
<p><strong>Advantages</strong></p>
<p>However, the use of a correlation coefficient can be justified for situations where all of the variables have been measured on the same scale and precise values taken are important only to the extent that they provide information about the subject's relative profile</p>
<p><u>Example:</u> In classifying animals or plants, the absolute size of the organisms or their parts are often less important than their shapes. In such studies the investigator requires a dissimilarity coefficient that takes the value zero if and only if two individuals' profiles are multiples of each other. The angular separation dissimilarity measure has this property.</p>
<p><strong>Further considerations</strong></p>
<p>The Pearson correlation is sensitive to outliers. This has prompted a number of suggestions for modifying correlation coefficients when used as similarity measures; for example, robust versions of correlation coefficients such as <em>jackknife correlation</em> or altogether more general association coefficients such as <em>mutual information distance measure</em></p>
<h4>Mahalanobis (Maximum) Distance [Not between 2 observations]</h4>
<p>Mahalanobis distance is a measure of distance between a point P and a distribution D. It is a multi-dimensional generalization of the idea of measuring how many standard deviations away P is from the mean of D</p>
<p>Mahalanobis distance is unitless and scale-invariant and takes into account the correlations of the data set
$$
D(\vec{x}) = \sqrt{(\vec{x} - \vec{\mu})^T S^{-1}(\vec{x}-\vec{\mu})}
$$
Where $\vec{\mu}$ is the vector of means and $S$ is the covariance matrix</p>
<p>If the covariance matrix is diagonal then the resulting distance measure is called a normalized Euclidean distance.
$$
d(\vec{x}, \vec{y}) = \sqrt{\sum_{i = 1}^N{\frac{(x_i - y_i)^2}{s^2_i}}}
$$
Where $s_i$ is the standard deviation of the $x_i$ and $y_i$ over the sample set</p>
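<p>A quick sketch with base R's <code>mahalanobis</code> function, which returns squared distances (the built-in <code>iris</code> measurements are used only for illustration):</p>
<pre><code class="language-R">x &lt;- as.matrix(iris[, 1:4])

mu &lt;- colMeans(x)   # mean vector
S  &lt;- cov(x)        # covariance matrix

# squared Mahalanobis distance of every observation from the mean
d2 &lt;- mahalanobis(x, center = mu, cov = S)
d  &lt;- sqrt(d2)</code></pre>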
<h4>Discrete Metric</h4>
<p>This metric describes whether or not two observations are equivalent
$$
\rho(x, y) = \begin{cases}
1 &amp; x \not= y \
0 &amp; x = y
\end{cases}
$$</p>
<h2>Similarity Measures for Data Containing both Continuous and Categorical Variables</h2>
<p>There are a number of approaches to constructing proximities for mixed-mode data, that is, data in which some variables are continuous and some categorical.</p>
<ol>
<li>Dichotomize all variables and use a similarity measure for binary data</li>
<li>Rescale all the variables so that they are on the same scale by replacing variable values by their ranks among the objects and then using a measure for continuous data</li>
<li>Construct a dissimilarity measure for each type of variable and combine these, either with or without differential weighting into a single coefficient.</li>
</ol>
<p>Most general-purpose statistical software implements a number of measures for converting a two-mode data matrix into a one-mode dissimilarity matrix.</p>
<p>R has <code>cluster</code>, <code>clusterSim</code>, or <code>proxy</code></p>
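<p>For example, <code>daisy</code> in the <code>cluster</code> package computes Gower's coefficient for mixed numeric and categorical columns (the small data frame below is made up for illustration):</p>
<pre><code class="language-R">library(cluster)

df &lt;- data.frame(age    = c(23, 41, 35, 52),
                 smoker = factor(c("yes", "no", "no", "yes")),
                 eyes   = factor(c("blue", "brown", "green", "brown")))

# Gower dissimilarities, scaled to lie in [0, 1]
d &lt;- daisy(df, metric = "gower")</code></pre>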
<h3>Proximity Measures for Structured Data</h3>
<p>We'll be looking at data that consists of repeated measures of the same outcome variable but under different conditions.</p>
<p>The simplest and perhaps most commonly used approach to exploiting the reference variable is in the construction of a reduced set of relevant summaries per object which are then used as the basis for defining object similarity.</p>
<p>Here we will look at some approaches for choosing summary measures and resulting proximity measures for the most frequently encountered reference vectors (e.g. time, experimental condition, and underlying factor)</p>
<p>Structured data arise when the variables can be assumed to follow a known <em>factor model</em>. Under <em>confirmatory factor analysis model</em> each variable or item can be allocated to one of a set of underlying factors or concepts. The factors cannot be observed directly but are 'indicated' by a number of items that are all measured on the same scale.</p>
<p>Note that the summary approach, while typically used with continuous variables, is not limited to variables on an interval scale. The same principles can be applied to dealing with categorical data. The difference is that summary measures now need to capture relevant aspects of the distribution of categorical variables over repeated measures.</p>
<p>Rows of <strong>$X$</strong> which represent ordered lists of elements, that is all the variables provide a categorical outcome and these variables can be aligned in one dimension, are more generally referred to as <em>sequences</em>. <em>Sequence analysis</em> is an area of research that centers on problems of events and actions in their temporal context and includes the measurements of similarities between sequences.</p>
<p>The most popular measure of dissimilarity between two sequences is the Levenshtein distance, which counts the minimum number of operations needed to transform one sequence of categories into another, where an operation is an insertion, a deletion, or a substitution of a single category. Each operation may be assigned a penalty weight (a typical choice would be to give double the penalty to a substitution as opposed to an insertion or deletion). The measure is sometimes called the 'edit distance' due to its application in spell checkers.</p>
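<p>Base R's <code>adist</code> computes this generalized Levenshtein distance and accepts penalty weights for insertions, deletions, and substitutions (a small sketch on made-up sequences):</p>
<pre><code class="language-R">seqs &lt;- c("ABABC", "ABBBC", "BABAC")

# default: every insertion, deletion, and substitution costs 1
adist(seqs)

# double the penalty for substitutions
adist(seqs, costs = list(insertions = 1, deletions = 1, substitutions = 2))</code></pre>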
<p>Optimal matching algorithms (OMAs) need to be employed to find the minimum number of operations required to match one sequence to another. One such algorithm for aligning sequences is the Needleman-Wunsch algorithm, which is commonly used in bioinformatics to align proteins.</p>
<p>The <em>Jaro similarity measure</em> is a related measure of similarity between sequences of categories often used to delete duplicates in the area of record linkage.</p>
<h2>Inter-group Proximity Measures</h2>
<p>In clustering applications, it becomes necessary to consider how to measure the proximity between groups of individuals.</p>
<ol>
<li>The proximity between two groups might be defined by a suitable summary of the proximities between individuals from either group</li>
<li>Each group might be described by a representative observation by choosing a suitable summary statistic for each variable, and the inter group proximity defined as the proximity between the representative observations.</li>
</ol>
<h3>Inter-group Proximity Derived from the Proximity Matrix</h3>
<p>For deriving inter-group proximities from a matrix of inter-individual proximities, there are a variety of possibilities</p>
<ul>
<li>Take the smallest dissimilarity between any two individuals, one from each group. This is referred to as <em>nearest-neighbor distance</em> and is the basis of the clustering technique known as <em>single linkage</em></li>
<li>Define the inter-group distance as the largest distance between any two individuals, one from each group. This is known as the <em>furthest-neighbour distance</em> and constitutes the basis of the <em>complete linkage</em> cluster method.</li>
<li>Define it as the average dissimilarity between individuals from both groups. Such a measure is used in <em>group average clustering</em> (see the sketch after this list).</li>
</ul>
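<p>These three definitions correspond directly to the linkage options of base R's <code>hclust</code>; a minimal sketch (the <code>iris</code> measurements are used only for illustration):</p>
<pre><code class="language-R">x &lt;- as.matrix(iris[, 1:4])
d &lt;- dist(x)                      # inter-individual Euclidean distances

hclust(d, method = "single")      # nearest-neighbour distance
hclust(d, method = "complete")    # furthest-neighbour distance
hclust(d, method = "average")     # group average</code></pre>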
<h3>Inter-group Proximity Based on Group Summaries for Continuous Data</h3>
<p>One method for constructing inter-group dissimilarity measures for continuous data is to simply substitute group means (also known as the centroid) for the variable values in the formulae for inter-individual measures</p>
<p>More appropriate, however, might be measures which incorporate, in one way or another, knowledge of within-group variation. One possibility is to use Mahalanobis's generalized distance.</p>
<h4>Mahalanobis (Maximum) Distance</h4>
<p>Mahalanobis distance is a measure of distance between a point P and a distribution D. It is a multi-dimensional generalization of the idea of measuring how many standard deviations away P is from the mean of D</p>
<p>Mahalanobis distance is unitless and scale-invariant and takes into account the correlations of the data set
$$
D(\vec{x}) = \sqrt{(\vec{x} - \vec{\mu})^T S^{-1}(\vec{x}-\vec{\mu})}
$$
Where $\vec{\mu}$ is the vector of means and $S$ is the covariance matrix</p>
<p>If the covariance matrix is diagonal then the resulting distance measure is called a normalized Euclidean distance.
$$
d(\vec{x}, \vec{y}) = \sqrt{\sum_{i = 1}^N{\frac{(x_i - y_i)^2}{s^2_i}}}
$$
Where $s_i$ is the standard deviation of the $x_i$ and $y_i$ over the sample set</p>
<p>Thus, the Mahalanobis distance increases with increasing distances between the two group centers and with decreasing within-group variation.</p>
<p>By also employing within-group correlations, the Mahalanobis distance takes into account the possibly non-spherical shapes of the groups.</p>
<p>The use of Mahalanobis implies that the investigator is willing to <strong>assume</strong> that the covariance matrices are at least approximately the same in the two groups. When this is not so, this measure is an inappropriate inter-group measure. Other alternatives exist such as the one proposed by Anderson and Bahadur</p>
<img src="http://proquest.safaribooksonline.com.ezproxy.umw.edu/getfile?item=cjlhZWEzNDg0N2R0cGMvaS9zMG1nODk0czcvN3MwczM3L2UwLXMzL2VpL3RtYTBjMGdzY2QwLmkxLWdtaWY-" alt="equation">
<p>Another alternative is the <em>normal information radius</em> suggested by Jardine and Sibson</p>
<img src="http://proquest.safaribooksonline.com.ezproxy.umw.edu/getfile?item=cjlhZWEzNDg0N2R0cGMvaS9zMG1nODk0czcvN3MwczM4L2UwLXMzL2VpL3RtYTBjMGdzY2QwLmkxLWdtaWY-" alt="equation">
<h3>Inter-group Proximity Based on Group Summaries for Categorical Data</h3>
<p>Approaches for measuring inter-group dissimilarities between groups of individuals for which categorical variables have been observed have been considered by a number of authors. Balakrishnan and Sanghvi (1968), for example, proposed a dissimilarity index of the form</p>
<p><img src="http://proquest.safaribooksonline.com.ezproxy.umw.edu/getfile?item=cjlhZWEzNDg0N2R0cGMvaS9zMG1nODk0czcvN3MwczMwL2UwLXMzL2VpL3RtYTBjMGdzY2QwLmkyLWdtaWY-" alt="equation" /></p>
<p>where $p_{Akl}$ and $p_{Bkl}$ are the proportions of the $l$th category of the $k$th variable in groups A and B respectively, <img src="http://proquest.safaribooksonline.com.ezproxy.umw.edu/getfile?item=cjlhZWEzNDg0N2R0cGMvaS9zMG1nODk0czcvN3MwczNmL2VnMi8zbGVpL3RtYTBjbmdzaWUuMGlpbg--" alt="img" />, $c_k + 1$ is the number of categories for the $k$th variable and $p$ is the number of variables.</p>
<p>Kurczynski (1969) suggested adapting the generalized Mahalanobis distance, with categorical variables replacing quantitative variables. In its most general form, this measure for inter-group distance is given by</p>
<p><img src="http://proquest.safaribooksonline.com.ezproxy.umw.edu/getfile?item=cjlhZWEzNDg0N2R0cGMvaS9zMG1nODk0czcvN3MwczMxL2UwLXMzL2VpL3RtYTBjMGdzY2QwLmkyLWdtaWY-" alt="equation" /></p>
<p>where <img src="http://proquest.safaribooksonline.com.ezproxy.umw.edu/getfile?item=cjlhZWEzNDg0N2R0cGMvaS9zMG1nODk0czcvN3MwczNmL2VnMy8zbGVpL3RtYTBjbmdzaWUuMGlpbg--" alt="img" /> contains sample proportions in group A and <img src="http://proquest.safaribooksonline.com.ezproxy.umw.edu/getfile?item=cjlhZWEzNDg0N2R0cGMvaS9zMG1nODk0czcvN3MwczNmL2VnNC8zbGVpL3RtYTBjbmdzaWUuMGlpbg--" alt="img" /> is defined in a similar manner, and <img src="http://proquest.safaribooksonline.com.ezproxy.umw.edu/getfile?item=cjlhZWEzNDg0N2R0cGMvaS9zMG1nODk0czcvN3MwczNmL2VnNS8zbGVpL3RtYTBjbmdzaWUuMGlpbg--" alt="img" /> is the m × m common sample covariance matrix, where <img src="http://proquest.safaribooksonline.com.ezproxy.umw.edu/getfile?item=cjlhZWEzNDg0N2R0cGMvaS9zMG1nODk0czcvN3MwczNmL2VnNi8zbGVpL3RtYTBjbmdzaWUuMGlpbg--" alt="img" />. </p>
<h2>Weighting Variables</h2>
<p>To weight a variable means to give it greater or lesser importance than other variables in determining the proximity between two objects. </p>
<p>The question is 'How should the weights be chosen?' Before we discuss this question, it is important to realize that the selection of variables for inclusion into the study already presents a form of weighting, since the variables not included are effectively being given the weight $0$.</p>
<p>The weights chosen for the variables reflect the importance that the investigator assigns the variables for the classification task.</p>
<p>There are several approaches to this</p>
<ul>
<li>Authors obtain perceived dissimilarities between selected objects; they then model the dissimilarities using the underlying variables and weights that indicate their relative importance. The weights that best fit the perceived dissimilarities are then chosen.</li>
<li>Define the weights to be inversely proportional to some measure of variability in that variable. This choice of weights implies that the importance of a variable decreases when its variability increases.
<ul>
<li>For a continuous variable, the most commonly employed weight is either the reciprocal of its standard deviation or the reciprocal of its range.</li>
<li>Employing variability weights is equivalent to what is commonly referred to as <em>standardizing</em> the variables.</li>
</ul></li>
<li>Construct weights from the data matrix using <em>variable selection</em>. In essence, such procedures proceed in an iterative fashion to identify variables which, when contributing to a cluster algorithm, lead to internally cohesive and externally isolated clusters and, when clustered singly, produce reasonable agreement.</li>
</ul>
<p>The second approach assumes the importance of a variable to be inversely proportional to the total variability of that variable. The total variability of a variable comprises variation both within and between groups which may exist within the set of individuals. The aim of cluster analysis is typically to identify such groups. Hence it can be argued that the importance of a variable should not be reduced because of between-group variation (on the contrary, one might wish to assign more importance to a variable that shows larger between-group variation).</p>
<p>Gnanadesikan et al. (1995) assessed the ability of squared distance functions based on data-determined weights, both those described above and others, to recover groups in eight simulated and real continuous data sets in a subsequent cluster analysis. Their main findings were:</p>
<ol>
<li>Equal weights, (total) standard deviation weights, and range weights were generally ineffective, but range weights were preferable to standard deviation weights.</li>
<li>Weighting based on estimates of within-cluster variability worked well overall.</li>
<li>Weighting aimed at emphasizing variables with the most potential for identifying clusters did enhance clustering when some variables had a strong cluster structure.</li>
<li>Weighting to optimize the fitting of a hierarchical tree was often even less effective than equal weighting or weighting based on (total) standard deviations.</li>
<li>Forward variable selection was often among the better performers. (Note that all-subsets variable selection was not assessed at the time.)</li>
</ol>
<h2>Standardization</h2>
<p>In many clustering applications, the variables describing the objects to be clustered will not be measured in the same units, so they are commonly standardized before computing proximities. A number of variability measures have been used for this purpose.</p>
<ul>
<li>When standard deviations calculated from the complete set of objects to be clustered are used, the technique is often referred to as <em>auto-scaling, standard scoring, or z-scoring</em>. </li>
<li>Division by the median absolute deviations or by the ranges.</li>
</ul>
<p>The second is shown to outperform auto-scaling in many clustering applications. As pointed out in the previous section, standardization of variables to unit variance can be viewed as a special case of weighting.</p>
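<p>A small sketch of both choices in base R:</p>
<pre><code class="language-R">x &lt;- as.matrix(iris[, 1:4])

# auto-scaling / z-scoring: centre, then divide by the standard deviation
z &lt;- scale(x)

# division by the range instead
rng     &lt;- apply(x, 2, function(col) diff(range(col)))
x_range &lt;- sweep(x, 2, rng, "/")</code></pre>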
<h2>Choice of Proximity Measure</h2>
<p>Firstly, the nature of the data should strongly influence the choice of the proximity measure. </p>
<p>Next, the choice of measure should depend on the scale of the data. Similarity coefficients should be used when the data is binary. For continuous data, distance of correlation-type dissimilarity measure should be used according to whether 'size' or 'shape' of the objects is of interest. </p>
<p>Finally, the clustering method to be used might have some implications for the choice of the coefficient. For example, making a choice between several proximity coefficients with similar properties which are also known to be monotonically related can be avoided by employing a cluster method that depends only on the ranking of the proximities, not their absolute values.</p>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,106 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Silhouette</h1>
<p>This technique validates the consistency within clusters of data. It provides a succinct graphical representation of how well each object lies in its cluster.</p>
<p>The silhouette ranges from -1 to 1, where a high value indicates that the object is consistent within its own cluster and poorly matched to neighboring clusters.</p>
<p>A low or negative silhouette value can mean that the current clustering configuration has too many or too few clusters.</p>
<h2>Definition</h2>
<p>For each datum $i$, let $a(i)$ be the average distance of $i$ with all other data within the same cluster.</p>
<p>$a(i)$ can be interpreted as how well $i$ is assigned to its cluster. (lower values mean better agreement)</p>
<p>We can then define the average dissimilarity of point $i$ to a cluster $c$ as the average distance from $i$ to all points in $c$.</p>
<p>Let $b(i)$ be the lowest average distance of $i$ to all other points in any other cluster in which i is not already a member.</p>
<p>The cluster with this lowest average dissimilarity is said to be the neighboring cluster of $i$. From here we can define a silhouette:
$$
s(i) = \frac{b(i) - a(i)}{\max\{a(i), b(i)\}}
$$
The average $s(i)$ over all data of a cluster is a measure of how tightly grouped all the data in the cluster are. A silhouette plot may be used to visualize the agreement between each of the data and its cluster.</p>
<p><img src="https://upload.wikimedia.org/wikipedia/commons/thumb/f/fd/Silhouette-plot-orange.png/800px-Silhouette-plot-orange.png" alt="img" /></p>
<h3>Properties</h3>
<p>Recall that $a(i)$ is a measure of how dissimilar $i$ is to its own cluster; a smaller value means better agreement with its cluster. For $s(i)$ to be close to 1, we require $a(i) \ll b(i)$.</p>
<p>If $s(i)$ is close to negative one, then by the same logic we can see that $i$ would be more appropriate if it was clustered in its neighboring cluster.</p>
<p>$s(i)$ near zero means that the datum is on the border of two natural clusters.</p>
<h2>Determining the number of Clusters</h2>
<p>This can also be used in helping to determine the number of clusters in a dataset. The ideal number of clusters is the one that produces the highest average silhouette value.</p>
<p>Also, a good indication that one has too many clusters is when there are clusters in which the majority of observations fall below the mean silhouette value.</p>
<p><a href="https://kapilddatascience.wordpress.com/2015/11/10/using-silhouette-analysis-for-selecting-the-number-of-cluster-for-k-means-clustering/">https://kapilddatascience.wordpress.com/2015/11/10/using-silhouette-analysis-for-selecting-the-number-of-cluster-for-k-means-clustering/</a></p>
<p><img src="https://kapilddatascience.files.wordpress.com/2015/11/plot_kmeans_silhouette_analysis_004.png?w=660&amp;h=257" alt="plot_kmeans_silhouette_analysis_004" /></p>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,115 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Centroid-based Clustering</h1>
<p>In centroid-based clustering, clusters are represented by some central vector which may or may not be a member of the dataset. In practice, the number of clusters is fixed to $k$ and the goal is to solve some sort of optimization problem.</p>
<p>The similarity of two clusters is defined as the similarity of their centroids.</p>
<p>This problem is computationally difficult so there are efficient heuristic algorithms that are commonly employed. These usually converge quickly to a local optimum.</p>
<h2>K-means clustering</h2>
<p>This aims to partition $n$ observations into $k$ clusters in which each observation belongs to the cluster with the nearest mean which serves as the centroid of the cluster.</p>
<p>This technique results in partitioning the data space into Voronoi cells.</p>
<h3>Description</h3>
<p>Given a set of observations $x$, k-means clustering aims to partition the $n$ observations into $k$ sets $S$ so as to minimize the within-cluster sum of squares (i.e. variance). More formally, the objective is to find
$$
\operatorname{argmin}_S{\sum_{i = 1}^k{\sum_{x \in S_i}{||x-\mu_i||^2}}} = \operatorname{argmin}_S{\sum_{i = 1}^k{|S_i|\mathrm{Var}(S_i)}}
$$
where $\mu_i$ is the mean of points in $S_i$. This is equivalent to minimizing the pairwise squared deviations of points in the same cluster
$$
\operatorname{argmin}_S{\sum_{i = 1}^k{\frac{1}{2|S_i|}\sum_{x, y \in S_i}{||x-y||^2}}}
$$</p>
<h3>Algorithm</h3>
<p>Given an initial set of $k$ means, the algorithm proceeds by alternating between two steps.</p>
<p><strong>Assignment step</strong>: Assign each observation to the cluster whose mean has the least squared Euclidean distance.</p>
<ul>
<li>Intuitively this is finding the nearest mean</li>
<li>Mathematically this means partitioning the observations according to the Voronoi diagram generated by the means</li>
</ul>
<p><strong>Update Step</strong>: Calculate the new means to be the centroids of the observations in the new clusters</p>
<p>The algorithm is known to have converged when assignments no longer change. There is no guarantee that the optimum is found using this algorithm. </p>
<p>The result depends on the initial clusters. It is common to run this multiple times with different starting conditions.</p>
<p>Using a different distance function other than the squared Euclidean distance may stop the algorithm from converging.</p>
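<p>In R this is the base <code>kmeans</code> function; <code>nstart</code> reruns the algorithm from several random starting configurations and keeps the best result (a small sketch on the built-in <code>iris</code> measurements):</p>
<pre><code class="language-R">x &lt;- as.matrix(iris[, 1:4])

fit &lt;- kmeans(x, centers = 3, nstart = 25)

fit$centers        # the k cluster means (centroids)
fit$cluster        # cluster assignment of each observation
fit$tot.withinss   # total within-cluster sum of squares</code></pre>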
<h3>Initialization methods</h3>
<p>Commonly used initialization methods are Forgy and Random Partition.</p>
<p><strong>Forgy Method</strong>: This method randomly chooses $k$ observations from the data set and uses these as the initial means.</p>
<p>This method is known to spread the initial means out</p>
<p><strong>Random Partition Method</strong>: This method first randomly assigns a cluster to each observation and then proceeds to the update step. </p>
<p>This method is known to place most of the means close to the center of the dataset.</p>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,90 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Voronoi Diagram</h1>
<p>A Voronoi diagram is a partitioning of a plane into regions based on distance to points in a specific subset of the plane.</p>
<p>The set of points (often called seeds, sites, or generators) is specified beforehand, and for each seed there is a corresponding region consisting of all points closer to that seed than any other.</p>
<p>Different metrics may be used and often result in different Voronoi diagrams</p>
<p><strong>Euclidean</strong></p>
<p><img src="https://upload.wikimedia.org/wikipedia/commons/thumb/5/54/Euclidean_Voronoi_diagram.svg/382px-Euclidean_Voronoi_diagram.svg.png" alt="" /></p>
<p><strong>Manhattan</strong></p>
<p><img src="https://upload.wikimedia.org/wikipedia/commons/thumb/6/6d/Manhattan_Voronoi_Diagram.svg/382px-Manhattan_Voronoi_Diagram.svg.png" alt="" /></p>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,95 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>K-means++</h1>
<p>K-means++ is an algorithm for choosing the initial values or seeds for the k-means clustering algorithm. This was proposed as a way of avoiding the sometimes poor clustering found by a standard k-means algorithm. </p>
<h2>Intuition</h2>
<p>The intuition behind this approach involves spreading out the $k$ initial cluster centers. The first cluster center is chosen uniformly at random from the data points that are being clustered, after which each subsequent cluster center is chosen from the remaining data points with probability proportional to its squared distance from the point's closest existing cluster center.</p>
<h2>Algorithm</h2>
<p>The exact algorithm is as follows</p>
<ol>
<li>Choose one center uniformly at random from among data points</li>
<li>For each data point $x$, compute $D(x)$, the distance between $x$ and the nearest center that has already been chosen.</li>
<li>Choose one new data point at random as a new center, using a weighted probability distribution where a point $x$ is chosen with probability proportional to $D(x)^2$</li>
<li>Repeat steps 2 and 3 until $k$ centers have been chosen</li>
<li>Now that the initial centers have been chosen, proceed using standard k-means clustering</li>
</ol>
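<p>A small sketch of the seeding step in base R (squared Euclidean distance; the function name and the use of <code>iris</code> are illustrative):</p>
<pre><code class="language-R">kmeanspp_centers &lt;- function(x, k) {
  n &lt;- nrow(x)
  centers &lt;- x[sample(n, 1), , drop = FALSE]   # step 1: first center at random

  while (nrow(centers) &lt; k) {
    # step 2: squared distance of each point to its nearest chosen center
    d2 &lt;- apply(x, 1, function(p) min(colSums((t(centers) - p)^2)))

    # step 3: pick a new center with probability proportional to D(x)^2
    idx &lt;- sample(n, 1, prob = d2)
    centers &lt;- rbind(centers, x[idx, , drop = FALSE])
  }
  centers                                      # step 4 ends once k are chosen
}

# step 5: hand the seeds to standard k-means
x   &lt;- as.matrix(iris[, 1:4])
fit &lt;- kmeans(x, centers = kmeanspp_centers(x, 3))</code></pre>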
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,118 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>K-Medoids</h1>
<p>A medoid can be defined as the object of a cluster whose average dissimilarity to all the objects in the cluster is minimal.</p>
<p>The K-medoids algorithm is related to k-means and the medoidshift algorithm. Both the k-means and k-medoids algorithms are partitional, and both attempt to minimize the distance between points in a cluster and that cluster's center. In contrast to k-means, k-medoids chooses data points as centers and uses the Manhattan norm to define the distance between data points instead of the Euclidean.</p>
<p>This method is known to be more robust to noise and outliers compared to k-means since it minimizes the sum of pairwise dissimilarities instead of the sum of squared Euclidean distances.</p>
<h2>Algorithms</h2>
<p>There are several algorithms that have been created as an optimization to an exhaustive search. In this section, we'll discuss PAM and the Voronoi iteration method.</p>
<h3>Partitioning Around Medoids (PAM)</h3>
<ol>
<li>Select $k$ of the $n$ data points as medoids</li>
<li>Associate each data point to the closest medoid</li>
<li>While the cost of the configuration decreases:
<ol>
<li>For each medoid $m$, for each non-medoid data point $o$:
<ol>
<li>Swap $m$ and $o$, recompute the cost (sum of distances of points to their medoid)</li>
<li>If the total cost of the configuration increased in the previous step, undo the swap</li>
</ol></li>
</ol></li>
</ol>
<h3>Voronoi Iteration Method</h3>
<ol>
<li>Select $k$ of the $n$ data points as medoids</li>
<li>While the cost of the configuration decreases
<ol>
<li>In each cluster, make the point that minimizes the sum of distances within the cluster the medoid</li>
<li>Reassign each point to the cluster defined by the closest medoid determined in the previous step.</li>
</ol></li>
</ol>
<h3>Clustering Large Applications (CLARA)</h3>
<p>This is a variant of the PAM algorithm that relies on sampling to handle large datasets: PAM is run on samples of the data, and the cost of a particular cluster configuration is the mean dissimilarity of every observation to its closest medoid.</p>
<h2>R Implementations</h2>
<p>Both PAM and CLARA are defined in the <code>cluster</code> package in R.</p>
<pre><code class="language-R">clara(x, k, metric = "euclidean", stand = FALSE, samples = 5,
sampsize = min(n, 40 + 2 * k), trace = 0, medoids.x = TRUE,
keep.data = medoids.x, rngR = FALSE)</code></pre>
<pre><code class="language-R">pam(x, k, metric = "euclidean", stand = FALSE)</code></pre>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,94 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>K-Medians</h1>
<p>This is a variation of k-means clustering where instead of calculating the mean for each cluster to determine its centroid we are going to calculate the median instead.</p>
<p>This has the effect of minimizing error over all the clusters with respect to the Manhattan norm, as opposed to the squared Euclidean norm minimized in k-means.</p>
<h3>Algorithm</h3>
<p>Given an initial set of $k$ medians, the algorithm proceeds by alternating between two steps.</p>
<p><strong>Assignment step</strong>: Assign each observation to the cluster whose median is closest in Manhattan distance.</p>
<ul>
<li>Intuitively this is finding the nearest median</li>
</ul>
<p><strong>Update Step</strong>: Calculate the new medians as the component-wise (per-dimension) medians of the observations in the new clusters.</p>
<p>The algorithm is known to have converged when assignments no longer change. There is no guarantee that the optimum is found using this algorithm. </p>
<p>The result depends on the initial clusters. It is common to run this multiple times with different starting conditions.</p>
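<p>A minimal from-scratch sketch of the two steps above in R (the function name, the iteration cap, and the use of the <code>iris</code> data are illustrative choices, not part of any package):</p>
<pre><code class="language-R"># k-medians sketch: assignment by Manhattan distance,
# update by component-wise medians.
kmedians = function(x, k, iter = 25) {
  x = as.matrix(x)
  centers = x[sample(nrow(x), k), , drop = FALSE]  # random initial medians
  assign = integer(0)
  for (it in seq_len(iter)) {
    # Assignment step: nearest median under the Manhattan norm
    d = sapply(seq_len(k), function(j) rowSums(abs(sweep(x, 2, centers[j, ]))))
    new_assign = max.col(-d)
    if (identical(new_assign, assign)) break       # assignments unchanged
    assign = new_assign
    # Update step: component-wise median of each cluster
    for (j in seq_len(k)) {
      members = which(assign == j)
      if (length(members) > 0) {
        centers[j, ] = apply(x[members, , drop = FALSE], 2, median)
      }
    }
  }
  list(centers = centers, cluster = assign)
}

# Hypothetical usage on the built-in iris measurements
fit = kmedians(iris[, 1:4], k = 3)
table(fit$cluster, iris$Species)</code></pre>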
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,124 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Introduction to Density Based Clustering</h1>
<p>In density-based clustering, clusters are defined as areas of higher density than the remainder of the data sets. Objects in more sparse areas are considered to be outliers or border points. This helps discover clusters of arbitrary shape.</p>
<h2>DBSCAN</h2>
<p>Given a set of points in space, it groups together points that are closely packed together while marking points that lie alone in low-density regions as outliers.</p>
<h3>Preliminary Information</h3>
<ul>
<li>A point $p$ is a core point if at least $k$ points (often referred to as minPts) are within $\epsilon$ of it. Those points are said to be <em>directly reachable</em> from $p$.</li>
<li>A point $q$ is directly reachable from $p$ if point $q$ is within distance $\epsilon$ from point $p$ and $p$ must be a core point</li>
<li>A point $q$ is reachable from $p$ if there is a path $p_1, \dots, p_n$ with $p_1 = p$ and $p_n = q$, where each $p_{i + 1}$ is directly reachable from $p_i$. (All points on the path must be core points, with the possible exception of $q$)</li>
<li>All points not reachable from any other points are outliers</li>
</ul>
<p>Non core points can be part of a cluster, but they form its &quot;edge&quot;, since they cannot be used to reach more points.</p>
<p>Reachability is not a symmetric relation since, by definition, no point may be reachable from a non-core point, regardless of distance. </p>
<p>Two points $p$ and $q$ are density-connected if there is a point $o$ such that both $p$ and $q$ are reachable from $o$. Density-connectedness is symmetric.</p>
<p>A cluster then satisfies two properties:</p>
<ol>
<li>All points within the cluster are mutually density-connected</li>
<li>If a point is density-reachable from any point of the cluster, it is part of the cluster as well.</li>
</ol>
<h3>Algorithm</h3>
<ol>
<li>Find the $\epsilon$ neighbors of every point, and identify the core points with more than $k$ neighbors.</li>
<li>Find the connected components of <em>core</em> points on the neighborhood graph, ignoring all non-core points.</li>
<li>Assign each non-core point to a nearby cluster if the cluster is an $\epsilon$ (eps) neighbor, otherwise assign it to noise.</li>
</ol>
<h3>Advantages</h3>
<ul>
<li>Does not require one to specify the number of clusters in the data</li>
<li>Can find arbitrarily shaped clusters</li>
<li>Has a notion of noise and is robust to outliers</li>
</ul>
<h3>Disadvantages</h3>
<ul>
<li>Not entirely deterministic: border points that are reachable from more than one cluster can be part of either cluster.</li>
<li>The quality of DBSCAN depends on the distance measure used.</li>
<li>Cannot cluster data sets well with large differences in densities.</li>
</ul>
<h3>Rules of Thumb for Parameters</h3>
<p>$k$: $k$ must be larger than $(D + 1)$ where $D$ is the number of dimensions in the dataset. Normally $k$ is chosen to be twice the number of dimensions.</p>
<p>$\epsilon$: Ideally the $k^{th}$ nearest neighbors are at roughly the same distance. Plot the sorted distance of every point to its $k^{th}$ nearest neighbor and choose $\epsilon$ near the elbow of that curve.</p>
<p>Example of a run-through:</p>
<p><a href="https://www.cse.buffalo.edu/~jing/cse601/fa12/materials/clustering_density.pdf">https://www.cse.buffalo.edu/~jing/cse601/fa12/materials/clustering_density.pdf</a></p>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,99 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Why use different distance measures?</h1>
<p>I made an attempt to find out in what situations people use different distance measures. Looking around on the Internet usually produces answers like &quot;It depends on the problem&quot; or &quot;I typically just always use Euclidean&quot;.</p>
<p>Which, as you might imagine, isn't a terribly useful answer, since it doesn't give any examples of which types of problems the different distances solve.</p>
<p>Therefore, let's think about it in a different way. What properties do different distance measures have that make them desirable?</p>
<h2>Manhattan Advantages</h2>
<ul>
<li>The gradient of this distance has a constant magnitude, since there is no squaring of differences in the formula</li>
<li>Unusual (outlying) values affect Euclidean distances more, since the differences are squared</li>
</ul>
<p><a href="https://datascience.stackexchange.com/questions/20075/when-would-one-use-manhattan-distance-as-opposite-to-euclidean-distance">https://datascience.stackexchange.com/questions/20075/when-would-one-use-manhattan-distance-as-opposite-to-euclidean-distance</a></p>
<h2>Mahalanobis Advantages</h2>
<p>Variables can be on different scales. The Mahalanobis formula has a built-in variance-covariance matrix which allows you to rescale your variables so that distances across different variables are more comparable.</p>
<p><a href="https://stats.stackexchange.com/questions/50949/why-use-the-mahalanobis-distance#50956">https://stats.stackexchange.com/questions/50949/why-use-the-mahalanobis-distance#50956</a></p>
<h2>Euclidean Disadvantages</h2>
<p>In higher dimensions, the points essentially become uniformly distant from one another. This is a problem observed in most distance metrics but it's more obvious with the Euclidean one.</p>
<p><a href="https://stats.stackexchange.com/questions/99171/why-is-euclidean-distance-not-a-good-metric-in-high-dimensions/">https://stats.stackexchange.com/questions/99171/why-is-euclidean-distance-not-a-good-metric-in-high-dimensions/</a></p>
<p>Hopefully in this course, we'll discover more properties as to why it makes sense to use different distance measures, since it can have an impact on how our clusters are formed.</p>
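<p>A small sketch contrasting these measures on toy data (the three points are made up; <code>dist</code> and <code>mahalanobis</code> are base R):</p>
<pre><code class="language-R"># The outlying value in the third point inflates the squared
# differences used by the Euclidean distance more than the Manhattan one.
x = rbind(c(0, 0), c(1, 1), c(1, 10))

dist(x, method = "euclidean")
dist(x, method = "manhattan")

# Mahalanobis distance of each point from the column means, rescaled by
# the variance-covariance matrix so differently-scaled variables are comparable
mahalanobis(x, center = colMeans(x), cov = cov(x))</code></pre>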
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,123 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Principal Component Analysis Pt. 1</h1>
<h2>What is PCA?</h2>
<p>Principal component analysis is a statistical procedure that performs an orthogonal transformation to convert a set of variables into a set of linearly uncorrelated variables called principal components.</p>
<p>The number of distinct principal components equals $\min(\text{number of variables}, \text{number of observations} - 1)$.</p>
<p>The transformation is defined in such a way that the first principal component explains the largest possible variance in the data.</p>
<p>Each succeeding component has the highest possible variance under the constraint of being orthogonal to the preceding components.</p>
<p>PCA is sensitive to the relative scaling of the original variables.</p>
<h3>Results of a PCA</h3>
<p>Results are discussed in terms of <em>component scores</em>, which are the transformed variable values, and <em>loadings</em>, the weights by which each original variable is multiplied to get the component score.</p>
<h2>Assumptions of PCA</h2>
<ol>
<li>Linearity</li>
<li>Large variances are important and small variances denote noise</li>
<li>Principal components are orthogonal</li>
</ol>
<h2>Why perform PCA?</h2>
<ul>
<li>Distance measures perform poorly in high-dimensional space (<a href="https://stats.stackexchange.com/questions/256172/why-always-doing-dimensionality-reduction-before-clustering">https://stats.stackexchange.com/questions/256172/why-always-doing-dimensionality-reduction-before-clustering</a>)</li>
<li>Helps eliminates noise from the dataset (<a href="https://www.quora.com/Does-it-make-sense-to-perform-principal-components-analysis-before-clustering-if-the-original-data-has-too-many-dimensions-Is-it-theoretically-unsound-to-try-to-cluster-data-with-no-correlation">https://www.quora.com/Does-it-make-sense-to-perform-principal-components-analysis-before-clustering-if-the-original-data-has-too-many-dimensions-Is-it-theoretically-unsound-to-try-to-cluster-data-with-no-correlation</a>)</li>
<li>One initial cost to help reduce further computations</li>
</ul>
<h2>Computing PCA</h2>
<ol>
<li>Subtract off the mean of each measurement type</li>
<li>Compute the covariance matrix</li>
<li>Take the eigenvalues/vectors of the covariance matrix</li>
</ol>
<h2>R Code</h2>
<pre><code class="language-R">pcal = function(data) {
centered_data = scale(data)
covariance = cov(centered_data)
eigen_stuff = eigen(covariance)
sorted_indices = sort(eigen_stuff$values,
index.return = T,
decreasing = T)$ix
loadings = eigen_stuff$values[sorted_indices]
components = eigen_stuff$vectors[sorted_indices,]
combined_list = list(loadings, components)
names(combined_list) = c("Loadings", "Components")
return(combined_list)
}</code></pre>
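<p>As a sanity check (a sketch only, using the <code>iris</code> measurements as example data), base R's <code>prcomp</code> performs the same decomposition; with <code>scale. = TRUE</code> it also works from standardized variables:</p>
<pre><code class="language-R"># prcomp() is base R; centering and scaling matches the function above
fit = prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)
fit$sdev^2       # eigenvalues: variances of the principal components
fit$rotation     # eigenvectors: weights applied to the original variables
summary(fit)     # proportion of variance explained by each component</code></pre>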
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,96 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Revisiting Similarity Measures</h1>
<h2>Manhattan Distance</h2>
<p>An additional use case for Manhattan distance is when dealing with binary vectors. In that setting it is known as the Hamming distance: the number of bits that differ between two binary vectors.</p>
<h2>Ordinal Variables</h2>
<p>Ordinal variables can be treated as if they were on an interval scale.</p>
<p>First, replace the ordinal variable value by its rank ($r_{if}$). Then map the range of each variable onto the interval $[0, 1]$ by replacing the $f_i$, where $f$ is the variable and $i$ is the object, by
$$
z_{if} = \frac{r_{if} - 1}{M_f - 1}
$$
Where $M_f$ is the maximum rank.</p>
<h3>Example</h3>
<p>Freshman = $0$, Sophomore = $\frac{1}{3}$, Junior = $\frac{2}{3}$, Senior = $1$</p>
<p>$d(freshman, senior) = 1$</p>
<p>$d(junior, senior) = \frac{1}{3}$</p>
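<p>A tiny R sketch of this rescaling (the class-year example above, with hypothetical variable names):</p>
<pre><code class="language-R"># Map an ordinal variable onto [0, 1] using its ranks
year = factor(c("Freshman", "Sophomore", "Junior", "Senior"),
              levels = c("Freshman", "Sophomore", "Junior", "Senior"),
              ordered = TRUE)
r = as.integer(year)        # ranks r_if = 1, 2, 3, 4
z = (r - 1) / (max(r) - 1)  # z_if = (r_if - 1) / (M_f - 1)
z                           # 0, 1/3, 2/3, 1
abs(z[1] - z[4])            # d(Freshman, Senior) = 1
abs(z[3] - z[4])            # d(Junior, Senior)   = 1/3</code></pre>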
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,103 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Cluster Tendency</h1>
<p>This is the assessment of the suitability of clustering. Cluster Tendency determines whether the data has any inherent grouping structure.</p>
<p>This is a hard task since there are so many different definitions of clusters (partitioning, hierarchical, density, graph, etc.). Even after fixing a cluster type, it is still hard to define an appropriate null model for a data set.</p>
<p>One way we can go about measuring cluster tendency is to compare the data against random data. On average, random data should not contain clusters.</p>
<p>There are some clusterability assessment methods such as Spatial histogram, distance distribution and Hopkins statistic.</p>
<h2>Hopkins Statistic</h2>
<p>Let $X$ be the set of $n$ data points in $d$ dimensional space. Consider a random sample (without replacement) of $m &lt;&lt; n$ data points. Also generate a set $Y$ of $m$ uniformly randomly distributed data points.</p>
<p>Now define two distance measures: $u_i$, the distance of $y_i \in Y$ from its nearest neighbor in $X$, and $w_i$, the distance of $x_i \in X$ from its nearest neighbor in $X$ (other than itself).</p>
<p>We can then define the Hopkins statistic as
$$
H = \frac{\sum_{i = 1}^m{u_i^d}}{\sum_{i = 1}^m{u_i^d} + \sum_{i = 1}^m{w_i^d}}
$$</p>
<h3>Properties</h3>
<p>With this definition, uniform random data should tend to have values near 0.5, and clustered data should tend to have values nearer to 1.</p>
<h3>Drawbacks</h3>
<p>However, data containing a single Gaussian will also score close to one, since this statistic measures deviation from a uniform distribution. This makes it less useful in application, as real data is usually not remotely uniform.</p>
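<p>A minimal R sketch of the statistic as defined above (a from-scratch illustration; packages such as <code>factoextra</code> offer ready-made versions, and the sample size $m$ and the use of the <code>iris</code> data here are arbitrary):</p>
<pre><code class="language-R"># Hopkins statistic: compare nearest-neighbor distances of uniform random
# points (u_i) against those of a sample of the data itself (w_i).
hopkins_stat = function(X, m = 0.1 * nrow(X)) {
  X = as.matrix(X)
  m = floor(m)
  d = ncol(X)
  # m uniformly random points Y inside the bounding box of X
  Y = sapply(seq_len(d), function(j) runif(m, min(X[, j]), max(X[, j])))
  # u_i: distance from y_i to its nearest neighbor in X
  u = apply(Y, 1, function(y) min(sqrt(rowSums(sweep(X, 2, y)^2))))
  # w_i: distance from a sampled x_i to its nearest other point in X
  idx = sample(nrow(X), m)
  w = sapply(idx, function(i) {
    min(sqrt(rowSums(sweep(X[-i, , drop = FALSE], 2, X[i, ])^2)))
  })
  sum(u^d) / (sum(u^d) + sum(w^d))
}

hopkins_stat(iris[, 1:4])   # clustered data: value should be close to 1</code></pre>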
<h2>Spatial Histogram Approach</h2>
<p>For this method, I'm not too sure how this works, but here are some key points I found.</p>
<p>Divide each dimension into equal-width bins, count how many points lie in each of the bins, and obtain the empirical joint probability mass function.</p>
<p>Do the same for the randomly sampled data.</p>
<p>Finally, compute how much the two differ using the Kullback-Leibler (KL) divergence value. If it differs greatly, then we can say that the data is clusterable.</p>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,217 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Principal Component Analysis Part 2: Formal Theory</h1>
<h2>Properties of PCA</h2>
<p>There are a number of ways to maximize the variance of a principal component. To create a unique solution we should impose a constraint. Let us say that the sum of the squares of the coefficients must equal 1. In vector notation this is the same as
$$
a_i^Ta_i = 1
$$
Every future principal component is said to be orthogonal to all the principal components previous to it.
$$
a_j^Ta_i = 0, i &lt; j
$$
The total variance of the $q$ principal components will equal the total variance of the original variables
$$
\sum_{i = 1}^q {\lambda_i} = trace(S)
$$
Where $S$ is the sample covariance matrix.</p>
<p>The proportion of variation accounted for by each principal component is
$$
P_j = \frac{\lambda_j}{trace(S)}
$$
From this, we can generalize to the first $m$ principal components where $m &lt; q$ and find the proportion $P^{(m)}$ of variation accounted for
$$
P^{(m)} = \frac{\sum_{i = 1}^m{\lambda_i}}{trace(S)}
$$
You can think of the first principal component as the line of best fit that minimizes the residuals orthogonal to it.</p>
<h3>What to watch out for</h3>
<p>As a reminder from the last lecture, <em>PCA is not scale-invariant</em>. Therefore, transformations done to the dataset before PCA and after PCA often lead to different results and possibly conclusions.</p>
<p>Additionally, if there are large differences between the variances of the original variables, then those whose variances are largest will tend to dominate the early components.</p>
<p>Therefore, principal components should only be extracted from the sample covariance matrix when all of the original variables have roughly the <strong>same scale</strong>.</p>
<h3>Alternatives to using the Covariance Matrix</h3>
<p>But it is rare in practice to have a scenario when all of the variables are of the same scale. Therefore, principal components are typically extracted from the <strong>correlation matrix</strong> $R$</p>
<p>Choosing to work with the correlation matrix rather than the covariance matrix treats the variables as all equally important when performing PCA.</p>
<h2>Example Derivation: Bivariate Data</h2>
<p>Let $R$ be the correlation matrix
$$
R = \begin{pmatrix}
1 &amp; r \\
r &amp; 1
\end{pmatrix}
$$
Let us find the eigenvectors and eigenvalues of the correlation matrix
$$
det(R - \lambda I) = 0
$$</p>
<p>$$
(1-\lambda)^2 - r^2 = 0
$$</p>
<p>$$
\lambda_1 = 1 + r, \lambda_2 = 1 - r
$$</p>
<p>Let us remember to check the condition &quot;sum of the principal components equals the trace of the correlation matrix&quot;:
$$
\lambda_1 + \lambda_2 = 1+r + (1 - r) = 2 = trace(R)
$$</p>
<h3>Finding the First Eigenvector</h3>
<p>Looking back at the characteristic equation
$$
Ra_1 = \lambda a_1
$$
We can get the following two formulas
$$
a_{11} + ra_{12} = (1+r)a_{11} \tag{1}
$$</p>
<p>$$
ra_{11} + a_{12} = (1 + r)a_{12} \tag{2}
$$</p>
<p>Now let us find out what $a_{11}$ and $a_{12}$ equal. First let us solve for $a_{11}$ using equation $(1)$
$$
ra_{12} = (1+r)a_{11} - a_{11}
$$</p>
<p>$$
ra_{12} = a_{11}(1 + r - 1)
$$</p>
<p>$$
ra_{12} = ra_{11}
$$</p>
<p>$$
a_{12} = a_{11}
$$</p>
<p>Where $r$ does not equal $0$.</p>
<p>Now we must apply the condition of sum squares
$$
a_1^Ta_1 = 1
$$</p>
<p>$$
a_{11}^2 + a_{12}^2 = 1
$$</p>
<p>Recall that $a_{12} = a_{11}$
$$
2a_{11}^2 = 1
$$</p>
<p>$$
a_{11}^2 = \frac{1}{2}
$$</p>
<p>$$
a_{11} =\pm \frac{1}{\sqrt{2}}
$$</p>
<p>For sake of choosing a value, let us take the principal root and say $a_{11} = \frac{1}{\sqrt{2}}$</p>
<h3>Finding the Second Eigenvector</h3>
<p>Recall the fact that each subsequent eigenvector is orthogonal to the first. This means
$$
a_{11}a_{21} + a_{12}a_{22} = 0
$$
Substituting the values for $a_{11}$ and $a_{12}$ calculated in the previous section
$$
\frac{1}{\sqrt{2}}a_{21} + \frac{1}{\sqrt{2}}a_{22} = 0
$$</p>
<p>$$
a_{21} + a_{22} = 0
$$</p>
<p>$$
a_{21} = -a_{22}
$$</p>
<p>Since this eigenvector also needs to satisfy the first condition, we get the following values
$$
a_{21} = \frac{1}{\sqrt{2}} , a_{22} = \frac{-1}{\sqrt{2}}
$$</p>
<h3>Conclusion of Example</h3>
<p>From this, we can say that the principal components are given by
$$
y_1 = \frac{1}{\sqrt{2}}(x_1 + x_2), y_2 = \frac{1}{\sqrt{2}}(x_1-x_2)
$$
With the variance of the first principal component being given by $(1+r)$ and the second by $(1-r)$</p>
<p>Due to this, as $r$ increases, so does the variance explained by the first principal component. This, in turn, lowers the variance explained by the second principal component.</p>
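<p>As a quick numerical check of the derivation above (the value $r = 0.6$ is an arbitrary choice), base R's <code>eigen</code> reproduces the result:</p>
<pre><code class="language-R"># Eigen-decomposition of the bivariate correlation matrix
r = 0.6
R = matrix(c(1, r, r, 1), nrow = 2)
eigen(R)$values    # 1 + r and 1 - r
eigen(R)$vectors   # columns proportional to (1, 1) and (1, -1), scaled by 1/sqrt(2)</code></pre>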
<h2>Choosing a Number of Principal Components</h2>
<p>Principal Component Analysis is typically used in dimensionality reduction efforts. Therefore, there are several strategies for picking the right number of principal components to keep. Here are a few:</p>
<ul>
<li>Retain enough principal components to account for 70%-90% of the variation</li>
<li>Exclude principal components where eigenvalues are less than the average eigenvalue</li>
<li>Exclude principal components where eigenvalues are less than one.</li>
<li>Generate a Scree Plot
<ul>
<li>Stop when the plot goes from &quot;steep&quot; to &quot;shallow&quot;</li>
<li>Stop when it essentially becomes a straight line.</li>
</ul></li>
</ul>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,100 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Introduction to Connectivity Based Models</h1>
<p>Hierarchical algorithms combine observations to form clusters based on their distance.</p>
<h2>Connectivity Methods</h2>
<p>Hierarchical clustering techniques can be subdivided depending on how they go about forming groups.</p>
<p>First, there are two different methods of forming the clusters: <em>Agglomerative</em> and <em>Divisive</em>.</p>
<p><u>Agglomerative</u> is when you combine the $n$ individuals into larger and larger groups with each iteration.</p>
<p><u>Divisive</u> is when you are separating one giant group into finer groupings with each iteration.</p>
<p>Hierarchical methods are irrevocable: once the algorithm joins or separates a grouping, it cannot be undone. As Kaufman and Rousseeuw (1990) colorfully comment: <em>&quot;A hierarchical method suffers from the defect that it can never repair what was done in previous steps&quot;</em>. </p>
<p>It is the job of the statistician to decide when to stop the agglomerative or divisive algorithm, since having one giant cluster containing all observations or having each observation be a cluster isn't particularly useful.</p>
<p>At different distances, different clusters are formed and are more readily represented using a <strong>dendrogram</strong>. These algorithms do not provide a unique solution but rather provide an extensive hierarchy of clusters that merge or divide at different distances.</p>
<h2>Linkage Criterion</h2>
<p>Apart from the method of forming clusters, the user also needs to decide on a linkage criterion to use. Meaning, how do you want to optimize your clusters.</p>
<p>Do you want to group based on the nearest points in each cluster? Nearest neighbor clustering.</p>
<p>Or do you want to group based on the farthest observations in each cluster? Farthest neighbor clustering.</p>
<p><img src="http://www.multid.se/genex/onlinehelp/clustering_distances.png" alt="http://www.multid.se/genex/onlinehelp/clustering_distances.png" /></p>
<h2>Shortcomings</h2>
<p>This method is not very robust towards outliers, which will either show up as additional clusters or even cause other clusters to merge depending on the clustering method.</p>
<p>As we go through this section, we will go into detail about the different linkage criterion and other parameters of this model.</p>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,155 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Agglomerative Methods</h1>
<h2>Single Linkage</h2>
<p>First let us consider the single linkage (nearest neighbor) approach. The clusters can be found through the following algorithm</p>
<ol>
<li>Find the smallest non-zero distance</li>
<li>Group the two objects together as a cluster</li>
<li>Recompute the distances in the matrix by taking the minimum distances
<ul>
<li>Cluster a,b -&gt; c = min(d(a, c), d(b, c))</li>
</ul></li>
</ol>
<p>Single linkage can operate directly on a proximity matrix; the actual data is not required.</p>
<p>A wonderful visual representation can be found in Everitt Section 4.2</p>
<h2>Centroid Clustering</h2>
<p>This is another criterion that requires both the data and the proximity matrix, and it requires the Euclidean distance measure to preserve geometric correctness. These are the steps of the algorithm:</p>
<ol>
<li>Find the smallest non-zero distance</li>
<li>Group the two objects together as a cluster</li>
<li>Recompute the distances by taking the mean of the clustered observations and computing the distances between all of the observations
<ul>
<li>Cluster a,b -&gt; c = d(mean(a, b), c)</li>
</ul></li>
</ol>
<h2>Complete Linkage</h2>
<p>This is like Single Linkage, except now we're taking the farthest distance. The algorithm can be adjusted to the following</p>
<ol>
<li>Find the smallest non-zero distance</li>
<li>Group the two objects together as a cluster</li>
<li>Recompute the distances in the matrix by taking the maximum distances
<ul>
<li>Cluster a,b -&gt; c = max(d(a, c), d(b, c))</li>
</ul></li>
</ol>
<h2>Unweighted Pair-Group Method using the Average Approach (UPGMA)</h2>
<p>In this criterion, we are no longer summarizing each cluster before taking distances, but instead comparing each observation in the cluster to the outside point and taking the average</p>
<ol>
<li>Find the smallest non-zero distance</li>
<li>Group the two objects together as a cluster</li>
<li>Recompute the distances in the matrix by taking the mean
<ul>
<li>Cluster A: a,b -&gt; c = $mean_{i = 0}(d(A_i, c))$</li>
</ul></li>
</ol>
<h2>Median Linkage</h2>
<p>This approach is similar to the UPGMA approach, except now we're taking the median instead of the mean</p>
<ol>
<li>Find the smallest non-zero distance</li>
<li>Group the two objects together as a cluster</li>
<li>Recompute the distances in the matrix by taking the median
<ul>
<li>Cluster A: a,b -&gt; c = $median_{i = 0}{(d(A_i, c))}$</li>
</ul></li>
</ol>
<h2>Ward Linkage</h2>
<p>This one I didn't look too far into but here's the description: With Ward's linkage method, the distance between two clusters is the sum of squared deviations from points to centroids. The objective of Ward's linkage is to minimize the within-cluster sum of squares.</p>
<h2>When to use different Linkage Types?</h2>
<p>According to the following two stack overflow posts: <a href="https://stats.stackexchange.com/questions/195446/choosing-the-right-linkage-method-for-hierarchical-clustering">https://stats.stackexchange.com/questions/195446/choosing-the-right-linkage-method-for-hierarchical-clustering</a> and <a href="https://stats.stackexchange.com/questions/195456/how-to-select-a-clustering-method-how-to-validate-a-cluster-solution-to-warran/195481#195481">https://stats.stackexchange.com/questions/195456/how-to-select-a-clustering-method-how-to-validate-a-cluster-solution-to-warran/195481#195481</a></p>
<p>These are the following ways you can justify a linkage type.</p>
<p><strong>Cluster metaphor</strong>. <em>&quot;I preferred this method because it constitutes clusters such (or such a way) which meets with my concept of a cluster in my particular project.&quot;</em></p>
<p><strong>Data/method assumptions</strong>. <em>&quot;I preferred this method because my data nature or format predispose to it.&quot;</em></p>
<p><strong>Internal validity</strong>. <em>&quot;I preferred this method because it gave me most clear-cut, tight-and-isolated clusters.&quot;</em> </p>
<p><strong>External validity</strong>. <em>&quot;I preferred this method because it gave me clusters which differ by their background or clusters which match with the true ones I know.&quot;</em></p>
<p><strong>Cross-validity</strong>. <em>&quot;I preferred this method because it is giving me very similar clusters on equivalent samples of the data or extrapolates well onto such samples.&quot;</em></p>
<p><strong>Interpretation</strong>. <em>&quot;I preferred this method because it gave me clusters which, explained, are most persuasive that there is meaning in the world.&quot;</em></p>
<h3>Cluster Metaphors</h3>
<p>Let us explore the idea of cluster metaphors now.</p>
<p><strong>Single Linkage</strong> or <strong>Nearest Neighbor</strong> is a <em>spectrum</em> or <em>chain</em>.</p>
<p>Since single linkage joins clusters by the shortest link between them, the technique cannot discern poorly separated clusters. On the other hand, single linkage is one of the few clustering methods that can delineate non-ellipsoidal clusters.</p>
<p><strong>Complete Linkage</strong> or <strong>Farthest Neighbor</strong> is a <em>circle</em>.</p>
<p><strong>Between-Group Average linkage</strong> (UPGMA) is a united <em>class</em>.</p>
<p><strong>Centroid method</strong> (UPGMC) is <em>proximity of platforms</em> (commonly used in politics)</p>
<h2>Dendrograms</h2>
<p>A <strong>dendrogram</strong> is a tree diagram frequently used to illustrate the arrangement of the clusters produced by hierarchical clustering. It shows how different clusters are formed at different distance groupings.</p>
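<p>A usage sketch with base R's <code>hclust</code>: the linkage criteria above map onto its <code>method</code> argument, and <code>cutree</code> cuts the dendrogram at a chosen number of clusters (the <code>iris</code> data and the choice of three clusters are illustrative):</p>
<pre><code class="language-R"># Agglomerative clustering with different linkage criteria; other 'method'
# values include "average", "centroid", "median", and "ward.D2".
x = scale(iris[, 1:4])
d = dist(x, method = "euclidean")

fit_single   = hclust(d, method = "single")     # nearest neighbor linkage
fit_complete = hclust(d, method = "complete")   # farthest neighbor linkage

plot(fit_complete)                      # draw the dendrogram
clusters = cutree(fit_complete, k = 3)  # cut the tree into 3 groups
table(clusters, iris$Species)</code></pre>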
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,149 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Divisive Methods Pt.1</h1>
<p>Divisive methods work in the opposite direction of agglomerative methods. They take one large cluster and successively split it.</p>
<p>This is computationally demanding if all $2^{k - 1} - 1$ possible divisions into two sub-clusters of a cluster of $k$ objects are considered at each stage.</p>
<p>While less common than Agglomerative methods, divisive methods have the advantage that most users are interested in the main structure in their data, and this is revealed from the outset of a divisive method.</p>
<h2>Monothetic Divisive Methods</h2>
<p>For data consisting of $p$ <strong>binary variables</strong>, there is a computationally efficient class of methods known as <em>monothetic divisive methods</em> available.</p>
<p>Monothetic divisions divide clusters according to the presence or absence of each of the $p$ variables, so that at each stage, clusters contain members with certain attributes that are either all present or all absent.</p>
<p>The term 'monothetic' refers to the use of a single variable on which to base the split. <em>Polythetic</em> methods use all the variables at each stage.</p>
<h3>Choosing the Variable to Split On</h3>
<p>The choice of the variable on which a split is made depends on optimizing a criterion reflecting either cluster homogeneity or association with other variables.</p>
<p>This tends to minimize the number of splits that have to be made.</p>
<p>An example of an homogeneity criterion is the information content $C$</p>
<p>This is defined with $p$ variables and $n$ objects, where $f_k$ is the number of individuals with the $k$th attribute
$$
C = pn\log{n}-\sum_{k = 1}^p{(f_k\log{f_k} - (n-f_k)\log{(n-f_k)})}
$$</p>
<h3>Association with other variables</h3>
<p>Recall that another way to split is based on the association with other variables. The attribute used at each step can be chosen according to its overall association with all attributes remaining at each step.</p>
<p>This is sometimes termed <em>association analysis</em>.</p>
<table>
<thead>
<tr>
<th>V1 \ V2</th>
<th>1</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>a</td>
<td>b</td>
</tr>
<tr>
<td>0</td>
<td>c</td>
<td>d</td>
</tr>
</tbody>
</table>
<h4>Common measures of association</h4>
<p>$$
|ad-bc| \tag{4.6}
$$</p>
<p>$$
(ad-bc)^2 \tag{4.7}
$$</p>
<p>$$
\frac{(ad-bc)^2n}{(a+b)(a+c)(b+d)(c+d)} \tag{4.8}
$$</p>
<p>$$
\sqrt{\frac{(ad-bc)^2n}{(a+b)(a+c)(b+d)(c+d)}} \tag{4.9}
$$</p>
<p>$$
\frac{(ad-bc)^2}{(a+b)(a+c)(b+d)(c+d)} \tag{4.10}
$$</p>
<p>$(4.6)$ and $(4.7)$ have the advantage that there is no danger of computational problems if any marginal totals are near zero.</p>
<p>The last three, $(4.8)$, $(4.9)$, $(4.10)$, are all related to the $\chi^2$ statistic, its square root, and the Pearson correlation coefficient respectively.</p>
<h3>Advantages/Disadvantages of Monothetic Methods</h3>
<p>Appealing features of monothetic divisive methods are the easy classification of new members and the inclusion of cases with missing values.</p>
<p>A further advantage of monothetic divisive methods is that it is obvious which variables produce the split at any stage of the process.</p>
<p>A disadvantage with these methods is that the possession of a particular attribute which is either rare or rarely found in combination with others may take an individual down a different path.</p>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,120 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Divisive Methods Pt 2.</h1>
<p>Recall that in the previous section we spoke about monothetic and polythetic methods. Monothetic methods only look at a single variable at a time, while polythetic methods look at multiple variables simultaneously. In this section, we will speak more about polythetic divisive methods.</p>
<h2>Polythetic Divisive Methods</h2>
<p>Polythetic methods operate via a distance matrix.</p>
<p>This procedure avoids considering all possible splits by </p>
<ol>
<li>Finding the object that is furthest away from the others within a group and using that as a seed for a splinter group.</li>
<li>Each object is then considered for entry to that separate splinter group: any that are closer to the splinter group than to the main group are moved to the splinter one.</li>
<li>The step is then repeated.</li>
</ol>
<p>This process has been developed into a program named <code>DIANA</code> (DIvisive ANAlysis Clustering) which is implemented in <code>R</code>.</p>
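<p>A usage sketch of <code>DIANA</code> (this assumes the <code>cluster</code> package is installed; the <code>iris</code> data and the choice of three groups are illustrative):</p>
<pre><code class="language-R"># Divisive analysis clustering from the 'cluster' package
library(cluster)

fit = diana(iris[, 1:4], metric = "euclidean")
pltree(fit)                             # dendrogram of the divisive tree
groups = cutree(as.hclust(fit), k = 3)  # cut the tree into 3 groups
table(groups, iris$Species)</code></pre>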
<h3>Similarities to Politics</h3>
<p>This somewhat resembles a way a political party might split due to inner conflicts.</p>
<p>Firstly, the most discontented member leaves the party and starts a new one, and then some others follow him until a kind of equilibrium is attained.</p>
<h2>Methods for Large Data Sets</h2>
<p>There are two common hierarchical methods used for large data sets: <code>BIRCH</code> and <code>CURE</code>. Both of these algorithms employ a pre-clustering phase in which dense regions are summarized, and the summaries are then clustered using a hierarchical method based on centroids.</p>
<h3>CURE</h3>
<ol>
<li><code>CURE</code> starts with a random sample of points and represents clusters by a smaller number of points that capture the shape of the cluster</li>
<li>Which are then shrunk towards the centroid so as to dampen the effect of outliers</li>
<li>Hierarchical clustering then operates on the representative points</li>
</ol>
<p><code>CURE</code> has been shown to be able to cope with arbitrary-shaped clusters and in that respect may be superior to <code>BIRCH</code>, although it does require judgment as to the number of clusters and also a parameter which favors either more or less compact clusters.</p>
<h2>Revisiting Topics: Cluster Dissimilarity</h2>
<p>In order to decide where clusters should be combined (for agglomerative), or where a cluster should be split (for divisive), a measure of dissimilarity between sets of observations is required.</p>
<p>In most methods of hierarchical clustering this is achieved by the use of an appropriate</p>
<ul>
<li>Metric (a measure of distance between pairs of observations)</li>
<li>Linkage Criterion (which specifies the dissimilarities of sets as functions of pairwise distances observations in the sets)</li>
</ul>
<h2>Advantages of Hierarchical Clustering</h2>
<ul>
<li>Any valid measure of distance measure can be used</li>
<li>In most cases, the observations themselves are not required, just the matrix of distances
<ul>
<li>This can have the advantage of only having to store a distance matrix in memory as opposed to an $n$-dimensional data matrix.</li>
</ul></li>
</ul>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,131 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>CURE and TSNE</h1>
<h2>Clustering Using Representatives (CURE)</h2>
<p>Clustering using Representatives is a Hierarchical clustering technique in which you can represent a cluster using a <strong>set</strong> of well-scattered representative points.</p>
<p>This algorithm has a parameter $\alpha$ which defines the fraction by which the representative points are shrunk towards the centroid.</p>
<p>CURE is known to be robust to outliers and able to identify clusters that have a <strong>non-spherical</strong> shape and size variance.</p>
<p>The clusters with the closest pair of representatives are the clusters that are merged at each step of CURE's algorithm.</p>
<p>This algorithm cannot be directly applied to large datasets due to high runtime complexity. Several enhancements were added to address this requirement</p>
<ul>
<li>Random sampling: This involves a trade off between accuracy and efficiency. One would hope that the random sample they obtain is representative of the population</li>
<li>Partitioning: The idea is to partition the sample space into $p$ partitions</li>
</ul>
<p>Youtube Video: <a href="https://www.youtube.com/watch?v=JrOJspZ1CUw">https://www.youtube.com/watch?v=JrOJspZ1CUw</a></p>
<p>Steps (a small illustrative sketch follows the list):</p>
<ol>
<li>Pick a random sample of points that fit in main memory</li>
<li>Cluster sample points hierarchically to create the initial clusters</li>
<li>Pick representative point<strong>s</strong>
<ol>
<li>For each cluster, pick $k$ representative points, as dispersed as possible</li>
<li>Move each representative point a fixed fraction $\alpha$ toward the centroid of the cluster</li>
</ol></li>
<li>Rescan the whole dataset and visit each point $p$ in the data set</li>
<li>Place it in the &quot;closest cluster&quot;
<ol>
<li>Closest as in shortest distance among all the representative points.</li>
</ol></li>
</ol>
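<p>The following is a rough, hypothetical sketch (not a full CURE implementation) of steps 3-5: picking dispersed representative points for a cluster, shrinking them a fraction $\alpha$ toward the centroid, and assigning a new point to the cluster with the nearest representative. All helper names here are made up for illustration.</p>
<pre><code>import numpy as np

def representatives(cluster, k=4, alpha=0.2):
    """Pick k well-scattered points, then shrink them toward the centroid."""
    centroid = cluster.mean(axis=0)
    # Start with the point farthest from the centroid, then greedily add
    # the point farthest from the representatives chosen so far
    reps = [cluster[np.argmax(np.linalg.norm(cluster - centroid, axis=1))]]
    while len(reps) &lt; min(k, len(cluster)):
        d = np.min([np.linalg.norm(cluster - r, axis=1) for r in reps], axis=0)
        reps.append(cluster[np.argmax(d)])
    reps = np.array(reps)
    return reps + alpha * (centroid - reps)   # shrink toward the centroid

def closest_cluster(point, reps_per_cluster):
    """Place a point in the cluster whose nearest representative is closest."""
    dists = [np.min(np.linalg.norm(r - point, axis=1)) for r in reps_per_cluster]
    return int(np.argmin(dists))</code></pre>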
<h2>TSNE</h2>
<p>TSNE allows us to reduce the dimensionality of a dataset to two dimensions, which allows us to visualize the data.</p>
<p>It is able to do this since many real-world datasets have a low intrinsic dimensionality embedded within the high-dimensional space. </p>
<p>Since the technique needs to conserve the structure of the data, two corresponding mapped points must be close to each other distance-wise as well. Let $|x_i - x_j|$ be the Euclidean distance between two data points, and $|y_i - y_j|$ the distance between the corresponding map points. The conditional similarity between two data points is:
$$
p_{j|i} = \frac{\exp(-|x_i-x_j|^2 / (2\sigma_i^2))}{\sum_{k \ne i}{\exp(-|x_i-x_k|^2/(2\sigma_i^2))}}
$$
Where we are considering the <strong>Gaussian distribution</strong> surrounding the distance between $x_j$ and $x_i$ with a given variance $\sigma_i^2$. The variance is different for every point; it is chosen such that points in dense areas are given a smaller variance than points in sparse areas.</p>
<p>Now the similarity matrix for the mapped points is
$$
q_{ij} = \frac{f(|y_i - y_j|)}{\sum_{k \ne i}{f(|y_i - y_k|)}}
$$
Where $f(z) = \frac{1}{1 + z^2}$</p>
<p>This has the same idea as the conditional similarity between two data points, except this is based on the <strong>Cauchy distribution</strong>.</p>
<p>TSNE works at minimizing the Kullback-Leibler divergence between the two distributions $p_{ij}$ and $q_{ij}$
$$
KL(P || Q) = \sum_{i,j}{p_{ij} \log{\frac{p_{ij}}{q_{ij}}}}
$$
To minimize this score, gradient descent is typically performed using the gradient
$$
\frac{\partial KL(P||Q)}{\partial y_i} = 4\sum_j{(p_{ij} - q_{ij}) \, f(|y_i - y_j|) \, (y_i - y_j)}
$$</p>
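<p>In practice the whole procedure is available off the shelf; a quick usage sketch (assuming scikit-learn is installed, with random data purely for illustration):</p>
<pre><code>import numpy as np
from sklearn.manifold import TSNE

X = np.random.rand(200, 50)               # 200 points in a 50-dimensional space
emb = TSNE(n_components=2, perplexity=30).fit_transform(X)
print(emb.shape)                           # (200, 2): one 2D map point per observation</code></pre>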
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,133 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Cluster Validation</h1>
<p>There are multiple approaches to validating your cluster models</p>
<ul>
<li>Internal Evaluation: This is when you summarize the clustering into a single score. For example, minimizing the deviations from the centroids.</li>
<li>External Evaluation: Minimizing the deviations from some known &quot;labels&quot;</li>
<li>Manual Evaluation: A human expert decides whether or not it's good</li>
<li>Indirect Evaluation: Evaluating the utility of clustering in its intended application.</li>
</ul>
<h2>Some Problems With These Evaluations</h2>
<p>Internal evaluation measures suffer from the problem that they represent functions that are objectives for many clustering algorithms. So of course the result of the clustering algorithm will be such that the objective would be minimized.</p>
<p>External evaluation suffers from the fact that if we had labels to begin with then we wouldn't need to cluster. Practical applications of clustering occur usually when we don't have labels. On the other hand, possible labeling can reflect one possible partitioning of the data set. There could exist different, perhaps even better clustering.</p>
<h2>Internal Evaluation</h2>
<p>We like to see a few qualities in cluster models</p>
<ul>
<li><em>Robustness</em>: Refers to the effects of errors in data or missing observations, and changes in the data and methods.</li>
<li><em>Cohesiveness</em>: Clusters should be compact, i.e., have high intra-cluster similarity.</li>
<li>Clusters should be dissimilar to other clusters, i.e., have low inter-cluster similarity</li>
<li><em>Influence</em>: We should pay attention to and try to control for the influence of certain observations on the overall cluster</li>
</ul>
<p>Let us focus on the second and third bullet points for now. Internal evaluation measures are best suited to get some insight into situations where one algorithm performs better than another; this does not imply that one algorithm produces more valid results than the other.</p>
<h3>Davies-Bouldin Index</h3>
<p>$$
DB = \frac{1}{n}\sum_{i=1}^n{\max_{j\ne i}{\left(\frac{\sigma_i + \sigma_j}{d(c_i,c_j)}\right)}}
$$</p>
<p>Where $n$ is the number of clusters, $c$ indicates a centroid, and $\sigma$ represents the deviation from the centroid.</p>
<p>Better clustering algorithms are indicated by smaller DB values.</p>
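<p>A direct, hedged translation of the formula into code might look like the following (scikit-learn also ships a ready-made <code>davies_bouldin_score</code> if it is available):</p>
<pre><code>import numpy as np

def davies_bouldin(X, labels):
    ids = np.unique(labels)
    centroids = np.array([X[labels == i].mean(axis=0) for i in ids])
    # sigma_i: average distance of cluster members to their centroid
    sigma = np.array([np.mean(np.linalg.norm(X[labels == i] - centroids[k], axis=1))
                      for k, i in enumerate(ids)])
    n = len(ids)
    total = 0.0
    for i in range(n):
        ratios = [(sigma[i] + sigma[j]) / np.linalg.norm(centroids[i] - centroids[j])
                  for j in range(n) if j != i]
        total += max(ratios)
    return total / n</code></pre>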
<h3>Dunn Index</h3>
<p>$$
D = \frac{\min_{1\le i \lt j \le n}{d(i,j)}}{\max_{1\le k \le n}{d^\prime(k)}}
$$</p>
<p>The Dunn index aims to identify dense and well-separated clusters. This is defined as the ratio between the minimal inter-cluster distance to maximal intra-cluster distance.</p>
<p>High Dunn Index values are more desirable.</p>
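<p>A corresponding sketch for the Dunn index (one common choice of distances is assumed here: the minimum pairwise distance between clusters for $d(i,j)$ and the cluster diameter for $d^\prime(k)$):</p>
<pre><code>import numpy as np
from scipy.spatial.distance import cdist, pdist

def dunn_index(X, labels):
    clusters = [X[labels == i] for i in np.unique(labels)]
    # d(i, j): smallest distance between points in two different clusters
    inter = min(cdist(a, b).min()
                for i, a in enumerate(clusters)
                for j, b in enumerate(clusters) if j &gt; i)
    # d'(k): diameter, the largest within-cluster distance
    intra = max(pdist(c).max() for c in clusters if len(c) &gt; 1)
    return inter / intra</code></pre>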
<h3>Bootstrapping</h3>
<p>In terms of robustness we can measure uncertainty in each of the individual clusters. This can be examined using a bootstrapping approach by Suzuki and Shimodaira (2006). The probability or &quot;p-value&quot; is the proportion of bootstrapped samples that contain the cluster. Larger p-values in this case indicate more support for the cluster.</p>
<p>This is available in R via <code>Pvclust</code></p>
<h3>Split-Sample Validation</h3>
<p>One approach to assess the effects of perturbations of the data is by randomly dividing the data into two subsets and performing an analysis on each subset separately. This method was proposed by McIntyre and Blashfield in 1980; their method involves the following steps</p>
<ul>
<li>Divide the sample in two and perform a cluster analysis on one of the samples
<ul>
<li>Have a fixed rule for the number of clusters</li>
</ul></li>
<li>Determine the centroids of the clusters, and compute proximities between the objects in the second sample and the centroids, classifying the objects into their nearest cluster.</li>
<li>Cluster the second sample using the same methods as before and compare these two alternate clusterings using something like the <em>adjusted Rand index</em>, as sketched below.</li>
</ul>
<p><img src="https://wikimedia.org/api/rest_v1/media/math/render/svg/b1850490e5209123ab6e5b905495b4d5f9a1f661" alt="Adjusted Index" /></p>
<h2>Influence of Individual Points</h2>
<p>Using internal evaluation metrics, you can see the impact of each point by doing a &quot;leave one out&quot; analysis. Here you evaluate the dataset minus one point for each of the points. If a positive difference is found, the point is regarded as a <em>facilitator</em>, whereas if it is negative then it is considered an <em>inhibitor</em>. Once an influential inhibitor is found, the suggestion is to normally omit it from the clustering.</p>
<h2>R Package</h2>
<p><code>clValid</code> contains a variety of internal validation measures.</p>
<p>Paper: <a href="https://cran.r-project.org/web/packages/clValid/vignettes/clValid.pdf">https://cran.r-project.org/web/packages/clValid/vignettes/clValid.pdf</a></p>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,103 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Lecture Notes for Cluster Analysis</h1>
<p><a href="index.html%3Fresearch%252FClusterAnalysis%252Fnotes%252Flec1.html">Lecture 1: Measures of Similarity</a></p>
<p><a href="index.html%3Fresearch%252FClusterAnalysis%252Fnotes%252Flec2-1.html">Lecture 2.1: Distance Measures Reasoning</a></p>
<p><a href="index.html%3Fresearch%252FClusterAnalysis%252Fnotes%252Flec2-2.html">Lecture 2.2: Principle Component Analysis Pt. 1</a></p>
<p>Lecture 3: Discussion of Dataset</p>
<p><a href="index.html%3Fresearch%252FClusterAnalysis%252Fnotes%252Flec4.html">Lecture 4: Principal Component Analysis Pt. 2</a></p>
<p><a href="index.html%3Fresearch%252FClusterAnalysis%252Fnotes%252Flec4-2.html">Lecture 4.2: Revisiting Measures</a></p>
<p><a href="index.html%3Fresearch%252FClusterAnalysis%252Fnotes%252Flec4-3.html">Lecture 4.3: Cluster Tendency</a></p>
<p><a href="index.html%3Fresearch%252FClusterAnalysis%252Fnotes%252Flec5.html">Lecture 5: Introduction to Connectivity Based Models</a></p>
<p><a href="index.html%3Fresearch%252FClusterAnalysis%252Fnotes%252Flec6.html">Lecture 6: Agglomerative Methods</a></p>
<p><a href="index.html%3Fresearch%252FClusterAnalysis%252Fnotes%252Flec7.html">Lecture 7: Divisive Methods Part 1: Monothetic</a></p>
<p><a href="index.html%3Fresearch%252FClusterAnalysis%252Fnotes%252Flec8.html">Lecture 8: Divisive Methods Part 2: Polythetic</a></p>
<p><a href="index.html%3Fresearch%252FClusterAnalysis%252Fnotes%252Flec9-1.html">Lecture 9.1: CURE and TSNE</a></p>
<p><a href="index.html%3Fresearch%252FClusterAnalysis%252Fnotes%252Flec9-2.html">Lecture 9.2: Cluster Validation Part I</a></p>
<p><a href="index.html%3Fresearch%252FClusterAnalysis%252Fnotes%252Flec10-1.html">Lecture 10.1: Silhouette Coefficient</a></p>
<p><a href="index.html%3Fresearch%252FClusterAnalysis%252Fnotes%252Flec10-2.html">Lecture 10.2: Centroid-Based Clustering</a></p>
<p><a href="index.html%3Fresearch%252FClusterAnalysis%252Fnotes%252Flec10-3.html">Lecture 10.3: Voronoi Diagrams</a></p>
<p><a href="index.html%3Fresearch%252FClusterAnalysis%252Fnotes%252Flec11-1.html">Lecture 11.1: K-means++</a></p>
<p><a href="index.html%3Fresearch%252FClusterAnalysis%252Fnotes%252Flec11-2.html">Lecture 11.2: K-medoids</a></p>
<p><a href="index.html%3Fresearch%252FClusterAnalysis%252Fnotes%252Flec11-3.html">Lecture 11.3: K-medians</a></p>
<p><a href="index.html%3Fresearch%252FClusterAnalysis%252Fnotes%252Flec12.html">Lecture 12: Introduction to Density Based Clustering</a></p>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,104 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Readings for Lectures of Cluster Analysis</h1>
<h2>Lecture 1</h2>
<p>Garson Textbook Chapter 3</p>
<h2>Lecture 2</h2>
<p><a href="https://arxiv.org/pdf/1404.1100.pdf">A Tutorial on Principal Component Analysis</a></p>
<h2>Lecture 3</h2>
<p>No Readings</p>
<h2>Lecture 4</h2>
<p>An Introduction to Applied Multivariate Analysis with R by Brian Everitt and Torsten Hothorn.</p>
<p>Sections 3.0-3.9 Everitt</p>
<h2>Lecture 5</h2>
<p>Section 4.1 Everitt</p>
<h2>Lecture 6</h2>
<p>Section 4.2 Everitt</p>
<p>Applied Multivariate Statistical Analysis Johnson Section 12.3</p>
<p><a href="https://support.minitab.com/en-us/minitab/18/help-and-how-to/modeling-statistics/multivariate/how-to/cluster-observations/methods-and-formulas/linkage-methods/#mcquitty">Linkage Methods for Cluster Observations</a></p>
<h2>Lecture 7</h2>
<p>Section 4.3 Everitt</p>
<h2>Lecture 8</h2>
<p><a href="https://www.oreilly.com/learning/an-illustrated-introduction-to-the-t-sne-algorithm">Introduction to the TSNE Algorithm</a></p>
<h2>Lecture 9</h2>
<p>Section 9.5 Everitt</p>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,193 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Cluster Analysis Spring 2018</h1>
<h3>Distance, Dimensionality Reduction, and Tendency</h3>
<ul>
<li>Distance
<ul>
<li>Euclidean Distance</li>
<li>Squared Euclidean Distance</li>
<li>Manhattan Distance</li>
<li>Maximum Distance</li>
<li>Mahalanobis Distance</li>
</ul></li>
<li>Which distance function should you use?</li>
<li>PCA</li>
<li>Cluster Tendency
<ul>
<li>Hopkins Statistic</li>
</ul></li>
<li>Scaling Data</li>
</ul>
<h3>Validating Clustering Models</h3>
<ul>
<li>Clustering Validation</li>
<li>Cross Validation</li>
</ul>
<h3>Connectivity Models</h3>
<ul>
<li>Agglomerative Clustering
<ul>
<li>Single Linkage Clustering</li>
<li>Complete Linkage Clustering</li>
<li>Unweighted Pair Group Method with Arithmetic Mean (If time permits)</li>
</ul></li>
<li>Dendrograms</li>
<li>Divisive Clustering</li>
<li>CURE (Clustering using REpresentatives) algorithm (If time permits)</li>
</ul>
<h3>Cluster Evaluation</h3>
<ul>
<li>Internal Evaluation
<ul>
<li>Dunn Index</li>
<li>Silhouette Coefficient</li>
<li>Davies-Bouldin Index (If time permits)</li>
</ul></li>
<li>External Evaluation
<ul>
<li>Rand Measure</li>
<li>Jaccard Index</li>
<li>Dice Index</li>
<li>Confusion Matrix</li>
<li>F Measure (If time permits)</li>
<li>Fowlkes-Mallows Index (If time permits)</li>
</ul></li>
</ul>
<h3>Centroid Models</h3>
<ul>
<li>Jenks Natural Breaks Optimization</li>
<li>Voronoi Diagram</li>
<li>K means clustering</li>
<li>K medoids clustering</li>
<li>K Medians/Modes clustering</li>
<li>When to use K means as opposed to K medoids or K Medians?</li>
<li>How many clusters should you use?</li>
<li>Lloyd's Algorithm for Approximating K-means (If time permits)</li>
</ul>
<h3>Density Models</h3>
<ul>
<li>DBSCAN Density Based Clustering Algorithm</li>
<li>OPTICS Ordering Points To Identify the Clustering Structure</li>
<li>DeLi-Clu Density Link Clustering (If time permits)</li>
<li>What should be your density threshold?</li>
</ul>
<h3>Analysis of Model Appropriateness</h3>
<ul>
<li>When do we use each of the models above?</li>
</ul>
<h3>Distribution Models (If time permits)</h3>
<ul>
<li>Fuzzy Clusters</li>
<li>EM (Expectation Maximization) Clustering</li>
<li>Maximum Likelihood Gaussian</li>
<li>Probabilistic Hierarchical Clustering</li>
</ul>
<h2>Textbooks</h2>
<p>Cluster Analysis 5th Edition</p>
<ul>
<li>Authors: Brian S. Everitt, Sabine Landau, Morven Leese, Daniel Stahl</li>
<li>ISBN-13: 978-0470749913</li>
<li>Cost: Free on UMW Library Site</li>
<li>Amazon Link: <a href="https://www.amazon.com/Cluster-Analysis-Brian-S-Everitt/dp/0470749911/ref=sr_1_1?ie=UTF8&qid=1509135983&sr=8-1">https://www.amazon.com/Cluster-Analysis-Brian-S-Everitt/dp/0470749911/ref=sr_1_1?ie=UTF8&qid=1509135983&sr=8-1</a></li>
<li>Table of Contents: <a href="http://www.wiley.com/WileyCDA/WileyTitle/productCd-EHEP002266.html">http://www.wiley.com/WileyCDA/WileyTitle/productCd-EHEP002266.html</a></li>
</ul>
<p>Cluster Analysis: 2014 Edition (Statistical Associates Blue Book Series 24) </p>
<ul>
<li>Author: David Garson</li>
<li>ISBN: 978-1-62638-030-1</li>
<li>Cost: Free with Site Registration</li>
<li>Website: <a href="http://www.statisticalassociates.com/clusteranalysis.htm">http://www.statisticalassociates.com/clusteranalysis.htm</a></li>
</ul>
<h2>Schedule</h2>
<p>In an ideal world, learning each of the topics below would take about the time period I estimated. Of course, you have more experience when it comes to how long it actually takes to learn these topics, so I'll leave this mostly to your discretion.</p>
<p><strong>Distance, Dimensionality Reduction, and Tendency</strong> -- 3 Weeks</p>
<p><strong>Validating Cluster Models</strong> -- 1 Week</p>
<p><strong>Connectivity Models</strong> -- 2 Weeks</p>
<p><strong>Cluster Evaluation</strong> -- 1 Week</p>
<p><strong>Centroid Models</strong> -- 3 Weeks</p>
<p><strong>Density Models</strong> -- 3 Weeks</p>
<p><strong>Analysis of Model Appropriateness</strong> -- 1 Week</p>
<p>The schedule above accounts for 14 weeks, so there is a week that is free as a buffer.</p>
<h2>Conclusion</h2>
<p>Creating this document got me really excited for this independent study. Feel free to give me feedback :)</p>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,92 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Cluster Analysis</h1>
<p>Cluster Analysis is the art of finding inherent structures in data to form groups of similar observations. This has a myriad of applications from recommendation engines to social network analysis.</p>
<p>This is an independent study, meaning that I will be studying this topic under the direction of a professor, in this case being Dr. Denhere.</p>
<p>I have provided a list of topics that I wish to explore in a <a href="index.html%3Fresearch%252FClusterAnalysis%252Fsyllabus.html">syllabus</a></p>
<p>Dr. Denhere likes to approach independent studies from a theoretical and applied sense. Meaning, I will learn the theory of the different algorithms, and then figure out a way to apply them onto a dataset.</p>
<h2>Readings</h2>
<p>There is no definitive textbook for this course. Instead, Dr. Denhere and I search for materials that we think best demonstrate the topic at hand. </p>
<p>I have created a <a href="index.html%3Fresearch%252FClusterAnalysis%252Freadings.html">Reading Page</a> to keep track of the different reading materials.</p>
<h2>Learning Notes</h2>
<p>I like to type up the content I learn from different sources. A <a href="index.html%3Fresearch%252FClusterAnalysis%252Fnotes.html">notes page</a> keeps track of the content discussed at each meeting.</p>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,175 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Chapter 2: Multi-armed Bandits</h1>
<p>Reinforcement learning <em>evaluates</em> the actions taken rather than being given <em>instructions</em> for the correct actions. This creates the need for active exploration. </p>
<p>This chapter of the book goes over a simplified version of the reinforcement learning problem, that does not involve learning to act in more than one situation. This is called a <em>nonassociative</em> setting.</p>
<p>In summation, the type of problem we are about to study is a nonassociative, evaluative feedback problem: the $k$-armed bandit problem, a simplified version of the full reinforcement learning problem.</p>
<h2>$K$-armed bandit problem</h2>
<p>Consider the following learning problem. You are faced repeatedly with a choice among $k$ different options/actions. After each choice you receive a numerical reward chosen from a stationary probability distribution that depends on the action you selected.</p>
<p>Your objective (if you choose to accept it) is to maximize the expected total reward over some time period. Let's say $1000$ time steps.</p>
<h3>Analogy</h3>
<p>This is called the $k$-armed bandit problem because it's an analogy of a slot machine. Slot machines are nick-named the &quot;one-armed bandit&quot;, and the goal here is to play the slot machine that has the greatest value return.</p>
<h3>Sub-goal</h3>
<p>We want to figure out which slot machine produces the greatest value. Therefore, we want to be able to estimate the value of a slot machine as close to the actual value as possible.</p>
<h3>Exploration vs Exploitation</h3>
<p>If we maintain estimates of the action values, then at any time step there is at least one action whose estimated value is the greatest. We call these <em>greedy</em> actions. When you select one of these actions we say that you are <em>exploiting</em> your current knowledge of the values of the actions. </p>
<p>If you instead select a non-greedy action, then you are <em>exploring</em>, because this enables you to better improve your estimate of the non-greedy action's value.</p>
<p>Uncertainty is such that at least one of the other actions is probably better than the greedy action; you just don't know which one yet.</p>
<h2>Action-value Methods</h2>
<p>In this section, we will look at simple balancing methods in how to gain the greatest reward through exploration and exploitation.</p>
<p>We begin by looking more closely at some simple methods for estimating the values of actions and for using said estimates to make action selection decisions.</p>
<h3>Sample-average method</h3>
<p>One natural way to estimate this is by averaging the rewards actually received
$$
Q_t(a) = \frac{\sum_{i = 1}^{t - 1}{R_i \cdot \mathbb{I}_{A_i = a}}}{\sum_{i = 1}^{t - 1}{\mathbb{I}_{A_i = a}}}
$$
where $\mathbb{I}_{predicate}$ denotes the random variable that is 1 if the predicate is true and 0 if it is not. If the denominator is zero (the action has not yet been selected), then we assume some default value such as zero. </p>
<h3>Greedy action selection</h3>
<p>This is where you choose greedily all the time.
$$
A_t = argmax_a(Q_t(a))
$$</p>
<h3>$\epsilon$-greedy action selection</h3>
<p>This is where we choose greedily most of the time, except for a small probability $\epsilon$. Where instead of selecting greedily, we select randomly from among all the actions with equal probability.</p>
<h3>Comparison of greedy and $\epsilon$-greedy methods</h3>
<p>The advantage of $\epsilon$-greedy over greedy methods depends on the task. With noisier rewards it takes more exploration to find the optimal action, and $\epsilon$-greedy methods should fare better relative to the greedy method. However, if the reward variances were zero, then the greedy method would know the true value of each action after trying it once.</p>
<p>Suppose the bandit task were non-stationary, that is, the true values of actions changed over time. In this case exploration is needed to make sure one of the non-greedy actions has not changed to become better than the greedy one.</p>
<h3>Incremental Implementation</h3>
<p>There is a way to update averages with small, constant computations rather than storing all the numerators and denominators separately.</p>
<p>Note the derivation for the update formula
$$
\begin{align}
Q_{n + 1} &amp;= \frac{1}{n}\sum_{i = 1}^n{R_i} \\
&amp;= \frac{1}{n}\left(R_n + \sum_{i = 1}^{n - 1}{R_i}\right) \\
&amp;= \frac{1}{n}\left(R_n + (n - 1)\frac{1}{n-1}\sum_{i = 1}^{n - 1}{R_i}\right) \\
&amp;= \frac{1}{n}\left(R_n + (n - 1)Q_n\right) \\
&amp;= \frac{1}{n}\left(R_n + nQ_n - Q_n\right) \\
&amp;= Q_n + \frac{1}{n}\left(R_n - Q_n\right) \tag{2.3}
\end{align}
$$
With formula 2.3, the implementation requires memory of only $Q_n$ and $n$.</p>
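<p>A small simulation may help tie the pieces together: sample-average estimates maintained with the incremental rule 2.3, combined with $\epsilon$-greedy action selection. This is a sketch with assumed Gaussian reward distributions and arbitrary parameter choices, not code from the book.</p>
<pre><code>import numpy as np

k, steps, eps = 10, 1000, 0.1
true_values = np.random.normal(0, 1, k)   # hidden value of each arm

Q = np.zeros(k)   # action-value estimates
N = np.zeros(k)   # number of times each arm has been pulled

total_reward = 0.0
for t in range(steps):
    if np.random.rand() &lt; eps:
        a = np.random.randint(k)          # explore
    else:
        a = int(np.argmax(Q))             # exploit
    r = np.random.normal(true_values[a], 1.0)
    N[a] += 1
    Q[a] += (r - Q[a]) / N[a]             # incremental update (formula 2.3)
    total_reward += r

print(total_reward / steps)</code></pre>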
<p>This update rule is a form that occurs frequently throughout the book. The general form is
$$
NewEstimate = OldEstimate + StepSize(Target - OldEstimate)
$$</p>
<h3>Tracking a Nonstationary Problem</h3>
<p>As noted earlier, we often encounter problems that are nonstationary. In such cases it makes sense to give more weight to recent rewards than to long-past rewards. One of the most popular ways to do this is to use a constant value for the $StepSize$ parameter. We modify formula 2.3 to be
$$
\begin{align}
Q_{n + 1} &amp;= Q_n + \alpha(R_n - Q_n) \\
&amp;= \alpha R_n + Q_n - \alpha Q_n \\
&amp;= \alpha R_n + (1 - \alpha)Q_n \\
&amp;= \alpha R_n + (1 - \alpha)(\alpha R_{n - 1} + (1-\alpha)Q_{n - 1}) \\
&amp;= \alpha R_n + (1 - \alpha)(\alpha R_{n - 1} + (1-\alpha)(\alpha R_{n - 2} + (1 - \alpha)Q_{n - 2})) \\
&amp;= \alpha R_n + (1-\alpha)\alpha R_{n - 1} + (1-\alpha)^2\alpha R_{n - 2} + \dots + (1-\alpha)^nQ_1 \\
&amp;= (1-\alpha)^nQ_1 + \sum_{i = 1}^n{\alpha(1-\alpha)^{n - i}R_i}
\end{align}
$$
This is a weighted average since the summation of all the weights equals one. Note here that the farther away a reward is from the current time, the more times $(1-\alpha)$ gets multiplied by itself, hence making it less influential. This is sometimes called an <em>exponential recency-weighted average</em>.</p>
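<p>In code, the only change from the sample-average update is the step size; a minimal sketch (the value of $\alpha$ is an arbitrary choice):</p>
<pre><code>import numpy as np

def constant_step_update(Q, a, r, alpha=0.1):
    """Exponential recency-weighted average: recent rewards dominate older ones."""
    Q[a] += alpha * (r - Q[a])
    return Q

Q = constant_step_update(np.zeros(10), a=3, r=1.5)
print(Q[3])   # 0.15</code></pre>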
<h3>Manipulating $\alpha_n(a)$</h3>
<p>Sometimes it is convenient to vary the step-size parameter from step to step. We can denote $\alpha_n(a)$ to be a function that determines the step-size parameter after the $n$th selection of action $a$. As noted before, $\alpha_n(a) = \frac{1}{n}$ results in the sample-average method, which is guaranteed to converge to the true action values assuming a large number of trials. </p>
<p>A well-known result in stochastic approximation theory gives us the following conditions to assure convergence with probability 1:
$$
\sum_{n = 1}^\infty{\alpha_n(a)} = \infty \quad \text{and} \quad \sum_{n = 1}^{\infty}{\alpha_n^2(a)} \lt \infty
$$
The first condition is required to guarantee that the steps are large enough to overcome any initial conditions or random fluctuations. The second condition guarantees that eventually the steps become small enough to assure convergence.</p>
<p><strong>Note:</strong> Both convergence conditions are met for the sample-average case but not for the constant step-size parameter: the second condition is violated when the step size is constant. This is actually desirable, since if the rewards are changing over time we don't want the estimates to converge to any one value.</p>
<h3>Optimistic Initial Values</h3>
<p>The methods discussed so far are biased by their initial estimates. Another downside is that these values are another set of parameters that must be chosen by the user. Though these initial values can be used as a simple way to encourage exploration.</p>
<p>Let's say you set an initial estimate that is wildly optimistic. Whichever actions are initially selected, the reward is less than the starting estimates. Therefore, the learner switches to other actions, being <em>disappointed</em> with the rewards it was receiving. </p>
<p>The result of this is that all actions are tried several times before their values converge. It even does this if the algorithm is set to choose greedily most of the time! </p>
<p><img src="http://cs.umw.edu/home/rozek/Pictures/1536284892566.png" alt="1536284892566" /></p>
<p>This simple trick is quite effective for stationary problems. Not so much for nonstationary problems since the drive for exploration only happens at the beginning. If the task changes, creating a renewed need for exploration, this method would not catch it.</p>
<h3>Upper-Confidence-Bound Action Selection</h3>
<p>Exploration is needed because there is always uncertainty about the accuracy of the action-value estimates. The greedy actions are those that look best at the present but some other options may actually be better. Let's choose options that have potential for being optimal, taking into account how close their estimates are to being maximal and the uncertainties in those estimates.
$$
A_t = argmax_a{(Q_t(a) + c\sqrt{\frac{ln(t)}{N_t(a)}})}
$$
where $N_t(a)$ denotes the number of times that $a$ has been selected prior to time $t$ and $c &gt; 0$ controls the degree of exploration.</p>
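<p>A sketch of the selection rule itself (the convention of trying every action once before applying the bound is an assumption here, since $N_t(a) = 0$ would otherwise divide by zero):</p>
<pre><code>import numpy as np

def ucb_action(Q, N, t, c=2.0):
    """Pick the action with the highest upper confidence bound."""
    untried = np.where(N == 0)[0]
    if len(untried) &gt; 0:
        return int(untried[0])            # try every action at least once
    return int(np.argmax(Q + c * np.sqrt(np.log(t) / N)))</code></pre>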
<h3>Associative Search (Contextual Bandits)</h3>
<p>So far, we've only considered nonassociative tasks, where there is no need to associate different actions with different situations. However, in a general reinforcement learning task there is more than one situation and the goal is to learn a policy: a mapping from situations to the actions that are best in those situations.</p>
<p>For the sake of continuing our example, let us suppose that there are several different $k$-armed bandit tasks, and that on each step you confront one of these chosen at random. To you, this would appear as a single, nonstationary $k$-armed bandit task whose true action values change randomly from step to step. You could try using one of the previous methods, but unless the true action values change slowly, these methods will not work very well.</p>
<p>Now suppose that when a bandit task is selected for you, you are given some clue about its identity. Now you can learn a policy associating each task, signaled by the clue, with the best action to take when facing that task.</p>
<p>This is an example of an <em>associative search</em> task, so called because it involves both trial-and-error learning to <em>search</em> for the best actions, and <em>association</em> of these actions with situations in which they are best. Nowadays they're called <em>contextual bandits</em> in literature.</p>
<p>If actions are allowed to affect the next situation as well as the reward, then we have the full reinforcement learning problem. This will be presented in the next chapter of the book with its ramifications appearing throughout the rest of the book.</p>
<p><img src="http://cs.umw.edu/home/rozek/Pictures/1536321791927.png" alt="1536321791927" /></p>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,169 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Chapter 4: Dynamic Programming</h1>
<p>Dynamic programming refers to a collection of algorithms that can be used to compute optimal policies given a perfect model of the environment as a Markov decision process (MDP).</p>
<p>Classic DP algorithms are of limited utility due to their assumption of a perfect model and their great computational expense.</p>
<p>Let's assume that the environment is a finite MDP. We assume its state, action, and reward sets, $\mathcal{S}, \mathcal{A}, \mathcal{R}$ are finite, and that its dynamics are given by a set of probabilities $p(s^\prime, r | s , a)$.</p>
<p>The key idea of dynamic programming, and of reinforcement learning is the use of value functions to organize and structure the search for good policies. In this chapter, we show how dynamic programming can be used to compute the value functions defined in Chapter 3. We can easily obtain optimal policies once we have found the optimal value functions which satisfy the Bellman optimality equations.</p>
<h2>Policy Evaluation</h2>
<p>First we consider how to compute the state-value function for an arbitrary policy. The existence and uniqueness of a state-value function for an arbitrary policy are guaranteed as long as either the discount rate is less than one or eventual termination is guaranteed from all states under the given policy.</p>
<p>Consider a sequence of approximate value functions. The initial approximation, $v_0$, is chosen arbitrarily (except that the terminal state must be given a value of zero) and each successive approximation is obtained by using the Bellman equation for $v_\pi$ as an update rule:
$$
v_{k + 1}(s) = \sum_{a}{\pi(a|s)\sum_{s^\prime, r}{p(s^\prime,r|s,a)[r + \gamma v_k(s^\prime)]}}
$$
This algorithm is called <em>iterative policy evaluation</em>.</p>
<p>To produce each successive approximation, $v_{k + 1}$ from $v_k$, iterative policy evaluation applies the same operation to each state $s$: it replaces the old value of $s$ with a new value obtained from the old values of the successor states of $s$, and the expected immediate rewards, along all the one-step transitions possible under the policy being evaluated.</p>
<p><u><strong>Iterative Policy Evaluation</strong></u></p>
<pre><code>Input π, the policy to be evaluated
Initialize an array V(s) = 0, for all s ∈ S+
Repeat
∆ ← 0
For each s ∈ S:
v ← V(s)
V(s) ← ∑_a π(a|s) ∑_{s',r} p(s',r|s,a)[r + γV(s')]
∆ ← max(∆, |v − V(s)|)
until ∆ &lt; θ (a small positive number)
Output V ≈ v_π</code></pre>
<h3>Grid World Example</h3>
<p>Consider a grid world where the top left and bottom right squares are the terminal state. Now consider that every other square, produces a reward of -1, and the available actions on each state is {up, down, left, right} as long as that action keeps the agent on the grid. Suppose our agent follows the equiprobable random policy. </p>
<p><em>[Figure omitted: grid world policy evaluation example.]</em></p>
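<p>A minimal sketch of iterative policy evaluation on that grid world (the 4x4 layout, the -1 reward per step, the equiprobable random policy, and $\gamma = 1$ are taken from the example; the in-place sweep is one of the variants described above):</p>
<pre><code>import numpy as np

n = 4                                     # 4x4 grid world
V = np.zeros((n, n))
actions = [(-1, 0), (1, 0), (0, -1), (0, 1)]
terminal = {(0, 0), (n - 1, n - 1)}
theta, gamma = 1e-4, 1.0

while True:
    delta = 0.0
    for i in range(n):
        for j in range(n):
            if (i, j) in terminal:
                continue
            v = V[i, j]
            total = 0.0
            for di, dj in actions:
                ni, nj = i + di, j + dj
                if not (0 &lt;= ni &lt; n and 0 &lt;= nj &lt; n):
                    ni, nj = i, j         # bumping into the wall leaves you in place
                total += 0.25 * (-1 + gamma * V[ni, nj])
            V[i, j] = total
            delta = max(delta, abs(v - V[i, j]))
    if delta &lt; theta:
        break

print(np.round(V, 1))</code></pre>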
<h2>Policy Improvement</h2>
<p>One reason for computing the value function for a policy is to help find better policies. Suppose we have determined the value function $v_\pi$ for an arbitrary deterministic policy $\pi$. For some state $s$ we would like to know whether or not we should change the policy to deterministically choose another action. We know how good it is to follow the current policy from $s$, that is $v_\pi(s)$, but would it be better or worse to change to the new policy? </p>
<p>One way to answer this question is to consider selecting $a$ in $s$ and thereafter following the existing policy, $\pi$. The key criterion is whether this produces a value greater than or less than $v_\pi(s)$. If it is greater, then one would expect it to be better still to select $a$ every time $s$ is encountered, and that the new policy would in fact be a better one overall.</p>
<p>That this is true is a special case of a general result called the <em>policy improvement theorem</em>. Let $\pi$ and $\pi^\prime$ be any pair of deterministic policies such that for all $s \in \mathcal{S}$,
$$
q_\pi(s, \pi^\prime(s)) \ge v_\pi(s)
$$
So far we have seen how, given a policy and its value function, we can easily evaluate a change in the policy at a single state to a particular action. It is a natural extension to consider changes at all states and to all possible actions, selecting at each state the action that appears best according to $q_\pi(s, a)$. In other words, to consider the new <em>greedy</em> policy, $\pi^\prime$, given by:
$$
\pi^\prime(s) = \operatorname{argmax}_a{(q_\pi(s, a))}
$$
So far in this section we have considered the case of deterministic policies. We will not go through the details, but in fact all the ideas of this section extend easily to stochastic policies.</p>
<h2>Policy Iteration</h2>
<p>Once a policy, $\pi$, has been improved using $v_\pi$ to yield a better policy, $\pi^\prime$, we can then compute $v_{\pi^\prime}$ and improve it again to yield an even better $\pi^{\prime\prime}$. We can thus obtain a sequence of monotonically improving policies and value functions.</p>
<p>Each policy is guaranteed to be a strict improvement over the previous one (unless it is already optimal). Since a finite MDP has only a finite number of policies, this process must converge to an optimal policy and optimal value function in a finite number of iterations.</p>
<p>This way of finding an optimal policy is called <em>policy iteration</em>.</p>
<p><u>Algorithm</u></p>
<pre><code>1. Initialization
V(s) ∈ R and π(s) ∈ A(s) arbitrarily for all s ∈ S
2. Policy Evaluation
Repeat
∆ ← 0
For each s∈S:
v ← V(s)
V(s) ← ∑_{s',r} p(s',r|s,π(s))[r + γV(s')]
∆ ← max(∆, |v − V(s)|)
until ∆ &lt; θ (a small positive number)
3. Policy Improvement
policy-stable ← true
For each s ∈ S:
old-action ← π(s)
π(s) ← arg max_a ∑_{s',r} p(s',r|s,a)[r + γV(s')]
If old-action != π(s), then policy-stable ← false
If policy-stable, then stop and return V ≈ v_*
and π ≈ π_*; else go to 2</code></pre>
<h2>Value Iteration</h2>
<p>One drawback to policy iteration is that each of its iterations involve policy evaluation, which may itself be a protracted iterative computation requiring multiple sweeps through the state set. If policy evaluation is done iteratively, then convergence exactly to $v_\pi$ occurs only in the limit. Must we wait for exact convergence, or can we stop short of that?</p>
<p>In fact, the policy evaluation step of policy iteration can be truncated in several ways without losing the convergence guarantee of policy iteration. One important special case is when policy evaluation is stopped after one sweep. This algorithm is called value iteration. </p>
<p>Another way of understanding value iteration is by reference to the Bellman optimality equation. Note that value iteration is obtained simply by turning the Bellman optimality equation into an update rule. Also note how value iteration is identical to the policy evaluation update except that it requires the maximum to be taken over all actions.</p>
<p>Finally, let us consider how value iteration terminates. Like policy evaluation, value iteration formally requires an infinite number of iterations to converge exactly. In practice, we stop once the value function changes by only a small amount.</p>
<pre><code>Initialize array V arbitrarily (e.g., V(s) = 0 for all
s ∈ S+)
Repeat
∆ ← 0
For each s ∈ S:
v ← V(s)
V(s) ← max_a ∑_{s',r} p(s',r|s,a)[r + γV(s')]
∆ ← max(∆, |v − V(s)|)
until ∆ &lt; θ (a small positive number)
Output a deterministic policy, π ≈ π_*, such that
π(s) = arg max_a ∑_{s',r} p(s',r|s,a)[r + γV(s')]</code></pre>
<p>Value iteration effectively combines, in each of its sweeps, one sweep of policy evaluation and one sweep of policy improvement. Faster convergence is often achieved by interposing multiple policy evaluation sweeps between each policy improvement sweep. </p>
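<p>For completeness, a hedged sketch of the core update, written for a generic tabular MDP whose dynamics are stored in a hypothetical structure <code>p[s][a]</code> holding <code>(prob, next_state, reward)</code> tuples:</p>
<pre><code>def value_iteration(p, n_states, gamma=0.9, theta=1e-6):
    """p[s][a] is a list of (prob, next_state, reward) tuples."""
    V = [0.0] * n_states
    while True:
        delta = 0.0
        for s in range(n_states):
            q = [sum(prob * (r + gamma * V[s2]) for prob, s2, r in p[s][a])
                 for a in range(len(p[s]))]
            best = max(q)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta &lt; theta:
            break
    # Greedy (deterministic) policy with respect to the converged value function
    pi = [max(range(len(p[s])),
              key=lambda a: sum(prob * (r + gamma * V[s2]) for prob, s2, r in p[s][a]))
          for s in range(n_states)]
    return V, pi</code></pre>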
<h2>Asynchronous Dynamic Programming</h2>
<p><em>Asynchronous</em> DP algorithms are in-place DP algorithms that are not organized in terms of systematic sweeps of the state set. These algorithms update the values of states in any order whatsoever, using whatever values of other states happen to be available.</p>
<p>To converge correctly, however, an asynchronous algorithm must continue to update the value of all the states: it can't ignore any state after some point in the computation.</p>
<h2>Generalized Policy Iteration</h2>
<p>Policy iteration consists of two simultaneous, iterating processes, one making the value function consistent with the current policy (policy evaluation) and the other making the policy greedy with respect to the current value function (policy improvement).</p>
<p>We use the term <em>generalized policy iteration</em> (GPI) to refer to the general idea of letting the evaluation and improvement processes interact; the two processes can be viewed as both competing and cooperating. They compete in the sense that they pull in opposing directions. Making the policy greedy with respect to the value function typically makes the value function incorrect for the changed policy, and making the value function consistent with the policy typically causes the policy to no longer be greedy. In the long run, however, the two processes interact to find a single joint solution. </p>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,104 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Introduction to Reinforcement Learning Day 1</h1>
<p>Recall that this course is based on the book -- </p>
<p>Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto</p>
<p>These notes really serve as talking points for the overall concepts described in the chapter and are not meant to stand for themselves. Check out the book for more complete thoughts :)</p>
<p><strong>Reinforcement Learning</strong> is learning what to do -- how to map situations to actions -- so as to maximize a numerical reward signal. There are two characteristics, trial-and-error search and delayed reward, that are the two most important distinguishing features of reinforcement learning.</p>
<p>Markov decision processes are intended to include just these three aspects: sensation, action, and goal(s).</p>
<p>Reinforcement learning is <strong>different</strong> than the following categories</p>
<ul>
<li>Supervised learning: This is learning from a training set of labeled examples provided by a knowledgeable external supervisor. In interactive problems it is often impractical to obtain examples of desired behavior that are both correct and representative of all situations in which the agent has to act.</li>
<li>Unsupervised learning: Reinforcement learning is trying to maximize a reward signal as opposed to finding some sort of hidden structure within the data.</li>
</ul>
<p>One of the challenges that arise in reinforcement learning is the <strong>trade-off</strong> between exploration and exploitation. The dilemma is that neither exploration nor exploitation can be pursued exclusively without failing at the task.</p>
<p>Another key feature of reinforcement learning is that it explicitly considers the whole problem of a goal-directed agent interacting with an uncertain environment. This is different than supervised learning since they're concerned with finding the best classifier/regression without explicitly specifying how such an ability would finally be useful.</p>
<p>A complete, interactive, goal-seeking agent can also be a component of a larger behaving system. A simple example is an agent that monitors the charge level of a robot's battery and sends commands to the robot's control architecture. This agent's environment is the rest of the robot together with the robot's environment.</p>
<h2>Definitions</h2>
<p>A policy defines the learning agent's way of behaving at a given time</p>
<p>A reward signal defines the goal in a reinforcement learning problem. The agent's sole objective is to maximize the total reward it receives over the long run</p>
<p>A value function specifies what is good in the long run. Roughly speaking, the value of a state is the total amount of reward an agent can expect to accumulate over the future, starting from that state. Without rewards there could be no values, and the only purpose of estimating values is to achieve more reward. We seek actions that bring about states of highest value. </p>
<p>Unfortunately, it is much harder to determine values than it is to determine rewards. The most important component of almost all reinforcement learning algorithms we consider is a method for efficiently estimating values.</p>
<p><strong>Look at Tic-Tac-Toe example</strong></p>
<p>Most of the time in a reinforcement learning algorithm, we move greedily, selecting the move that leads to the state with greatest value. Occasionally, however, we select randomly from among the other moves instead. These are called exploratory moves because they cause us to experience states that we might otherwise never see.</p>
<p>Summary: Reinforcement learning is learning by an agent from direct interaction with its environment, without relying on exemplary supervision or complete models of the environment.</p>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,146 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Chapter 5: Monte Carlo Methods</h1>
<p>Monte Carlo methods do not assume complete knowledge of the environment. They require only <em>experience</em> which is a sample sequence of states, actions, and rewards from actual or simulated interaction with an environment. </p>
<p>Monte Carlo methods are ways of solving the reinforcement learning problem based on averaging sample returns. To ensure that well-defined returns are available, we define Monte Carlo methods only for episodic tasks. Only on the completion of an episode are value estimates and policies changed. </p>
<p>Monte Carlo methods sample and average returns for each state-action pair, much like the bandit methods explored earlier. The main difference is that there are now multiple states, each acting like a different bandit problem, and the problems are interrelated. Because all the action selections are undergoing learning, the problem becomes nonstationary from the point of view of the earlier states.</p>
<h2>Monte Carlo Prediction</h2>
<p>Recall that the value of a state is the expected return -- the expected cumulative future discounted reward -- starting from that state. One way to estimate it is from experience, by averaging the returns observed after visits to that state.</p>
<p>Each occurrence of state $s$ in an episode is called a <em>visit</em> to $s$. The <em>first-visit MC method</em> estimates $v_\pi(s)$ as the average of the returns following first visits to $s$, whereas the <em>every-visit MC method</em> averages the returns following all visits to $s$. These two Monte Carlo methods are very similar but have slightly different theoretical properties. </p>
<p><u>First-visit MC prediction</u></p>
<pre><code>Initialize:
π ← policy to be evaluated
V ← an arbitrary state-value function
Returns(s) ← an empty list, for all s ∈ S
Repeat forever:
Generate an episode using π
For each state s appearing in the episode:
G ← the return that follows the first occurrence of
s
Append G to Returns(s)
V(s) ← average(Returns(s))</code></pre>
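<p>A rough Python rendering of the pseudocode above is given below. The helper <code>generate_episode(policy)</code> is assumed to return a completed episode as a list of <code>(state, reward)</code> pairs, where the reward is the one received after leaving that state; it is not something defined in these notes.</p>
<pre><code>from collections import defaultdict

def first_visit_mc_prediction(policy, generate_episode, num_episodes, gamma=1.0):
    returns = defaultdict(list)   # Returns(s)
    V = defaultdict(float)        # V(s)
    for _ in range(num_episodes):
        episode = generate_episode(policy)   # [(S_0, R_1), (S_1, R_2), ...]
        G = 0.0
        # Work backwards so G is the return that follows each state
        for t in reversed(range(len(episode))):
            state, reward = episode[t]
            G = gamma * G + reward
            if state not in (s for s, _ in episode[:t]):   # first visit only
                returns[state].append(G)
                V[state] = sum(returns[state]) / len(returns[state])
    return V</code></pre>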
<h2>Monte Carlo Estimation of Action Values</h2>
<p>If a model is not available then it is particularly useful to estimate <em>action</em> values rather than state values. With a model, state values alone are sufficient to define a policy. Without a model, however, state values alone are not sufficient. One must explicitly estimate the value of each action in order for the values to be useful in suggesting a policy. </p>
<p>The only complication is that many state-action pairs may never be visited. If $\pi$ is a deterministic policy, then in following $\pi$ one will observe returns only for one of the actions from each state. With no returns to average, the Monte Carlo estimates of the other actions will not improve with experience. This is a serious problem because the purpose of learning action values is to help in choosing among the actions available in each state. </p>
<p>This is the general problem of <em>maintaining exploration</em>. For policy evaluation to work for action values, we must assure continual exploration. One way to do this is by specifying that the episodes <em>start in a state-action pair</em>, and that each pair has a nonzero probability of being selected as the start. We call this the assumption of <em>exploring starts</em>.</p>
<h2>Monte Carlo Control</h2>
<p>We made two unlikely assumptions above in order to easily obtain this guarantee of convergence for the Monte Carlo method. One was that the episodes have exploring starts, and the other was that policy evaluation could be done with an infinite number of episodes. </p>
<p><u>Monte Carlo Exploring Starts</u></p>
<pre><code>Initialize, for all s ∈ S, a ∈ A(s):
Q(s,a) ← arbitrary
π(s) ← arbitrary
Returns(s,a) ← empty list
Repeat forever:
Choose S_0 ∈ S and A_0 ∈ A(S_0) s.t. all pairs have probability &gt; 0
Generate an episode starting from S_0,A_0, following
π
For each pair s,a appearing in the episode:
G ← the return that follows the first occurrence of
s,a
Append G to Returns(s,a)
Q(s,a) ← average(Returns(s,a))
For each s in the episode:
π(s) ← arg max_a Q(s,a)</code></pre>
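<p>Below is a Python sketch of the same idea. The part peculiar to exploring starts is the hypothetical <code>env.reset_to(s0)</code> call, which forces the environment into a chosen starting state so that every state-action pair can begin an episode; <code>env.step(action)</code> is assumed to return <code>(next_state, reward, done)</code>. Neither helper comes from the text.</p>
<pre><code>import random
from collections import defaultdict

def mc_exploring_starts(env, states, actions, num_episodes, gamma=1.0):
    Q = defaultdict(float)
    returns = defaultdict(list)
    policy = {s: random.choice(actions) for s in states}   # arbitrary initial policy
    for _ in range(num_episodes):
        # Exploring start: pick the first state-action pair at random
        s0, a0 = random.choice(states), random.choice(actions)
        env.reset_to(s0)                  # hypothetical: force the environment into s0
        state, action, done = s0, a0, False
        episode = []
        while not done:
            next_state, reward, done = env.step(action)
            episode.append((state, action, reward))
            state = next_state
            if not done:
                action = policy[state]
        # Average the return following the first occurrence of each (s, a)
        G = 0.0
        for t in reversed(range(len(episode))):
            s, a, r = episode[t]
            G = gamma * G + r
            if (s, a) not in ((x, y) for x, y, _ in episode[:t]):
                returns[(s, a)].append(G)
                Q[(s, a)] = sum(returns[(s, a)]) / len(returns[(s, a)])
        # Greedy policy improvement for every state seen in the episode
        for s, _, _ in episode:
            policy[s] = max(actions, key=lambda act: Q[(s, act)])
    return policy, Q</code></pre>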
<h2>Monte Carlo Control without Exploring Starts</h2>
<p>The only general way to ensure that actions are selected infinitely often is for the agent to continue to select them. There are two approaches to ensuring this, resulting in what we call <em>on-policy</em> methods and <em>off-policy</em> methods. </p>
<p>On-policy methods attempt to evaluate or improve the policy that is used to make decisions, whereas off-policy methods evaluate or improve a policy different from that used to generate the data.</p>
<p>In on-policy control methods the policy is generally <em>soft</em>, meaning that $\pi(a|s) &gt; 0$ for all $s \in \mathcal{S}$ and $a \in \mathcal{A}(s)$. The on-policy methods in this section use $\epsilon$-greedy policies, meaning that most of the time they choose an action that has maximal estimated action value, but with probability $\epsilon$ they instead select an action at random. </p>
<p><u>On-policy first-visit MC control (for $\epsilon$-soft policies)</u></p>
<pre><code>Initialize, for all s ∈ S, a ∈ A(s):
    Q(s,a) ← arbitrary
    Returns(s,a) ← empty list
    π(a|s) ← an arbitrary ε-soft policy
Repeat forever:
    (a) Generate an episode using π
    (b) For each pair s,a appearing in the episode:
            G ← the return that follows the first occurrence of s,a
            Append G to Returns(s,a)
            Q(s,a) ← average(Returns(s,a))
    (c) For each s in the episode:
            A* ← arg max_a Q(s,a)    (with ties broken arbitrarily)
            For all a ∈ A(s):
                π(a|s) ← 1 - ε + ε/|A(s)|   if a = A*
                π(a|s) ← ε/|A(s)|           if a ≠ A*</code></pre>
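<p>Here is a sketch of the same algorithm in Python, with the $\epsilon$-greedy policy represented implicitly through the action-value table. The environment interface (<code>env.reset()</code> returning a state and <code>env.step(action)</code> returning <code>(next_state, reward, done)</code>) and the single shared action list are assumptions for illustration, not part of the text.</p>
<pre><code>import random
from collections import defaultdict

def epsilon_greedy(Q, state, actions, epsilon):
    if random.random() &lt; epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def on_policy_mc_control(env, actions, num_episodes, gamma=1.0, epsilon=0.1):
    Q = defaultdict(float)
    returns = defaultdict(list)
    for _ in range(num_episodes):
        # (a) Generate an episode following the current epsilon-greedy policy
        episode, state, done = [], env.reset(), False
        while not done:
            action = epsilon_greedy(Q, state, actions, epsilon)
            next_state, reward, done = env.step(action)
            episode.append((state, action, reward))
            state = next_state
        # (b) Average the return following the first occurrence of each (s, a)
        G = 0.0
        for t in reversed(range(len(episode))):
            s, a, r = episode[t]
            G = gamma * G + r
            if (s, a) not in ((x, y) for x, y, _ in episode[:t]):
                returns[(s, a)].append(G)
                Q[(s, a)] = sum(returns[(s, a)]) / len(returns[(s, a)])
        # (c) Policy improvement is implicit: epsilon_greedy always acts on the latest Q
    return Q</code></pre>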
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,274 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Chapter 3: Finite Markov Decision Processes</h1>
<p>Markov Decision Processes are a classical formalization of sequential decision making, where actions influence not just immediate rewards, but also subsequent situations, or states, and through those future rewards. Thus MDPs involve delayed reward and the need to trade off immediate and delayed reward. Whereas in bandit problems we estimated the value $q_*(a)$ of each action $a$, in MDPs we estimate the value $q_*(s, a)$ of each action $a$ in each state $s$. </p>
<p>MDPs are a mathematically idealized form of the reinforcement learning problem. As is often the case in artificial intelligence, there is a tension between breadth of applicability and mathematical tractability. This chapter will introduce this tension and discuss some of the trade-offs and challenges that it implies. </p>
<h2>Agent-Environment Interface</h2>
<p>The learner and decision maker is called the <em>agent</em>. The thing it interacts with is called the <em>environment</em>. These interact continually, the agent selecting actions and the environment responding to these actions and presenting new situations to the agent.</p>
<p>The environment also gives rise to rewards, special numerical values that the agent seeks to maximize over time through its choice of actions.</p>
<p>[Figure: the agent-environment interaction loop, with the agent selecting actions and the environment returning states and rewards]</p>
<p>To make the future paragraphs clearer, a Markov Decision Process is a discrete time stochastic control process. It provides a mathematical framework for modeling decision making in situations where outcomes are partly random and partly under the control of the decision maker.</p>
<p>In a <em>finite</em> MDP, the sets of states, actions, and rewards all have a finite number of elements. In this case, the random variables $R_t$ and $S_t$ have a well-defined discrete probability distribution dependent only on the preceding state and action.
$$
p(s^\prime | s,a) = \sum_{r \in \mathcal{R}}{p(s^\prime, r|s, a)}
$$
Breaking down the above formula, it's just an instantiation of the law of total probability: if you partition the probability space by the reward and sum over the partitions, you recover the overall probability. This formula has a special name: the <em>state-transition probability</em>.</p>
<p>From this we can compute the expected reward for each state-action pair by multiplying each reward by the probability of receiving it and summing it all up.
$$
r(s, a) = \sum_{r \in \mathcal{R}}{r}\sum_{s^\prime \in \mathcal{S}}{p(s^\prime, r|s,a)}
$$
The expected reward for a state-action-next-state triple is
$$
r(s, a, s^\prime) = \sum_{r \in \mathcal{R}}{r\frac{p(s^\prime, r|s,a)}{p(s^\prime|s,a)}}
$$
I wasn't able to piece together this function in my head like the others at first. The trick is that dividing the joint probability $p(s^\prime, r|s,a)$ by the marginal $p(s^\prime|s,a)$ conditions on the next state actually being $s^\prime$: the universe of discourse shrinks from the universal set to only those outcomes in which $s^\prime$ occurred.</p>
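<p>Since all three quantities come from the same four-argument function, a short Python sketch may help. Here <code>p</code> is a hypothetical dictionary mapping <code>(s_next, r, s, a)</code> to $p(s^\prime, r|s,a)$, and <code>states</code> and <code>rewards</code> are the finite sets being summed over; none of these names come from the text.</p>
<pre><code>def state_transition_prob(p, s_next, s, a, rewards):
    # p(s' | s, a): sum the joint probability over all rewards
    return sum(p.get((s_next, r, s, a), 0.0) for r in rewards)

def expected_reward(p, s, a, rewards, states):
    # r(s, a): each reward weighted by its total probability of occurring
    return sum(r * sum(p.get((s_next, r, s, a), 0.0) for s_next in states)
               for r in rewards)

def expected_reward_given_next(p, s, a, s_next, rewards):
    # r(s, a, s'): condition on the next state by dividing by p(s' | s, a)
    total = state_transition_prob(p, s_next, s, a, rewards)
    return sum(r * p.get((s_next, r, s, a), 0.0) for r in rewards) / total</code></pre>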
<p>The MDP framework is abstract and flexible and can be applied to many different problems in many different ways. For example, the time steps need not refer to fixed intervals of real time; they can refer to arbitrary successive states of decision making and acting.</p>
<h3>Agent-Environment Boundary</h3>
<p>In particular, the boundary between agent and environment is typically not the same as the physical boundary of a robot's or animal's body. Usually, the boundary is drawn closer to the agent than that. For example, the motors and mechanical linkages of a robot and its sensing hardware should usually be considered parts of the environment rather than parts of the agent.</p>
<p>The general rule we follow is that anything that cannot be changed arbitrarily by the agent is considered to be outside of it and thus part of its environment. We do not assume that everything in the environment is unknown to the agent. For example, the agent often knows quite a bit about how its rewards are computed as a function of its actions and the states in which they are taken. But we always consider the reward computation to be external to the agent because it defines the task facing the agent and thus must be beyond its ability to change arbitrarily. The agent-environment boundary represents the limit of the agent's absolute control, not of its knowledge.</p>
<p>This framework breaks down whatever the agent is trying to achieve into three signals passing back and forth between an agent and its environment: one signal to represent the choices made by the agent, one signal to represent the basis on which the choices are made (the states), and one signal to define the agent's goal (the rewards).</p>
<h3>Example 3.4: Recycling Robot MDP</h3>
<p>Recall that the agent makes a decision at times determined by external events. At each such time the robot decides whether it should</p>
<p>(1) Actively search for a can</p>
<p>(2) Remain stationary and wait for someone to bring it a can</p>
<p>(3) Go back to home base to recharge its battery</p>
<p>Suppose the environment works as follows: the best way to find cans is to actively search for them, but this runs down the robot's battery, whereas waiting does not. Whenever the robot is searching the possibility exists that its battery will become depleted. In this case, the robot must shut down and wait to be rescued (producing a low reward.)</p>
<p>The agent makes its decisions solely as a function of the energy level of the battery. It can distinguish two levels, high and low, so that the state set is $\mathcal{S} = \{high, low\}$. Let us call the possible decisions -- the agent's actions -- wait, search, and recharge. When the energy level is high, recharging would always be foolish, so we do not include it in the action set for this state. The agent's action sets are
$$
\begin{align*}
\mathcal{A}(high) &amp;= \{search, wait\} \\
\mathcal{A}(low) &amp;= \{search, wait, recharge\}
\end{align*}
$$
If the energy level is high, then a period of active search can always be completed without a risk of depleting the battery. A period of searching that begins with a high energy level leaves the energy level high with a probability of $\alpha$ and reduces it to low with a probability of $(1 - \alpha)$. On the other hand, a period of searching undertaken when the energy level is low leaves it low with a probability of $\beta$ and depletes the battery with a probability of $(1 - \beta)$. In the latter case, the robot must be rescued, and the battery is then recharged back to high.</p>
<p>Each can collected by the robot counts as a unit reward, whereas a reward of $-3$ occurs whenever the robot has to be rescued. Let $r_{search}$ and $r_{wait}$, with $r_{search} &gt; r_{wait}$, respectively denote the expected number of cans the robot will collect while searching and while waiting. Finally, to keep things simple, suppose that no cans can be collected during a run home for recharging and that no cans can be collected on a step in which the battery is depleted.</p>
<table>
<thead>
<tr>
<th>$s$</th>
<th>$a$</th>
<th>$s^\prime$</th>
<th>$p(s^\prime | s, a)$</th>
<th>$r(s, a, s^\prime)$</th>
</tr>
</thead>
<tbody>
<tr>
<td>high</td>
<td>search</td>
<td>high</td>
<td>$\alpha$</td>
<td>$r_{search}$</td>
</tr>
<tr>
<td>high</td>
<td>search</td>
<td>low</td>
<td>$(1-\alpha)$</td>
<td>$r_{search}$</td>
</tr>
<tr>
<td>low</td>
<td>search</td>
<td>high</td>
<td>$(1 - \beta)$</td>
<td>-3</td>
</tr>
<tr>
<td>low</td>
<td>search</td>
<td>low</td>
<td>$\beta$</td>
<td>$r_{search}$</td>
</tr>
<tr>
<td>high</td>
<td>wait</td>
<td>high</td>
<td>1</td>
<td>$r_{wait}$</td>
</tr>
<tr>
<td>high</td>
<td>wait</td>
<td>low</td>
<td>0</td>
<td>$r_{wait}$</td>
</tr>
<tr>
<td>low</td>
<td>wait</td>
<td>high</td>
<td>0</td>
<td>$r_{wait}$</td>
</tr>
<tr>
<td>low</td>
<td>wait</td>
<td>low</td>
<td>1</td>
<td>$r_{wait}$</td>
</tr>
<tr>
<td>low</td>
<td>recharge</td>
<td>high</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>low</td>
<td>recharge</td>
<td>low</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>
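<p>The table above can also be written down directly as data. Below is a sketch of one way to encode it in Python, with each <code>(state, action)</code> pair mapped to a list of <code>(next_state, probability, reward)</code> triples (rows with probability 0 are omitted); the function name and structure are my own choices, not from the text.</p>
<pre><code>def recycling_robot_dynamics(alpha, beta, r_search, r_wait):
    """Dynamics of the recycling robot MDP, matching the table above."""
    return {
        ('high', 'search'):   [('high', alpha,     r_search),
                               ('low',  1 - alpha, r_search)],
        ('low',  'search'):   [('high', 1 - beta,  -3),
                               ('low',  beta,      r_search)],
        ('high', 'wait'):     [('high', 1.0,       r_wait)],
        ('low',  'wait'):     [('low',  1.0,       r_wait)],
        ('low',  'recharge'): [('high', 1.0,       0)],
    }</code></pre>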
<p>A <em>transition graph</em> is a useful way to summarize the dynamics of a finite MDP. There are two kinds of nodes: <em>state nodes</em> and <em>action nodes</em>. There is a state node for each possible state and an action node for each state-action pair. Starting in state $s$ and taking action $a$ moves you along the line from state node $s$ to action node $(s, a)$. Then the environment responds with a transition to the next state's node via one of the arrows leaving action node $(s, a)$. </p>
<p>[Figure: transition graph for the recycling robot MDP]</p>
<h2>Goals and Rewards</h2>
<p>The reward hypothesis is that all of what we mean by goals and purposes can be well thought of as the maximization of the expected value of the cumulative sum of a received scalar signal called the reward.</p>
<p>Although formulating goals in terms of reward signals might at first appear limiting, in practice it has proved to be flexible and widely applicable. The best way to see this is to consider the examples of how it has been, or could be used. For example:</p>
<ul>
<li>To make a robot learn to walk, researchers have provided reward on each time step proportional to the robot's forward motion. </li>
<li>In making a robot learn how to escape from a maze, the reward is often $-1$ for every time step that passes prior to escape; this encourages the agent to escape as quickly as possible.</li>
<li>To make a robot learn to find and collect empty soda cans for recycling, one might give it a reward of zero most of the time, and then a reward of $+1$ for each can collected. One might also want to give the robot negative rewards when it bumps into things or when somebody yells at it. </li>
<li>For an agent to play checkers or chess, the natural rewards are $+1$ for winning, $-1$ for losing, and $0$ for drawing and for all nonterminal positions.</li>
</ul>
<p>It is critical that the rewards we set up truly indicate what we want accomplished. In particular, the reward signal is not the place to impart to the agent prior knowledge about <em>how</em> to achieve what we want it to do. For example, a chess playing agent should only be rewarded for actually winning, not for achieving subgoals such as taking its opponent's pieces. If achieving these sort of subgoals were rewarded, then the agent might find a way to achieve them without achieving the real goal. The reward signal is your way of communicating what you want it to achieve, not how you want it achieved.</p>
<h2>Returns and Episodes</h2>
<p>In general, we seek to maximize the <em>expected return</em>, where the return is defined as some specific function of the reward sequence. In the simplest case, the return is the sum of the rewards:
$$
G_t = R_{t + 1} + R_{t + 2} + R_{t + 3} + \dots + R_{T}
$$
where $T$ is the final time step. This approach makes sense in applications in which there is a natural notion of a final time step. That is when the agent-environment interaction breaks naturally into subsequences or <em>episodes</em>, such as plays of a game, trips through a maze, or any sort of repeated interaction.</p>
<h3>Episodic Tasks</h3>
<p>Each episode ends in a special state called the <em>terminal state</em>, followed by a reset to the standard starting state or to a sample from a standard distribution of starting states. Even if you think of episodes as ending in different ways, the next episode begins independently of how the previous one ended. Therefore, the episodes can all be considered to end in the same terminal state, with different rewards for the different outcomes. </p>
<p>Tasks with episodes of this kind are called <em>episodic tasks</em>. In episodic tasks we sometimes need to distinguish the set of all nonterminal states, denoted $\mathcal{S}$, from the set of all states plus the terminal state, denoted $\mathcal{S}^+$. The time of termination, $T$, is a random variable that normally varies from episode to episode.</p>
<h3>Continuing Tasks</h3>
<p>On the other hand, in many cases the agent-environment interaction goes on continually without limit. We call these <em>continuing tasks</em>. The return formulation above is problematic for continuing tasks because the final time step would be $T = \infty$, and the return, which we are trying to maximize, could itself easily be infinite. The additional concept that we need is that of <em>discounting</em>. According to this approach, the agent tries to select actions so that the sum of the discounted rewards it receives over the future is maximized. In particular, it chooses $A_t$ to maximize the expected discounted return
$$
G_t = \sum_{k = 0}^\infty{\gamma^k R_{t+k+1}}
$$
where $\gamma$, a parameter between $0$ and $1$, is called the <em>discount rate</em>.</p>
<h4>Discount Rate</h4>
<p>The discount rate determines the present value of future rewards: a reward received $k$ time steps in the future is worth only $\gamma^{k - 1}$ times what it would be worth if it were received immediately. If $\gamma &lt; 1$, the infinite sum has a finite value as long as the reward sequence is bounded. </p>
<p>If $\gamma = 0$, the agent is &quot;myopic&quot; in being concerned only with maximizing immediate rewards. But in general, acting to maximize immediate reward can reduce access to future rewards so that the return is reduced. </p>
<p>As $\gamma$ approaches 1, the return objective takes future rewards into account more strongly; the agent becomes more farsighted.</p>
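<p>As a quick numerical illustration of the discount rate, the return $G_t$ for a finite list of future rewards can be computed as below; the reward sequence here is made up purely for the example.</p>
<pre><code>def discounted_return(rewards, gamma):
    """G_t for rewards [R_{t+1}, R_{t+2}, ...]: sum of gamma^k * R_{t+k+1}."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))

rewards = [1, 1, 1, 1, 1]
print(discounted_return(rewards, 0.0))   # 1.0: myopic, only the immediate reward counts
print(discounted_return(rewards, 0.9))   # about 4.1: future rewards count too</code></pre>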
<h3>Example 3.5 Pole-Balancing</h3>
<p>The objective in this task is to apply forces to a cart moving along a track so as to keep a pole hinged to the cart from falling over.</p>
<p>A failure is said to occur if the pole falls past a given angle from the vertical or if the cart runs off the track.</p>
<p>[Figure: the pole-balancing (cart-pole) task]</p>
<h4>Approach 1</h4>
<p>The reward can be a $+1$ for every time step on which failure did not occur. In this case, successful balancing would mean a return of infinity.</p>
<h4>Approach 2</h4>
<p>The reward can be $-1$ on each failure and zero all other times. The return at each time would then be related to $-\gamma^k$ where $k$ is the number of steps before failure.</p>
<p>In either case, the return is maximized by keeping the pole balanced for as long as possible. </p>
<h2>Policies and Value Functions</h2>
<p>Almost all reinforcement learning algorithms involve estimating <em>value functions</em> which estimate what future rewards can be expected. Of course the rewards that the agent can expect to receive is dependent on the actions it will take. Accordingly, value functions are defined with respect to particular ways of acting, called policies.</p>
<p>Formally, a <em>policy</em> is a mapping from states to probabilities of selecting each possible action. The <em>value</em> of a state $s$ under a policy $\pi$, denoted $v_{\pi}(s)$, is the expected return when starting in $s$ and following $\pi$ thereafter. For MDPs we can define $v_{\pi}$ as</p>
<p>$$
v_{\pi}(s) = \mathbb{E}_{\pi}[G_t | S_t = s] = \mathbb{E}_{\pi}\left[\sum_{k = 0}^\infty{\gamma^k R_{t+k+1}} \,\Big|\, S_t = s\right]
$$</p>
<p>We call this function the <em>state-value function for policy $\pi$</em>. Similarly, we define the value of taking action $a$ in state $s$ under a policy $\pi$, denoted as $q_\pi(s,a)$ as the expected return starting from $s$, taking the action $a$, and thereafter following the policy $\pi$. Succinctly, this is called the <em>action-value function for policy $\pi$</em>.</p>
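<p>Although these notes do not write it out, the two value functions are related in a simple way: the value of a state under $\pi$ is the average of the action values in that state, weighted by how likely $\pi$ is to pick each action:
$$
v_\pi(s) = \sum_{a \in \mathcal{A}(s)} \pi(a|s) \, q_\pi(s, a)
$$</p>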
<h3>Optimality and Approximation</h3>
<p>For some kinds of tasks we are interested in, optimal policies can be generated only with extreme computational cost. A critical aspect of the problem facing the agent is always the computational power available to it, in particular, the amount of computation it can perform in a single time step.</p>
<p>The memory available is also an important constraint. A large amount of memory is often required to build up approximations of value functions, policies, and models. In the case of large state sets, functions must be approximated using some sort of more compact parameterized function representation.</p>
<p>This presents us with unique opportunities for achieving useful approximations. For example, in approximating optimal behavior, there may be many states that the agent faces with such a low probability that selecting suboptimal actions for them has little impact on the amount of reward the agent receives.</p>
<p>The online nature of reinforcement learning makes it possible to approximate optimal policies in ways that put more effort into learning to make good decisions for frequently encountered states, at the expense of infrequent ones. This is a key property that distinguishes reinforcement learning from other approaches to approximately solving MDPs.</p>
<h3>Summary</h3>
<p>Let us summarize the elements of the reinforcement learning problem.</p>
<p>Reinforcement learning is about learning from interaction how to behave in order to achieve a goal. The reinforcement learning <em>agent</em> and its <em>environment</em> interact over a sequence of discrete time steps.</p>
<p>The <em>actions</em> are the choices made by the agent; the states are the basis for making the choice; and the <em>rewards</em> are the basis for evaluating the choices.</p>
<p>Everything inside the agent is completely known and controllable by the agent; everything outside is incompletely controllable but may or may not be completely known.</p>
<p>A <em>policy</em> is a stochastic rule by which the agent selects actions as a function of states.</p>
<p>When the reinforcement learning setup described above is formulated with well-defined transition probabilities, it constitutes a Markov Decision Process (MDP).</p>
<p>The <em>return</em> is the function of future rewards that the agent seeks to maximize. It has several different definitions depending on the nature of the task and whether one wishes to <em>discount</em> delayed reward. </p>
<ul>
<li>The un-discounted formulation is appropriate for <em>episodic tasks</em>, in which the agent-environment interaction breaks naturally into <em>episodes</em></li>
<li>The discounted formulation is appropriate for <em>continuing tasks</em> in which the interaction does not naturally break into episodes but continue without limit</li>
</ul>
<p>A policy's <em>value functions</em> assign to each state, or state-action pair, the expected return from that state, or state-action pair, given that the agent uses the policy. The <em>optimal value functions</em> assign to each state, or state-action pair, the largest expected return achievable by any policy. A policy whose value functions are optimal is an <em>optimal policy</em>.</p>
<p>Even if the agent has a complete and accurate environment model, the agent is typically unable to perform enough computation per time step to fully use it. The memory available is also an important constraint. Memory may be required to build up accurate approximations of value functions, policies, and models. In most cases of practical interest there are far more states that could possibly be entries in a table, and approximations must be made.</p>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,88 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Lecture Notes for Reinforcement Learning</h1>
<p><a href="index.html%3Fresearch%252FReinforcementLearning%252Fnotes%252Fintro.html">Chapter 1: An Introduction</a></p>
<p><a href="index.html%3Fresearch%252FReinforcementLearning%252Fnotes%252Fbandits.html">Chapter 2: Multi-armed Bandits</a></p>
<p><a href="index.html%3Fresearch%252FReinforcementLearning%252Fnotes%252Fmdp.html">Chapter 3: Markov Decision Processes</a></p>
<p><a href="index.html%3Fresearch%252FReinforcementLearning%252Fnotes%252Fdynamic.html">Chapter 4: Dynamic Programming</a></p>
<p><a href="index.html%3Fresearch%252FReinforcementLearning%252Fnotes%252Fmcmethods.html">Chapter 5: Monte Carlo Methods</a> </p>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,94 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Readings for Lectures of Reinforcement Learning</h1>
<h2>Lecture 1</h2>
<p>Chapter 1: What is Reinforcement Learning?</p>
<h2>Lecture 2</h2>
<p>Chapter 2: Multi-armed Bandits</p>
<h2>Lecture 3</h2>
<p>Chapter 3: Finite Markov Decision Processes Part 1</p>
<h2>Lecture 4</h2>
<p>Chapter 3: Finite Markov Decision Processes Part 2</p>
<h2>Lecture 5</h2>
<p>[No Readings] Playing around with Multi-armed Bandits Code</p>
<p><strong>Lost track of readings around this time period :(</strong></p>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,144 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Reinforcement Learning</h1>
<p>The goal of this independent study is to gain an introduction to the topic of Reinforcement Learning. </p>
<p>As such the majority of the semester will be following the textbook to gain an introduction to the topic, and the last part applying it to some problems.</p>
<h2>Textbook</h2>
<p>The majority of the content of this independent study will come from the textbook. This is meant to lessen the burden on the both of us, as I have already experimented with curating my own content.</p>
<p>The textbook also includes examples throughout the text to immediately apply what's learned.</p>
<p>Richard S. Sutton and Andrew G. Barto, &quot;Reinforcement Learning: An Introduction&quot; <a href="http://incompleteideas.net/book/bookdraft2017nov5.pdf">http://incompleteideas.net/book/bookdraft2017nov5.pdf</a></p>
<h2>Discussions and Notes</h2>
<p>Discussions and notes will be kept track of and published on my tilda space as time and energy permits. This is for easy reference and since it's nice to write down what you learn.</p>
<h2>Topics to be Discussed</h2>
<h3>The Reinforcement Learning Problem (3 Sessions)</h3>
<p>In this section we will get ourselves familiar with the topics that are commonly discussed in Reinforcement learning problems.</p>
<p>In this section we will learn the different vocab terms such as:</p>
<ul>
<li>Evaluative Feedback </li>
<li>Non-Associative Learning</li>
<li>Rewards/Returns</li>
<li>Value Functions</li>
<li>Optimality</li>
<li>Exploration/Exploitation</li>
<li>Model</li>
<li>Policy</li>
<li>Value Function</li>
<li>Multi-armed Bandit Problem</li>
</ul>
<h3>Markov Decision Processes (4 Sessions)</h3>
<p>This is a type of reinforcement learning problem that is commonly studied and well documented. This helps form an environment for which the agent can operate within. Possible subtopics include:</p>
<ul>
<li>Finite Markov Decision Processes</li>
<li>Goals and Rewards</li>
<li>Returns and Episodes</li>
<li>Optimality and Approximation</li>
</ul>
<h3>Dynamic Programming (3 Sessions)</h3>
<p>Dynamic Programming refers to a collection of algorithms that can be used to compute optimal policies given an environment. Subtopics that we are going over is:</p>
<ul>
<li>Policy Evaluation</li>
<li>Policy Improvement</li>
<li>Policy Iteration</li>
<li>Value Iteration</li>
<li>Asynchronous DP</li>
<li>Generalized policy Iteration </li>
<li>Bellman Expectation Equations</li>
</ul>
<h3>Monte Carlo Methods (3 Sessions)</h3>
<p>Now we move onto not having complete knowledge of the environment. This will go into estimating value functions and discovering optimal policies. Possible subtopics include:</p>
<ul>
<li>Monte Carlo Prediction</li>
<li>Monte Carlo Control</li>
<li>Importance Sampling</li>
<li>Incremental Implementation</li>
<li>Off-Policy Monte Carlo Control</li>
</ul>
<h3>Temporal-Difference Learning (4-5 Sessions)</h3>
<p>Temporal-Difference learning is a combination of Monte Carlo ideas and Dynamic Programming. This can lead to methods learning directly from raw experience without knowledge of an environment. Subtopics will include:</p>
<ul>
<li>TD Prediction</li>
<li>Sarsa: On-Policy TD Control</li>
<li>Q-Learning: Off-Policy TD Control</li>
<li>Function Approximation</li>
<li>Eligibility Traces</li>
</ul>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,101 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Reinforcement Learning</h1>
<p>Reinforcement learning is the art of analyzing situations and mapping them to actions in order to maximize a numerical reward signal.</p>
<p>In this independent study, I as well as Dr. Stephen Davies, will explore the Reinforcement Learning problem and its subproblems. We will go over the bandit problem, markov decision processes, and discover how best to translate a problem in order to <strong>make decisions</strong>.</p>
<p>I have provided a list of topics that I wish to explore in a <a href="index.html%3Fresearch%252FReinforcementLearning%252Fsyllabus.html">syllabus</a></p>
<h2>Readings</h2>
<p>In order to spend more time learning, I decided to follow a textbook this time. </p>
<p>Reinforcement Learning: An Introduction</p>
<p>By Richard S. Sutton and Andrew G. Barto</p>
<p><a href="index.html%3Fresearch%252FReinforcementLearning%252Freadings.html">Reading Schedule</a> </p>
<h2>Notes</h2>
<p>The notes for this course are going to be an extremely summarized version of the textbook. There will also be notes on whatever side tangents Dr. Davies and I explore.</p>
<p><a href="index.html%3Fresearch%252FReinforcementLearning%252Fnotes.html">Notes page</a></p>
<p>I wrote a small little quirky/funny report describing the bandit problem. Great for learning about the common considerations for Reinforcement Learning problems.</p>
<p><a href="/files/research/TheBanditReport.pdf">The Bandit Report</a></p>
<h2>Code</h2>
<p>Code will occasionally be written to solidify the learning material and to act as aids for more exploration. </p>
<p><a href="https://github.com/brandon-rozek/ReinforcementLearning">Github Link</a></p>
<p>Specifically, if you want to see agents I've created to solve some OpenAI environments, take a look at this specific folder in the Github Repository</p>
<p><a href="https://github.com/Brandon-Rozek/ReinforcementLearning/tree/master/agents">Github Link</a></p>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,110 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<meta name="description" content="A list of my research Projects">
<title>Research | Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem active">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Research</h1>
<h2>Computer Science / Data Science Research</h2>
<h3>Reinforcement Learning</h3>
<p>Deep Reinforcement Learning: With Dr. Zacharski I focused more on a particular instance of Reinforcement Learning where deep neural networks are used. During this time, I built out a Reinforcement Learning library written in PyTorch. This library helps me have a test bed for trying out different algorithms and attempts to create my own.</p>
<p><a href="https://github.com/brandon-rozek/rltorch">Github Code</a></p>
<p><a href="/files/research/QEP.pptx">QEP Algorithm Slides</a></p>
<p>Reinforcement Learning: Currently studying the fundamentals of reinforcement learning with Dr. Davies. We went over the fundamentals such as value functions, policy functions, how we can describe our environment as a markov decision processes, etc.</p>
<p><a href="index.html%3Fresearch%252FReinforcementLearning.html">Notes and Other Goodies</a></p>
<p><a href="https://github.com/brandon-rozek/ReinforcementLearning">Github Code</a></p>
<h3>Programming Languages</h3>
<p>Programming Languages: Studying the design of programming languages. So far I have made an implementation of the SLOTH programming language, experimenting with what I want my own programming language to be syntactically and paradigm-wise.</p>
<p><a href="https://github.com/brandon-rozek/SLOTH">SLOTH Code</a></p>
<p>Before this study, I worked through a book called &quot;Build your own Lisp&quot; and my implementation of a Lisp-like language is below.</p>
<p><a href="https://github.com/brandon-rozek/lispy">Lispy Code</a></p>
<h3>Other</h3>
<p>Competitive Programming: Studying algorithms and data structures necessary for competitive programming. Attending ACM ICPC in November with a team of two other students.</p>
<h2>Math/Statistics Research</h2>
<p>Worked on an independent study on the topic of <strong>Cluster Analysis</strong>. This is where you try to group similar observations without knowing what the labels are.
I am studying under the guidance of Dr. Melody Denhere, the link below gives you more of a description of the project along with my course notes.</p>
<p><a href="index.html%3Fresearch%252FClusterAnalysis.html">Cluster Analysis Spring 2018</a></p>
<h2>Physics Research</h2>
<p>For the two projects below, I worked on quantum research in a physics lab with a fellow student Hannah Killian and an advisor Dr. Hai Nguyen. I mostly provided software support for the project and helped with the mathematics in whatever way I could.</p>
<p><a href="/files/research/modellingpopulationdynamics.pdf">Modeling Population Dynamics of Incoherent and Coherent Excitation</a></p>
<p><a href="/files/research/coherentcontrolofatomicpopulation.pdf">Coherent Control of Atomic Population Using the Genetic Algorithm</a></p>
<p>In order to circumvent the frustrations I had with simulation code taking a while, I applied and received funding to build out a Beowulf cluster for the Physics department. Dr. Maia Magrakvilidze was the advisor for this project.</p>
<p><a href="/files/research/LUNAC.pdf">High Performance Cluster for Research and Education Report (nicknamed LUNA-C)</a></p>
<p><a href="/files/research/LUNACposter.pdf">LUNA-C Poster</a></p>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

View file

@ -0,0 +1,146 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<meta name="description" content="Page listing the courses I've taken">
<title>Transcript | Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem ">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem ">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem active">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Transcript</h1>
<p>Below is a list of courses I've taken in the past for credit. </p>
<h2>Spring 2019</h2>
<p>Deep Reinforcement Learning (CPSC 491)</p>
<p>Real Analysis I (MATH 471)</p>
<p>Abstract Algebra I (MATH 431)</p>
<p>Operating Systems (CPSC 405)</p>
<p>Digital Storytelling (CPSC 106)</p>
<h2>Fall 2018</h2>
<p>Reinforcement Learning (CPSC 491)</p>
<p>Programming Languages (CPSC 491)</p>
<p>Competitive Programming (CPSC 491)</p>
<p>Multivariate Statistics (STAT 461)</p>
<p>Foundations of Advance Mathematics (MATH 330)</p>
<p>Applications of Databases (CPSC 350)</p>
<h2>Spring 2018</h2>
<p>Foundations for Data Science (Data 219)</p>
<p>Theory of Computation (CPSC 326)</p>
<p>Data Structures and Algorithms (CPSC 340)</p>
<p>Differential Equations (MATH 312)</p>
<p>Cluster Analysis (STAT 491)</p>
<h2>Fall 2017</h2>
<p>Computer Systems and Architecture (CPSC 305)</p>
<p>Introduction to Data Science (DATA 101)</p>
<p>Linear Algebra (MATH 300)</p>
<p>Methods in Mathematical Physics (PHYS 317)</p>
<p>Probability &amp; Statistical Inference I (STAT 381)</p>
<p>Undergraduate Research in Physics (URES 197J)</p>
<h2>Spring 2017</h2>
<p>Intro to Discrete Mathematics (CPSC 125A)</p>
<p>Object-Oriented Analysis &amp; Design (CPSC 240)</p>
<p>Statistical Methods (MATH 280)</p>
<p>University Physics II, with Lab (PHYS 106)</p>
<p>Undergraduate Research in Physics (URES 197J)</p>
<h2>Fall 2016</h2>
<p>Software Development Tools (CPSC 225)</p>
<p>Numbers Rule Your World (FSEM 100M1)</p>
<p>Intro to Statistics (MATH 200)</p>
<p>American Music (MUHL 156)</p>
<p>University Physics I, with Lab (PHYS 105)</p>
<h2>Advanced Placement</h2>
<p>Computer Programming and Problem Solving (CPSC 220)</p>
<p>Intro to Human Geography (GEOG 102)</p>
<p>American History to 1865 (HIST 131)</p>
<p>American History since 1865 (HIST 132)</p>
<p>Calculus I (MATH 121)</p>
<p>Calculus II (MATH 122)</p>
<p>Calculus III (MATH 223)</p>
<p>General Psychology (PSYCH 100)</p>
</article>
</main>
<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
processEscapes: true
}
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();
document.querySelectorAll('.menuitem a').forEach(function(el) {
if (el.getAttribute('data-shortcut').length > 0) {
Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
location.assign(el.getAttribute('href'));
});
}
});
</script>
</body>
</html>

Some files were not shown because too many files have changed in this diff