<p>Probabilities must be between zero and one, i.e., $0 \leq P(A) \leq 1$ for any event $A$.</p>
<p>The probabilities of all possible outcomes must sum to one, i.e., $\sum_i{P(X_i)} = 1$.</p>
<p>The complement of an event, $A^c$, denotes that the event did not happen. Since probabilities must sum to one, $P(A^c) = 1 - P(A)$.</p>
<p>If A and B are two events, the probability that A or B happens (this is an inclusive or) is the probability of the union of the events:
$$
P(A \cup B) = P(A) + P(B) - P(A\cap B)
$$
where $\cup$ represents union ("or") and $\cap$ represents intersection ("and"). If a set of events $A_i$ are mutually exclusive (only one event may happen), then the probability of their union is simply the sum of their probabilities: $P(\bigcup_i A_i) = \sum_i P(A_i)$.</p>
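<p>As a quick worked example, roll a fair six-sided die and let $A$ be "the roll is even" and $B$ be "the roll is greater than four", so $A \cap B$ is "the roll is six":
$$
P(A \cup B) = \frac{3}{6} + \frac{2}{6} - \frac{1}{6} = \frac{4}{6} = \frac{2}{3}
$$</p>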
<p>The expected value of a random variable X is a weighted average of the values X can take, with weights given by the probabilities of those values.
$$
E(X) = \sum_{i=1}^n{x_i \, P(X=x_i)}
$$</p>
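<p>For example, for one roll of a fair six-sided die:
$$
E(X) = \sum_{i=1}^6{i \cdot \frac{1}{6}} = \frac{21}{6} = 3.5
$$</p>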
<h2>Frameworks of probability</h2>
<p>Classical -- Outcomes that are equally likely have equal probabilities</p>
<p>Frequentist -- The relative frequency with which the event occurs in a hypothetical infinite sequence of trials</p>
<p>Bayesian -- Personal perspective (your own measure of uncertainty)</p>
<p>In betting, one must make sure that all the rules of probability are followed, i.e., that the bets are "coherent"; otherwise someone could construct a series of bets against you where you're guaranteed to lose money. Such a series of bets is referred to as a Dutch book.</p>
<h2>Conditional probability</h2>
<p>$$
P(A|B) = \frac{P(A\cap B)}{P(B)}
$$</p>
<p>Where $A|B$ denotes "A given B"</p>
<p>Example from lecture:</p>
<p>Suppose there are 30 students, 9 of which are female. Of the 30 students, 12 are computer science majors, and 4 of those 12 computer science majors are female.</p>
<p>An intuitive way to think about a conditional probability is that we're looking at a subsegment of the original population and asking a probability question within that segment, as worked out below.</p>
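<p>Using the definition above, with $F$ denoting "female" and $CS$ denoting "computer science major":
$$
P(F|CS) = \frac{P(F \cap CS)}{P(CS)} = \frac{4/30}{12/30} = \frac{4}{12} = \frac{1}{3}
$$</p>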
<h2>Bernoulli Distribution (Discrete)</h2>
<p>We'll first study the Bernoulli distribution. This applies when an event has two outcomes, commonly referred to as a success outcome and a failure outcome. The probability of success is $p$, which means the probability of failure is $(1-p)$
$$
X \sim B(p)
$$</p>
<p>$$
P(X = 1) = p
$$</p>
<p>$$
P(X = 0) = 1-p
$$</p>
<p>The probability of a random variable $X$ taking some value $x$ given $p$ is
$$
f(X = x | p) = f(x|p) = p^x(1-p)^{1 - x}I_{\{x \in \{0,1\}\}}(x)
$$
Where $I_{\{x \in \{0,1\}\}}(x)$ is the indicator function, equal to one when $x \in \{0,1\}$ and zero otherwise.</p>
<p>Recall the expected value
$$
E(X) = \sum_{x_i}{x_iP(X=x_i)} = (1)p + (0)(1-p) = p
$$
We can also derive the variance of the Bernoulli. Since $x^2 = x$ for $x \in \{0, 1\}$, we have $E(X^2) = E(X) = p$, so
$$
Var(X) = E(X^2) - E(X)^2 = p - p^2 = p(1-p)
$$</p>
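<p>As a quick numerical check, a minimal sketch with scipy (assumed available; $p = 0.3$ is an arbitrary example):</p>

```python
# Minimal sketch: check the Bernoulli pmf, mean, and variance
# formulas above with scipy.stats (p = 0.3 is arbitrary).
from scipy.stats import bernoulli

p = 0.3
X = bernoulli(p)

print(X.pmf(1))   # P(X = 1) = p = 0.3
print(X.pmf(0))   # P(X = 0) = 1 - p = 0.7
print(X.mean())   # E(X) = p = 0.3
print(X.var())    # Var(X) = p(1 - p) = 0.21
```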
<h2>Binomial Distribution (Discrete)</h2>
<p>The binomial distribution is the sum of $n$ <em>independent</em> Bernoulli trials, each with the same probability of success $p$
$$
X \sim Bin(n, p)
$$</p>
<p>$$
P(X=x|p) = f(x|p) = {n \choose x} p^x (1-p)^{n-x}
$$</p>
<p>${n \choose x}$ is the combinatoric term, which counts the number of ways to choose $x$ successes out of $n$ trials and is defined as
$$
{n \choose x} = \frac{n!}{x!(n-x)!}
$$</p>
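<p>A minimal sketch evaluating this pmf with scipy (assumed available; $n = 10$, $p = 0.3$, and $x = 3$ are arbitrary examples):</p>

```python
# Minimal sketch: probability of x = 3 successes in n = 10
# independent Bernoulli trials with success probability p = 0.3.
from scipy.stats import binom

n, p = 10, 0.3
print(binom.pmf(3, n, p))   # C(10,3) * 0.3^3 * 0.7^7 ≈ 0.2668
```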
<h2>Geometric Distribution (Discrete)</h2>
<p>The geometric distribution is the number of trials needed to get the first success, i.e., the number of Bernoulli events until a success is observed.
$$
X \sim Geo(p)
$$</p>
<p>$$
P(X = x|p) = p(1-p)^{x-1}
$$</p>
<p>$$
E(X) = \frac{1}{p}
$$</p>
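<p>A minimal sketch with scipy (assumed available; $p = 0.25$ is an arbitrary example). scipy's geom counts the trial on which the first success occurs, matching the pmf above:</p>

```python
# Minimal sketch: the geometric pmf and mean above, via scipy.
# geom's support is x = 1, 2, ..., with pmf p(1-p)^(x-1).
from scipy.stats import geom

p = 0.25
print(geom.pmf(1, p))   # P(X = 1) = p = 0.25
print(geom.pmf(3, p))   # p(1-p)^2 ≈ 0.1406
print(geom.mean(p))     # E(X) = 1/p = 4.0
```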
<h2>Multinomial Distribution (Discrete)</h2>
<p>Multinomial is like a binomial when there are more than two possible outcomes: each of $n$ trials results in one of $k$ categories, with category probabilities $p_1, \dots, p_k$ that sum to one.</p>
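<p>A minimal sketch with scipy (assumed available); the three category probabilities and counts below are arbitrary examples:</p>

```python
# Minimal sketch: probability of observing counts (5, 3, 2) over
# three categories in n = 10 trials.
from scipy.stats import multinomial

n = 10
p = [0.5, 0.3, 0.2]                       # must sum to one
print(multinomial.pmf([5, 3, 2], n, p))   # ≈ 0.0851
```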
<h2>Poisson Distribution (Discrete)</h2>
<p>The Poisson distribution is used for counts. The parameter $\lambda > 0$ is the rate at which we expect to observe the thing we are counting.
$$
X \sim Pois(\lambda)
$$</p>
<p>$$
P(X=x|\lambda) = \frac{\lambda^xe^{-\lambda}}{x!}
$$</p>
<p>$$
E(X) = \lambda
$$</p>
<p>$$
Var(X) = \lambda
$$</p>
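<p>A minimal sketch with scipy (assumed available; $\lambda = 2$ is an arbitrary example rate):</p>

```python
# Minimal sketch: the Poisson pmf, mean, and variance above.
from scipy.stats import poisson

lam = 2.0
print(poisson.pmf(3, lam))   # 2^3 e^{-2} / 3! ≈ 0.1804
print(poisson.mean(lam))     # E(X) = lambda = 2.0
print(poisson.var(lam))      # Var(X) = lambda = 2.0
```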
<h2>Gamma Distribution (Continuous)</h2>
<p>If $X_1, X_2, ..., X_n$ are independent and identically distributed exponential random variables (the waiting times between successive events), then the total waiting time for all $n$ events to occur, $Y = X_1 + X_2 + ... + X_n$, will follow a gamma distribution with shape parameter $\alpha = n$ and rate parameter $\beta = \lambda$
$$
Y \sim Gamma(\alpha, \beta)
$$</p>
<p>$$
f(y|\alpha, \beta) = \frac{\beta^\alpha}{\Gamma(\alpha)}y^{\alpha - 1}e^{-\beta y} \quad \text{for } y \geq 0
$$</p>
<p>Where $\Gamma(x)$ is the gamma function. The exponential distribution is a special case of the gamma distribution with $\alpha = 1$. As $\alpha$ increases, the gamma distribution more closely resembles the normal distribution.</p>
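<p>A minimal sketch with scipy (assumed available; $\alpha = 3$ and $\beta = 2$ are arbitrary examples). Note that scipy parameterizes the gamma distribution by shape $a = \alpha$ and scale $= 1/\beta$, the inverse of the rate used above:</p>

```python
# Minimal sketch: a gamma distribution with shape alpha and rate beta.
# scipy uses scale = 1/rate, so we pass scale = 1/beta.
from scipy.stats import gamma

alpha, beta_rate = 3.0, 2.0
X = gamma(a=alpha, scale=1.0 / beta_rate)
print(X.mean())   # E(X) = alpha / beta = 1.5
print(X.var())    # Var(X) = alpha / beta^2 = 0.75
```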
<h2>Beta Distribution (Continuous)</h2>
<p>The beta distribution is used for random variables which take on values between 0 and 1. For this reason, the beta distribution is commonly used to model probabilities.</p>
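<p>A minimal sketch with scipy (assumed available; the shape parameters $a = 2$ and $b = 5$ are arbitrary examples):</p>

```python
# Minimal sketch: a beta distribution, which lives on [0, 1].
from scipy.stats import beta

a, b = 2.0, 5.0
X = beta(a, b)
print(X.mean())     # E(X) = a / (a + b) ≈ 0.2857
print(X.pdf(0.5))   # density at x = 0.5 (= 0.9375 here)
print(X.cdf(0.5))   # P(X <= 0.5) ≈ 0.8906
```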