<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="Fredrik Danielsson, http://lostkeys.se">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="robots" content="noindex" />
<title>Brandon Rozek</title>
<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>
<aside class="main-nav">
<nav>
<ul>
<li class="menuitem">
<a href="index.html%3Findex.html" data-shortcut="">
Home
</a>
</li>
<li class="menuitem">
<a href="index.html%3Fcourses.html" data-shortcut="">
Courses
</a>
</li>
<li class="menuitem">
<a href="index.html%3Flabaide.html" data-shortcut="">
Lab Aide
</a>
</li>
<li class="menuitem">
<a href="index.html%3Fpresentations.html" data-shortcut="">
Presentations
</a>
</li>
<li class="menuitem">
<a href="index.html%3Fresearch.html" data-shortcut="">
Research
</a>
</li>
<li class="menuitem">
<a href="index.html%3Ftranscript.html" data-shortcut="">
Transcript
</a>
</li>
</ul>
</nav>
</aside>
<main class="main-content">
<article class="article">
<h1>Principal Component Analysis Part 2: Formal Theory</h1>
<h2>Properties of PCA</h2>
<p>There are a number of ways to maximize the variance of a principal component. To obtain a unique solution, we impose the constraint that the sum of the squares of the coefficients equals 1. In vector notation this is
$$
a_i^Ta_i = 1
$$
Each subsequent principal component is orthogonal to all of the principal components before it.
$$
a_j^Ta_i = 0, \quad i < j
$$
The total variance of the $q$ principal components equals the total variance of the original variables
$$
\sum_{i = 1}^q \lambda_i = trace(S)
$$
where $S$ is the sample covariance matrix.</p>
<p>The proportion of variation accounted for by the $j$th principal component is
$$
P_j = \frac{\lambda_j}{trace(S)}
$$
From this, we can generalize to the first $m$ principal components, where $m < q$, and find the proportion $P^{(m)}$ of variation accounted for
$$
P^{(m)} = \frac{\sum_{i = 1}^m \lambda_i}{trace(S)}
$$
You can think of the first principal component as the line of best fit that minimizes the residuals orthogonal to it.</p>
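<p>As a quick numerical illustration (a sketch of my own, not from the lecture), the following Python snippet uses numpy to check these properties on a randomly generated dataset; the data and variable names are invented for the example.</p>
<pre><code class="python"># Sketch: verifying the PCA properties above with numpy.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))          # 100 observations of q = 3 variables

S = np.cov(X, rowvar=False)            # sample covariance matrix
eigenvalues, eigenvectors = np.linalg.eigh(S)
order = np.argsort(eigenvalues)[::-1]  # sort components by decreasing variance
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Each coefficient vector has unit length: a_i^T a_i = 1
print(np.linalg.norm(eigenvectors, axis=0))   # [1. 1. 1.]

# Total variance is preserved: the eigenvalues sum to trace(S)
print(eigenvalues.sum(), np.trace(S))

# Proportion of variation accounted for by each component: P_j
print(eigenvalues / np.trace(S))
</code></pre>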
<h3>What to watch out for</h3>
<p>As a reminder from the last lecture, <em>PCA is not scale-invariant</em>. Therefore, transforming the dataset before PCA versus after PCA often leads to different results, and possibly different conclusions.</p>
<p>Additionally, if there are large differences between the variances of the original variables, then the variables whose variances are largest will tend to dominate the early components.</p>
<p>Therefore, principal components should only be extracted from the sample covariance matrix when all of the original variables have roughly the <strong>same scale</strong>.</p>
<h3>Alternatives to using the Covariance Matrix</h3>
<p>It is rare in practice for all of the variables to be on the same scale. Therefore, principal components are typically extracted from the <strong>correlation matrix</strong> $R$.</p>
<p>Choosing to work with the correlation matrix rather than the covariance matrix treats the variables as equally important when performing PCA.</p>
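<p>A minimal sketch of this alternative, assuming numpy: extracting components from the correlation matrix, which is equivalent to standardizing each variable before PCA. The data and scales below are invented for illustration.</p>
<pre><code class="python"># Sketch: PCA from the correlation matrix R instead of the covariance matrix S.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3)) * np.array([1.0, 10.0, 100.0])  # very different scales

R = np.corrcoef(X, rowvar=False)            # correlation matrix
eigenvalues = np.linalg.eigvalsh(R)[::-1]   # descending

# trace(R) = q, so each variable contributes equally to the total variance
print(eigenvalues / np.trace(R))            # proportions of variation
</code></pre>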
<h2>Example Derivation: Bivariate Data</h2>
<p>Let $R$ be the correlation matrix
$$
R = \begin{pmatrix}
1 & r \\
r & 1
\end{pmatrix}
$$
Let us find the eigenvectors and eigenvalues of the correlation matrix
$$
det(R - \lambda I) = 0
$$</p>
<p>$$
(1-\lambda)^2 - r^2 = 0
$$</p>
<p>$$
\lambda_1 = 1 + r, \quad \lambda_2 = 1 - r
$$</p>
<p>Let us remember to check the condition "the sum of the eigenvalues equals the trace of the correlation matrix":
$$
\lambda_1 + \lambda_2 = (1+r) + (1 - r) = 2 = trace(R)
$$</p>
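<p>For readers who want to double-check the algebra, here is a small symbolic sketch; it assumes the sympy library and is not part of the original derivation.</p>
<pre><code class="python"># Sketch: the eigenvalues of R, computed symbolically with sympy.
import sympy as sp

r = sp.symbols("r")
R = sp.Matrix([[1, r], [r, 1]])

print(R.eigenvals())   # {r + 1: 1, 1 - r: 1}, each with multiplicity 1
print(R.trace())       # 2, matching the sum of the eigenvalues
</code></pre>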
<h3>Finding the First Eigenvector</h3>
<p>Looking back at the eigenvalue equation for the first eigenvector
$$
Ra_1 = \lambda_1 a_1
$$
we can get the following two equations
$$
a_{11} + ra_{12} = (1+r)a_{11} \tag{1}
$$</p>
<p>$$
ra_{11} + a_{12} = (1 + r)a_{12} \tag{2}
$$</p>
<p>Now let us find out what $a_{11}$ and $a_{12}$ equal. First let us solve for $a_{11}$ using equation $(1)$
$$
ra_{12} = (1+r)a_{11} - a_{11}
$$</p>
<p>$$
ra_{12} = a_{11}(1 + r - 1)
$$</p>
<p>$$
ra_{12} = ra_{11}
$$</p>
<p>$$
a_{12} = a_{11}
$$</p>
<p>where $r$ does not equal $0$.</p>
<p>Now we must apply the sum-of-squares condition
$$
a_1^Ta_1 = 1
$$</p>
<p>$$
a_{11}^2 + a_{12}^2 = 1
$$</p>
<p>Recall that $a_{12} = a_{11}$
$$
2a_{11}^2 = 1
$$</p>
<p>$$
a_{11}^2 = \frac{1}{2}
$$</p>
<p>$$
a_{11} = \pm \frac{1}{\sqrt{2}}
$$</p>
<p>For the sake of choosing a value, let us take the positive root and say $a_{11} = \frac{1}{\sqrt{2}}$.</p>
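<p>As a sanity check (again assuming sympy, and not part of the lecture), we can confirm that this vector satisfies the eigenvalue equation:</p>
<pre><code class="python"># Sketch: checking that R a_1 = (1 + r) a_1 for a_1 = (1, 1) / sqrt(2).
import sympy as sp

r = sp.symbols("r")
R = sp.Matrix([[1, r], [r, 1]])
a1 = sp.Matrix([1, 1]) / sp.sqrt(2)

print(sp.simplify(R * a1 - (1 + r) * a1))   # Matrix([[0], [0]])
</code></pre>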
<h3>Finding the Second Eigenvector</h3>
<p>Recall that each subsequent eigenvector is orthogonal to the ones before it. This means
$$
a_{11}a_{21} + a_{12}a_{22} = 0
$$
Substituting the values for $a_{11}$ and $a_{12}$ calculated in the previous section
$$
\frac{1}{\sqrt{2}}a_{21} + \frac{1}{\sqrt{2}}a_{22} = 0
$$</p>
<p>$$
a_{21} + a_{22} = 0
$$</p>
<p>$$
a_{21} = -a_{22}
$$</p>
<p>Since this eigenvector also needs to satisfy the unit-length condition, we get the following values
$$
a_{21} = \frac{1}{\sqrt{2}}, \quad a_{22} = \frac{-1}{\sqrt{2}}
$$</p>
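<p>The derivation above used only orthogonality and the unit-length constraint, so it is worth confirming (in the same assumed sympy setup) that $a_2$ really is an eigenvector of $R$ with eigenvalue $(1 - r)$:</p>
<pre><code class="python"># Sketch: checking that R a_2 = (1 - r) a_2 for a_2 = (1, -1) / sqrt(2).
import sympy as sp

r = sp.symbols("r")
R = sp.Matrix([[1, r], [r, 1]])
a2 = sp.Matrix([1, -1]) / sp.sqrt(2)

print(sp.simplify(R * a2 - (1 - r) * a2))   # Matrix([[0], [0]])
</code></pre>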
<h3>Conclusion of Example</h3>
<p>From this, we can say that the two principal components are given by
$$
y_1 = \frac{1}{\sqrt{2}}(x_1 + x_2), \quad y_2 = \frac{1}{\sqrt{2}}(x_1-x_2)
$$
with the variance of the first principal component given by $(1+r)$ and the variance of the second by $(1-r)$.</p>
<p>Due to this, as $r$ increases, so does the variance explained by the first principal component. This, in turn, lowers the variance explained by the second principal component.</p>
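<p>To make the variance split concrete, here is a short numpy sketch for one invented value of $r$; the numbers are illustrative only.</p>
<pre><code class="python"># Sketch: the variance split between the two components for r = 0.6.
import numpy as np

r = 0.6
R = np.array([[1.0, r],
              [r, 1.0]])

eigenvalues = np.linalg.eigvalsh(R)[::-1]   # descending: [1 + r, 1 - r]
print(eigenvalues)                          # [1.6 0.4]
print(eigenvalues / eigenvalues.sum())      # proportions: [0.8 0.2]
</code></pre>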
<h2>Choosing a Number of Principal Components</h2>
<p>Principal Component Analysis is typically used in dimensionality reduction efforts. Therefore, there are several strategies for picking the right number of principal components to keep. Here are a few:</p>
<ul>
<li>Retain enough principal components to account for 70%-90% of the variation</li>
<li>Exclude principal components whose eigenvalues are less than the average eigenvalue</li>
<li>Exclude principal components whose eigenvalues are less than one (when working from the correlation matrix, this matches the previous rule, since the average eigenvalue of $R$ is one)</li>
<li>Generate a Scree Plot (see the sketch after this list)
<ul>
<li>Stop when the plot goes from "steep" to "shallow"</li>
<li>Stop when it essentially becomes a straight line</li>
</ul></li>
</ul>
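<p>A sketch of the first and last strategies, assuming numpy and matplotlib; the eigenvalues below are invented for illustration.</p>
<pre><code class="python"># Sketch: retained-variation rule plus a scree plot for hypothetical eigenvalues.
import numpy as np
import matplotlib.pyplot as plt

eigenvalues = np.array([4.2, 2.1, 0.9, 0.5, 0.2, 0.1])   # sorted descending

# Smallest m whose components account for at least 80% of the variation
cumulative = np.cumsum(eigenvalues) / eigenvalues.sum()
m = int(np.searchsorted(cumulative, 0.80)) + 1
print(m, cumulative)

# Scree plot: look for the bend from "steep" to "shallow"
plt.plot(range(1, len(eigenvalues) + 1), eigenvalues, marker="o")
plt.xlabel("Component number")
plt.ylabel("Eigenvalue")
plt.title("Scree plot")
plt.show()
</code></pre>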
</article>
</main>

<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
  tex2jax: {
    inlineMath: [ ['$','$'], ["\\(","\\)"] ],
    processEscapes: true
  }
});
</script>
<script type="text/javascript"
  src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
hljs.initHighlightingOnLoad();

document.querySelectorAll('.menuitem a').forEach(function(el) {
  if (el.getAttribute('data-shortcut').length > 0) {
    Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
      location.assign(el.getAttribute('href'));
    });
  }
});
</script>

</body>
</html>