website/static/~brozek/index.html?research%2FClusterAnalysis%2Fnotes%2Flec4.html

<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8" />
  <meta name="author" content="Brandon Rozek">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <meta name="robots" content="noindex" />
    <title>Brandon Rozek</title>
  <link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />
  <link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />
</head>
<body>

<aside class="main-nav">
<nav>
  <ul>
          <li class="menuitem ">
        <a href="index.html%3Findex.html" data-shortcut="">
          Home
                  </a>
      </li>
          <li class="menuitem ">
        <a href="index.html%3Fcourses.html" data-shortcut="">
          Courses
                  </a>
      </li>
          <li class="menuitem ">
        <a href="index.html%3Flabaide.html" data-shortcut="">
          Lab Aide
                  </a>
      </li>
          <li class="menuitem ">
        <a href="index.html%3Fpresentations.html" data-shortcut="">
          Presentations
                  </a>
      </li>
          <li class="menuitem ">
        <a href="index.html%3Fresearch.html" data-shortcut="">
          Research
                  </a>
      </li>
          <li class="menuitem ">
        <a href="index.html%3Ftranscript.html" data-shortcut="">
          Transcript
                  </a>
      </li>
      </ul>
</nav>
</aside>
<main class="main-content">
  <article class="article">
    <h1>Principal Component Analysis Part 2: Formal Theory</h1>
<h2>Properties of PCA</h2>
<p>There are a number of ways to maximize the variance of a principal component. To create an unique solution we should impose a constraint. Let us say that the sum of the square of the coefficients must equal 1. In vector notation this is the same as
$$
a_i^Ta_i = 1
$$
Every future principal component is said to be orthogonal to all the principal components previous to it.
$$
a_j^Ta<em>i = 0, i &lt; j
$$
The total variance of the $q$ principal components will equal the total variance of the original variables
$$
\sum</em>{i = 1}^q {\lambda_i} = trace(S)
$$
Where $S$ is the sample covariance matrix.</p>
<p>The proportion of accounted variation in each principle component is
$$
P_j = \frac{\lambda<em>j}{trace(S)}
$$
From this, we can generalize to the first $m$ principal components where $m &lt; q$ and find the proportion $P^{(m)}$ of variation accounted for
$$
P^{(m)} = \frac{\sum</em>{i = 1}^m{\lambda_i}}{trace(S)}
$$
You can think of the first principal component as the line of best fit that minimizes the residuals orthogonal to it.</p>
<h3>What to watch out for</h3>
<p>As a reminder to the last lecture, <em>PCA is not scale-invariant</em>. Therefore, transformations done to the dataset before PCA and after PCA often lead to different results and possibly conclusions.</p>
<p>Additionally, if there are large differences between the variances of the original variables, then those whose variances are largest will tend to dominate the early components.</p>
<p>Therefore, principal components should only be extracted from the sample covariance matrix when all of the original variables have roughly the <strong>same scale</strong>.</p>
<h3>Alternatives to using the Covariance Matrix</h3>
<p>But it is rare in practice to have a scenario when all of the variables are of the same scale. Therefore, principal components are typically extracted from the <strong>correlation matrix</strong> $R$</p>
<p>Choosing to work with the correlation matrix rather than the covariance matrix treats the variables as all equally important when performing PCA.</p>
<h2>Example Derivation: Bivariate Data</h2>
<p>Let $R$ be the correlation matrix
$$
R = \begin{pmatrix}
1 &amp; r \
r &amp; 1
\end{pmatrix}
$$
Let us find the eigenvectors and eigenvalues of the correlation matrix
$$
det(R - \lambda I) = 0
$$</p>
<p>$$
(1-\lambda)^2 - r^2 = 0
$$</p>
<p>$$
\lambda_1 = 1 + r, \lambda_2 = 1 - r
$$</p>
<p>Let us remember to check the condition &quot;sum of the principal components equals the trace of the correlation matrix&quot;:
$$
\lambda_1 + \lambda_2 = 1+r + (1 - r) = 2 = trace(R)
$$</p>
<h3>Finding the First Eigenvector</h3>
<p>Looking back at the characteristic equation
$$
Ra_1 = \lambda a<em>1
$$
We can get the following two formulas
$$
a</em>{11} + ra<em>{12} = (1+r)a</em>{11} \tag{1}
$$</p>
<p>$$
ra<em>{11} + a</em>{12} = (1 + r)a_{12} \tag{2}
$$</p>
<p>Now let us find out what $a<em>{11}$ and $a</em>{12}$ equal. First let us solve for $a<em>{11}$ using equation $(1)$
$$
ra</em>{12} = (1+r)a<em>{11} - a</em>{11}
$$</p>
<p>$$
ra<em>{12} = a</em>{11}(1 + r - 1)
$$</p>
<p>$$
ra<em>{12} = ra</em>{11}
$$</p>
<p>$$
a<em>{12} = a</em>{11}
$$</p>
<p>Where $r$ does not equal $0$.</p>
<p>Now we must apply the condition of sum squares
$$
a_1^Ta_1 = 1
$$</p>
<p>$$
a<em>{11}^2 + a</em>{12}^2  = 1
$$</p>
<p>Recall that $a<em>{12} = a</em>{11}$
$$
2a_{11}^2 = 1
$$</p>
<p>$$
a_{11}^2 = \frac{1}{2}
$$</p>
<p>$$
a_{11} =\pm \frac{1}{\sqrt{2}}
$$</p>
<p>For sake of choosing a value, let us take the principal root and say $a_{11} = \frac{1}{\sqrt{2}}$</p>
<h3>Finding the Second Eigenvector</h3>
<p>Recall the fact that each subsequent eigenvector is orthogonal to the first. This means
$$
a<em>{11}a</em>{21} + a<em>{12}a</em>{22} = 0
$$
Substituting the values for $a<em>{11}$ and $a</em>{12}$ calculated in the previous section
$$
\frac{1}{\sqrt{2}}a<em>{21} + \frac{1}{\sqrt{2}}a</em>{22} = 0
$$</p>
<p>$$
a<em>{21} + a</em>{22} = 0
$$</p>
<p>$$
a<em>{21} = -a</em>{22}
$$</p>
<p>Since this eigenvector also needs to satisfy the first condition, we get the following values
$$
a<em>{21} = \frac{1}{\sqrt{2}} , a</em>{22} = \frac{-1}{\sqrt{2}}
$$</p>
<h3>Conclusion of Example</h3>
<p>From this, we can say that the first principal components are given by
$$
y_1 = \frac{1}{\sqrt{2}}(x_1 + x_2), y_2 = \frac{1}{\sqrt{2}}(x_1-x_2)
$$
With the variance of the first principal component being given by $(1+r)$ and the second by $(1-r)$</p>
<p>Due to this, as $r$ increases, so does the variance explained in the first principal component. This in turn, lowers the variance explained in the second principal component.</p>
<h2>Choosing a Number of Principal Components</h2>
<p>Principal Component Analysis is typically used in dimensionality reduction efforts. Therefore, there are several strategies for picking the right number of principal components to keep. Here are a few:</p>
<ul>
<li>Retain enough principal components to account for 70%-90% of the variation</li>
<li>Exclude principal components where eigenvalues are less than the average eigenvalue</li>
<li>Exclude principal components where eigenvalues are less than one.</li>
<li>Generate a Scree Plot
<ul>
<li>Stop when the plot goes from &quot;steep&quot; to &quot;shallow&quot;</li>
<li>Stop when it essentially becomes a straight line.</li>
</ul></li>
</ul>
  </article>
</main>

<script src="themes/bitsandpieces/scripts/highlight.js"></script>
<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>
<script type="text/x-mathjax-config">
  MathJax.Hub.Config({
    tex2jax: {
      inlineMath: [ ['$','$'], ["\\(","\\)"] ],
      processEscapes: true
    }
  });
</script>

<script type="text/javascript"
    src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script>
  hljs.initHighlightingOnLoad();
  
  document.querySelectorAll('.menuitem a').forEach(function(el) {
    if (el.getAttribute('data-shortcut').length > 0) {
      Mousetrap.bind(el.getAttribute('data-shortcut'), function() {
        location.assign(el.getAttribute('href'));
      });       
    }
  });
</script>

</body>
</html>
Added old tildasite as an archive 2020-01-15 23:07:02 -05:00			`<!DOCTYPE html>`
			`<html>`
			`<head>`
			`<meta charset="utf-8" />`
Fixed meta tags 2022-02-15 01:14:58 -05:00			`<meta name="author" content="Brandon Rozek">`
Added old tildasite as an archive 2020-01-15 23:07:02 -05:00			`<meta name="viewport" content="width=device-width, initial-scale=1.0">`
			`<meta name="robots" content="noindex" />`
			`<title>Brandon Rozek</title>`
			`<link rel="stylesheet" href="themes/bitsandpieces/styles/main.css" type="text/css" />`
			`<link rel="stylesheet" href="themes/bitsandpieces/styles/highlightjs-github.css" type="text/css" />`
			`</head>`
			`<body>`

			`<aside class="main-nav">`
			`<nav>`
			`<ul>`
			`<li class="menuitem ">`
			`<a href="index.html%3Findex.html" data-shortcut="">`
			`Home`
			`</a>`
			`</li>`
			`<li class="menuitem ">`
			`<a href="index.html%3Fcourses.html" data-shortcut="">`
			`Courses`
			`</a>`
			`</li>`
			`<li class="menuitem ">`
			`<a href="index.html%3Flabaide.html" data-shortcut="">`
			`Lab Aide`
			`</a>`
			`</li>`
			`<li class="menuitem ">`
			`<a href="index.html%3Fpresentations.html" data-shortcut="">`
			`Presentations`
			`</a>`
			`</li>`
			`<li class="menuitem ">`
			`<a href="index.html%3Fresearch.html" data-shortcut="">`
			`Research`
			`</a>`
			`</li>`
			`<li class="menuitem ">`
			`<a href="index.html%3Ftranscript.html" data-shortcut="">`
			`Transcript`
			`</a>`
			`</li>`
			`</ul>`
			`</nav>`
			`</aside>`
			`<main class="main-content">`
			`<article class="article">`
			`<h1>Principal Component Analysis Part 2: Formal Theory</h1>`
			`<h2>Properties of PCA</h2>`
			`<p>There are a number of ways to maximize the variance of a principal component. To create an unique solution we should impose a constraint. Let us say that the sum of the square of the coefficients must equal 1. In vector notation this is the same as`
			`$$`
			`a_i^Ta_i = 1`
			`$$`
			`Every future principal component is said to be orthogonal to all the principal components previous to it.`
			`$$`
			`a_j^Ta<em>i = 0, i < j`
			`$$`
			`The total variance of the $q$ principal components will equal the total variance of the original variables`
			`$$`
			`\sum</em>{i = 1}^q {\lambda_i} = trace(S)`
			`$$`
			`Where $S$ is the sample covariance matrix.</p>`
			`<p>The proportion of accounted variation in each principle component is`
			`$$`
			`P_j = \frac{\lambda<em>j}{trace(S)}`
			`$$`
			`From this, we can generalize to the first $m$ principal components where $m < q$ and find the proportion $P^{(m)}$ of variation accounted for`
			`$$`
			`P^{(m)} = \frac{\sum</em>{i = 1}^m{\lambda_i}}{trace(S)}`
			`$$`
			`You can think of the first principal component as the line of best fit that minimizes the residuals orthogonal to it.</p>`
			`<h3>What to watch out for</h3>`
			`<p>As a reminder to the last lecture, <em>PCA is not scale-invariant</em>. Therefore, transformations done to the dataset before PCA and after PCA often lead to different results and possibly conclusions.</p>`
			`<p>Additionally, if there are large differences between the variances of the original variables, then those whose variances are largest will tend to dominate the early components.</p>`
			`<p>Therefore, principal components should only be extracted from the sample covariance matrix when all of the original variables have roughly the <strong>same scale</strong>.</p>`
			`<h3>Alternatives to using the Covariance Matrix</h3>`
			`<p>But it is rare in practice to have a scenario when all of the variables are of the same scale. Therefore, principal components are typically extracted from the <strong>correlation matrix</strong> $R$</p>`
			`<p>Choosing to work with the correlation matrix rather than the covariance matrix treats the variables as all equally important when performing PCA.</p>`
			`<h2>Example Derivation: Bivariate Data</h2>`
			`<p>Let $R$ be the correlation matrix`
			`$$`
			`R = \begin{pmatrix}`
			`1 & r \`
			`r & 1`
			`\end{pmatrix}`
			`$$`
			`Let us find the eigenvectors and eigenvalues of the correlation matrix`
			`$$`
			`det(R - \lambda I) = 0`
			`$$</p>`
			`<p>$$`
			`(1-\lambda)^2 - r^2 = 0`
			`$$</p>`
			`<p>$$`
			`\lambda_1 = 1 + r, \lambda_2 = 1 - r`
			`$$</p>`
			`<p>Let us remember to check the condition "sum of the principal components equals the trace of the correlation matrix":`
			`$$`
			`\lambda_1 + \lambda_2 = 1+r + (1 - r) = 2 = trace(R)`
			`$$</p>`
			`<h3>Finding the First Eigenvector</h3>`
			`<p>Looking back at the characteristic equation`
			`$$`
			`Ra_1 = \lambda a<em>1`
			`$$`
			`We can get the following two formulas`
			`$$`
			`a</em>{11} + ra<em>{12} = (1+r)a</em>{11} \tag{1}`
			`$$</p>`
			`<p>$$`
			`ra<em>{11} + a</em>{12} = (1 + r)a_{12} \tag{2}`
			`$$</p>`
			`<p>Now let us find out what $a<em>{11}$ and $a</em>{12}$ equal. First let us solve for $a<em>{11}$ using equation $(1)$`
			`$$`
			`ra</em>{12} = (1+r)a<em>{11} - a</em>{11}`
			`$$</p>`
			`<p>$$`
			`ra<em>{12} = a</em>{11}(1 + r - 1)`
			`$$</p>`
			`<p>$$`
			`ra<em>{12} = ra</em>{11}`
			`$$</p>`
			`<p>$$`
			`a<em>{12} = a</em>{11}`
			`$$</p>`
			`<p>Where $r$ does not equal $0$.</p>`
			`<p>Now we must apply the condition of sum squares`
			`$$`
			`a_1^Ta_1 = 1`
			`$$</p>`
			`<p>$$`
			`a<em>{11}^2 + a</em>{12}^2 = 1`
			`$$</p>`
			`<p>Recall that $a<em>{12} = a</em>{11}$`
			`$$`
			`2a_{11}^2 = 1`
			`$$</p>`
			`<p>$$`
			`a_{11}^2 = \frac{1}{2}`
			`$$</p>`
			`<p>$$`
			`a_{11} =\pm \frac{1}{\sqrt{2}}`
			`$$</p>`
			`<p>For sake of choosing a value, let us take the principal root and say $a_{11} = \frac{1}{\sqrt{2}}$</p>`
			`<h3>Finding the Second Eigenvector</h3>`
			`<p>Recall the fact that each subsequent eigenvector is orthogonal to the first. This means`
			`$$`
			`a<em>{11}a</em>{21} + a<em>{12}a</em>{22} = 0`
			`$$`
			`Substituting the values for $a<em>{11}$ and $a</em>{12}$ calculated in the previous section`
			`$$`
			`\frac{1}{\sqrt{2}}a<em>{21} + \frac{1}{\sqrt{2}}a</em>{22} = 0`
			`$$</p>`
			`<p>$$`
			`a<em>{21} + a</em>{22} = 0`
			`$$</p>`
			`<p>$$`
			`a<em>{21} = -a</em>{22}`
			`$$</p>`
			`<p>Since this eigenvector also needs to satisfy the first condition, we get the following values`
			`$$`
			`a<em>{21} = \frac{1}{\sqrt{2}} , a</em>{22} = \frac{-1}{\sqrt{2}}`
			`$$</p>`
			`<h3>Conclusion of Example</h3>`
			`<p>From this, we can say that the first principal components are given by`
			`$$`
			`y_1 = \frac{1}{\sqrt{2}}(x_1 + x_2), y_2 = \frac{1}{\sqrt{2}}(x_1-x_2)`
			`$$`
			`With the variance of the first principal component being given by $(1+r)$ and the second by $(1-r)$</p>`
			`<p>Due to this, as $r$ increases, so does the variance explained in the first principal component. This in turn, lowers the variance explained in the second principal component.</p>`
			`<h2>Choosing a Number of Principal Components</h2>`
			`<p>Principal Component Analysis is typically used in dimensionality reduction efforts. Therefore, there are several strategies for picking the right number of principal components to keep. Here are a few:</p>`
			`<ul>`
			`<li>Retain enough principal components to account for 70%-90% of the variation</li>`
			`<li>Exclude principal components where eigenvalues are less than the average eigenvalue</li>`
			`<li>Exclude principal components where eigenvalues are less than one.</li>`
			`<li>Generate a Scree Plot`
			`<ul>`
			`<li>Stop when the plot goes from "steep" to "shallow"</li>`
			`<li>Stop when it essentially becomes a straight line.</li>`
			`</ul></li>`
			`</ul>`
			`</article>`
			`</main>`

			`<script src="themes/bitsandpieces/scripts/highlight.js"></script>`
			`<script src="themes/bitsandpieces/scripts/mousetrap.min.js"></script>`
			`<script type="text/x-mathjax-config">`
			`MathJax.Hub.Config({`
			`tex2jax: {`
			`inlineMath: [ ['$','$'], ["\\(","\\)"] ],`
			`processEscapes: true`
			`}`
			`});`
			`</script>`

			`<script type="text/javascript"`
			`src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">`
			`</script>`
			`<script>`
			`hljs.initHighlightingOnLoad();`

			`document.querySelectorAll('.menuitem a').forEach(function(el) {`
			`if (el.getAttribute('data-shortcut').length > 0) {`
			`Mousetrap.bind(el.getAttribute('data-shortcut'), function() {`
			`location.assign(el.getAttribute('href'));`
			`});`
			`}`
			`});`
			`</script>`

			`</body>`
			`</html>`