# Principal Component Analysis Part 2: Formal Theory

## Properties of PCA

There are a number of ways to maximize the variance of a principal component, so to create a unique solution we impose a constraint: the sum of the squares of the coefficients must equal 1. In vector notation this is the same as

$$
a_i^Ta_i = 1
$$

Each subsequent principal component is orthogonal to all of the principal components before it:

$$
a_j^Ta_i = 0, i < j
$$

The total variance of the $q$ principal components equals the total variance of the original variables:

$$
\sum_{i = 1}^q {\lambda_i} = trace(S)
$$

where $S$ is the sample covariance matrix and $\lambda_i$ is the eigenvalue associated with the $i$th principal component, which is also that component's variance.

The proportion of variation accounted for by the $j$th principal component is

$$
P_j = \frac{\lambda_j}{trace(S)}
$$

From this, we can generalize to the first $m$ principal components, where $m < q$, and find the proportion $P^{(m)}$ of variation they account for:

$$
P^{(m)} = \frac{\sum_{i = 1}^m{\lambda_i}}{trace(S)}
$$

You can think of the first principal component as the line of best fit that minimizes the residuals orthogonal to it.
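
These properties are easy to verify numerically. Below is a minimal sketch (assuming NumPy is available; the data and variable names are purely illustrative) that checks the orthonormality of the eigenvectors and the trace identity above:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))         # illustrative data: 200 observations, 3 variables

S = np.cov(X, rowvar=False)           # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(S)  # eigh since S is symmetric; eigenvalues ascending
eigvals = eigvals[::-1]               # reorder to decreasing variance

# a_i^T a_i = 1 and a_j^T a_i = 0: the eigenvectors are orthonormal
assert np.allclose(eigvecs.T @ eigvecs, np.eye(3))

# The total variance of the components equals trace(S)
assert np.isclose(eigvals.sum(), np.trace(S))

# Proportion of variation accounted for by each component
print(eigvals / np.trace(S))
```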

### What to watch out for

As a reminder from the last lecture, *PCA is not scale-invariant*. Transformations applied to the dataset before running PCA therefore often lead to different results and possibly different conclusions.

Additionally, if there are large differences between the variances of the original variables, then those whose variances are largest will tend to dominate the early components.

Therefore, principal components should only be extracted from the sample covariance matrix when all of the original variables have roughly the **same scale**.

### Alternatives to using the Covariance Matrix

It is rare in practice, however, for all of the variables to be on the same scale. Therefore, principal components are typically extracted from the **correlation matrix** $R$.

Choosing to work with the correlation matrix rather than the covariance matrix treats all of the variables as equally important when performing PCA.
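
Extracting components from $R$ amounts to performing PCA on standardized variables. A small sketch of that equivalence (again assuming NumPy; the scale factors are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2)) * np.array([1.0, 100.0])  # two variables on wildly different scales

R = np.corrcoef(X, rowvar=False)                  # correlation matrix of X
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardized variables
assert np.allclose(R, np.cov(Z, rowvar=False))    # same matrix, hence the same PCA
```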

## Example Derivation: Bivariate Data

Let $R$ be the correlation matrix

$$
R = \begin{pmatrix}
1 & r \\
r & 1
\end{pmatrix}
$$

Let us find the eigenvectors and eigenvalues of the correlation matrix

$$
\det(R - \lambda I) = 0
$$

$$
(1-\lambda)^2 - r^2 = 0
$$

$$
\lambda_1 = 1 + r, \lambda_2 = 1 - r
$$

Let us remember to check the condition that the total variance of the principal components equals the trace of the correlation matrix:

$$
\lambda_1 + \lambda_2 = (1 + r) + (1 - r) = 2 = trace(R)
$$
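
As a quick symbolic check of these eigenvalues, here is a sketch assuming SymPy is available:

```python
import sympy as sp

r = sp.symbols('r')
R = sp.Matrix([[1, r], [r, 1]])
print(R.eigenvals())  # {r + 1: 1, 1 - r: 1} -- each eigenvalue with multiplicity 1
```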

### Finding the First Eigenvector

Substituting $\lambda_1 = 1 + r$ into the eigenvalue equation

$$
Ra_1 = \lambda_1 a_1
$$

we get the following two equations:

$$
a_{11} + ra_{12} = (1+r)a_{11} \tag{1}
$$

$$
ra_{11} + a_{12} = (1 + r)a_{12} \tag{2}
$$

Now let us find out what $a_{11}$ and $a_{12}$ equal. First, let us solve for $a_{11}$ using equation $(1)$:

$$
ra_{12} = (1+r)a_{11} - a_{11}
$$

$$
ra_{12} = a_{11}(1 + r - 1)
$$

$$
ra_{12} = ra_{11}
$$

$$
a_{12} = a_{11}
$$

where $r$ does not equal $0$.

Now we must apply the sum-of-squares condition

$$
a_1^Ta_1 = 1
$$

$$
a_{11}^2 + a_{12}^2 = 1
$$

Recall that $a_{12} = a_{11}$

$$
2a_{11}^2 = 1
$$

$$
a_{11}^2 = \frac{1}{2}
$$

$$
a_{11} = \pm \frac{1}{\sqrt{2}}
$$

For the sake of choosing a value, let us take the positive root and say $a_{11} = a_{12} = \frac{1}{\sqrt{2}}$.

### Finding the Second Eigenvector

Recall that each subsequent eigenvector is orthogonal to the ones before it. This means

$$
a_{11}a_{21} + a_{12}a_{22} = 0
$$

Substituting the values for $a_{11}$ and $a_{12}$ calculated in the previous section

$$
\frac{1}{\sqrt{2}}a_{21} + \frac{1}{\sqrt{2}}a_{22} = 0
$$

$$
a_{21} + a_{22} = 0
$$

$$
a_{21} = -a_{22}
$$

Since this eigenvector also needs to satisfy the normalization condition $a_2^Ta_2 = 1$, we get the following values

$$
a_{21} = \frac{1}{\sqrt{2}}, a_{22} = -\frac{1}{\sqrt{2}}
$$

### Conclusion of Example

From this, we can say that the two principal components are given by

$$
y_1 = \frac{1}{\sqrt{2}}(x_1 + x_2), y_2 = \frac{1}{\sqrt{2}}(x_1 - x_2)
$$

with the variance of the first principal component given by $(1 + r)$ and that of the second by $(1 - r)$.

Because of this, as $r$ increases, so does the proportion of variance explained by the first principal component, which in turn lowers the proportion explained by the second.
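
The whole example can be verified numerically for a concrete correlation; here is a sketch (assuming NumPy, with $r = 0.6$ chosen arbitrarily):

```python
import numpy as np

r = 0.6
R = np.array([[1.0, r], [r, 1.0]])

eigvals, eigvecs = np.linalg.eigh(R)  # ascending order: 1 - r first, then 1 + r
print(eigvals)                        # [0.4, 1.6]
print(eigvecs)                        # columns are (1, -1)/sqrt(2) and (1, 1)/sqrt(2), up to sign
```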

## Choosing a Number of Principal Components

Principal Component Analysis is typically used for dimensionality reduction. Therefore, there are several strategies for picking the right number of principal components to keep. Here are a few, with a sketch of the first strategy after the list:

- Retain enough principal components to account for 70%-90% of the variation
- Exclude principal components whose eigenvalues are less than the average eigenvalue
- Exclude principal components whose eigenvalues are less than one (when working with the correlation matrix, the average eigenvalue is one, so this is a special case of the previous rule)
- Generate a scree plot
  - Stop when the plot goes from "steep" to "shallow"
  - Stop when it essentially becomes a straight line
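
As noted above, here is a short sketch of the first strategy (assuming NumPy; the threshold and the sample eigenvalues are illustrative):

```python
import numpy as np

def num_components(eigvals, threshold=0.8):
    """Smallest m such that the first m principal components
    account for at least `threshold` of the total variation."""
    eigvals = np.sort(eigvals)[::-1]                 # decreasing order
    cumulative = np.cumsum(eigvals) / eigvals.sum()  # P^(m) for m = 1..q
    return int(np.searchsorted(cumulative, threshold) + 1)

print(num_components(np.array([4.0, 2.0, 1.0, 0.5, 0.5])))  # -> 3 (87.5% of the variation)
```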