# K-Medians
K-medians is a variation of k-means clustering: instead of calculating the mean of each cluster to determine its center, we calculate the (coordinate-wise) median.
This minimizes the total error over all clusters with respect to the Manhattan ($L_1$) norm, as opposed to the squared Euclidean norm minimized by k-means.
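
For reference, the two objectives can be written out as follows. The notation ($C_j$ for the $j$-th cluster, $m_j$ for its median, $\mu_j$ for its mean, $p$ for the number of features) is introduced here and is not from the original notes.

$$
\text{k-medians:}\quad \min_{C_1,\dots,C_k,\;m_1,\dots,m_k}\; \sum_{j=1}^{k} \sum_{x \in C_j} \lVert x - m_j \rVert_1,
\qquad \lVert x - m_j \rVert_1 = \sum_{d=1}^{p} \lvert x_d - m_{j,d} \rvert
$$

$$
\text{k-means:}\quad \min_{C_1,\dots,C_k,\;\mu_1,\dots,\mu_k}\; \sum_{j=1}^{k} \sum_{x \in C_j} \lVert x - \mu_j \rVert_2^2
$$

The $L_1$ objective splits into a separate sum for each coordinate, and in one dimension the median minimizes the sum of absolute deviations. So for a fixed cluster the coordinate-wise median is the best choice of $m_j$, just as the mean is the best choice of $\mu_j$ for the squared Euclidean objective.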
### Algorithm
Given an initial set of $k$ medians, the algorithm proceeds by alternating between two steps.
**Assignment step**: Assign each observation to the cluster whose median has the least Manhattan distance to it.
- Intuitively, this assigns each observation to its nearest median.
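
As a rough illustration, the assignment step can be written with NumPy broadcasting. The function name `assign_to_nearest_median` and the array layout (one observation per row, one median per row) are assumptions made for this sketch, not part of the notes.

```python
import numpy as np

def assign_to_nearest_median(X, medians):
    """Hypothetical helper: for each row of X (shape (n, p)), return the index
    of the row of `medians` (shape (k, p)) with the least Manhattan distance."""
    # (n, 1, p) - (1, k, p) broadcasts to (n, k, p); summing |.| over p gives L1 distances
    distances = np.abs(X[:, None, :] - medians[None, :, :]).sum(axis=2)
    # argmin picks the nearest median; ties go to the lowest cluster index
    return distances.argmin(axis=1)

X = np.array([[0.0, 0.0], [1.0, 0.0], [9.0, 9.0], [10.0, 8.0]])
medians = np.array([[0.0, 1.0], [9.0, 9.0]])
print(assign_to_nearest_median(X, medians))  # [0 0 1 1]
```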
**Update step**: Recompute each cluster's median as the coordinate-wise median of the observations now assigned to it.
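
A minimal sketch of the update step, again assuming NumPy and one observation per row. `update_medians` is a hypothetical helper; the notes do not say what to do with a cluster that ends up empty, so here it simply keeps its previous median (an assumption). The small example hints at why the median is used: a single outlying point pulls the mean's $y$-coordinate to about 33, while the median's $y$-coordinate stays at 0.

```python
import numpy as np

def update_medians(X, labels, old_medians):
    """Hypothetical helper: recompute each cluster's median coordinate by coordinate.
    Assumption: a cluster with no members keeps its previous median."""
    new_medians = old_medians.astype(float, copy=True)
    for j in range(new_medians.shape[0]):
        members = X[labels == j]
        if len(members) > 0:
            # np.median with axis=0 takes the median of each feature separately
            new_medians[j] = np.median(members, axis=0)
    return new_medians

X = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 100.0]])  # last point is an outlier in y
labels = np.array([0, 0, 0])
print(update_medians(X, labels, np.zeros((1, 2))))  # [[1. 0.]]
print(X.mean(axis=0))  # the mean is dragged toward the outlier
```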
The algorithm has converged when the assignments no longer change. It is not guaranteed to find the optimal clustering; it can settle on a suboptimal one.
The result depends on the initial medians, so it is common to run the algorithm several times from different starting conditions and keep the clustering with the lowest total error.
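
Putting the pieces together, here is a hedged end-to-end sketch of the algorithm with the convergence test and the restarts described above. The function `k_medians`, its parameters (`n_init`, `max_iter`, `seed`), initializing from $k$ randomly chosen observations, and keeping an empty cluster's old median are all assumptions for illustration, not details given in the notes.

```python
import numpy as np

def k_medians(X, k, n_init=10, max_iter=100, seed=0):
    """Sketch of k-medians with random restarts (hypothetical helper).

    Assumptions: each run starts from k distinct observations drawn at random,
    an empty cluster keeps its old median, and the best run is the one with the
    smallest total L1 error.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    best_labels, best_medians, best_error = None, None, np.inf
    for _ in range(n_init):
        # Initial medians: k distinct observations chosen at random
        medians = X[rng.choice(len(X), size=k, replace=False)].copy()
        labels = None
        for _ in range(max_iter):
            # Assignment step: nearest median in Manhattan distance
            dist = np.abs(X[:, None, :] - medians[None, :, :]).sum(axis=2)
            new_labels = dist.argmin(axis=1)
            # Converged: the assignments no longer change
            if labels is not None and np.array_equal(new_labels, labels):
                break
            labels = new_labels
            # Update step: coordinate-wise median of each non-empty cluster
            for j in range(k):
                members = X[labels == j]
                if len(members) > 0:
                    medians[j] = np.median(members, axis=0)
        # Total L1 error of this run, i.e. the k-medians objective
        error = np.abs(X - medians[labels]).sum()
        if error < best_error:
            best_labels, best_medians, best_error = labels, medians, error
    return best_labels, best_medians, best_error

X = np.array([[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]])
labels, medians, error = k_medians(X, k=2)
print(labels, medians, error)
```

Each restart is scored by the quantity k-medians minimizes, so the returned clustering is the best of the runs tried, which is still not guaranteed to be the global optimum.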