website/content/research/clusteranalysis/notes/lec11-3.md

24 lines
1 KiB
Markdown
Raw Normal View History

---
title: K-Medians
showthedate: false
math: true
---
2020-01-16 02:51:49 +00:00
This is a variation of k-means clustering where instead of calculating the mean for each cluster to determine its centroid we are going to calculate the median instead.
This has the effect of minimizing error over all the clusters with respect to the Manhattan norm as opposed to the Euclidean squared norm which is minimized in K-means
### Algorithm
Given an initial set of $k$ medians, the algorithm proceeds by alternating between two steps.
**Assignment step**: Assign each observation to the cluster whose median has the leas Manhattan distance.
- Intuitively this is finding the nearest median
**Update Step**: Calculate the new medians to be the centroids of the observations in the new clusters
The algorithm is known to have converged when assignments no longer change. There is no guarantee that the optimum is found using this algorithm.
The result depends on the initial clusters. It is common to run this multiple times with different starting conditions.