mirror of
https://github.com/Brandon-Rozek/website.git
synced 2024-11-26 01:53:58 -05:00
23 lines
1 KiB
Markdown
23 lines
1 KiB
Markdown
---
|
|
title: K-Medians
|
|
showthedate: false
|
|
math: true
|
|
---
|
|
|
|
This is a variation of k-means clustering where instead of calculating the mean for each cluster to determine its centroid we are going to calculate the median instead.
|
|
|
|
This has the effect of minimizing error over all the clusters with respect to the Manhattan norm as opposed to the Euclidean squared norm which is minimized in K-means
|
|
|
|
### Algorithm
|
|
|
|
Given an initial set of $k$ medians, the algorithm proceeds by alternating between two steps.
|
|
|
|
**Assignment step**: Assign each observation to the cluster whose median has the leas Manhattan distance.
|
|
|
|
- Intuitively this is finding the nearest median
|
|
|
|
**Update Step**: Calculate the new medians to be the centroids of the observations in the new clusters
|
|
|
|
The algorithm is known to have converged when assignments no longer change. There is no guarantee that the optimum is found using this algorithm.
|
|
|
|
The result depends on the initial clusters. It is common to run this multiple times with different starting conditions.
|