mirror of
https://github.com/Brandon-Rozek/website.git
synced 2024-12-23 17:31:42 +00:00
title | showthedate | math |
---|---|---|
K-means++ | false | true |
K-means++ is an algorithm for choosing the initial values (or seeds) for the k-means clustering algorithm. It was proposed as a way to avoid the sometimes poor clusterings found by the standard k-means algorithm.
Intuition
The intuition behind this approach is to spread out the $k$ initial cluster centers. The first cluster center is chosen uniformly at random from the data points being clustered, after which each subsequent cluster center is chosen from the remaining data points with probability proportional to the squared distance from that point to its closest existing cluster center.
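Written as a formula, with $D(x)$ denoting the distance from a point $x$ to its closest already-chosen center, the selection probability is as follows. The symbols $X$ for the data set and $x'$ for the summation index are notation introduced here for illustration.

$$
P(x) = \frac{D(x)^2}{\sum_{x' \in X} D(x')^2}
$$

Points that are already centers have $D(x) = 0$ and so are never picked again, while points far from every existing center are the most likely to be chosen.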
Algorithm
The exact algorithm is as follows:
1. Choose one center uniformly at random from among the data points.
2. For each data point $x$, compute $D(x)$, the distance between $x$ and the nearest center that has already been chosen.
3. Choose one new data point at random as a new center, using a weighted probability distribution where a point $x$ is chosen with probability proportional to $D(x)^2$.
4. Repeat steps 2 and 3 until $k$ centers have been chosen.
5. Now that the initial centers have been chosen, proceed using standard k-means clustering.
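The seeding steps above can be sketched in plain Python. This is a minimal illustration rather than a reference implementation: the names `squared_distance` and `kmeans_pp_init` are made up for this example, Euclidean distance is assumed, and the uniform fallback used when every weight is zero is an added convenience that the description above does not cover. The sketch stops once the centers are chosen and leaves the k-means iterations themselves to whatever implementation is used afterwards.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_pp_init(points, k, rng=None):
    """Pick k initial centers from `points` (hypothetical helper).

    `points` is a list of coordinate tuples; `rng` is an optional
    random.Random instance so that runs can be reproduced.
    """
    rng = rng or random.Random()
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")

    # Choose the first center uniformly at random.
    centers = [rng.choice(points)]

    # D(x)^2 for every point, relative to the only center so far.
    d2 = [squared_distance(p, centers[0]) for p in points]

    while len(centers) < k:
        if sum(d2) == 0:
            # Assumption: if every point coincides with a center
            # (duplicate data), fall back to a uniform choice.
            new_center = rng.choice(points)
        else:
            # Sample one point with probability proportional to D(x)^2.
            new_center = rng.choices(points, weights=d2, k=1)[0]
        centers.append(new_center)

        # Only the new center can lower a point's D(x)^2, so take the
        # minimum of the old value and the distance to the new center.
        d2 = [min(old, squared_distance(p, new_center))
              for old, p in zip(d2, points)]

    return centers


# Example call on a tiny 2-D data set with a fixed seed.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
centers = kmeans_pp_init(points, k=3, rng=random.Random(0))
```

Keeping a running list of $D(x)^2$ values and updating it with a `min` avoids recomputing the distance to every existing center on each round, which is one common way to organise the loop.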