A method for initialising the K-means clustering algorithm using kd-trees

Article ID	Journal	Published Year	Pages	File Type
536814	Pattern Recognition Letters	2007	9 Pages	PDF

Abstract

We present a method for initialising the K-means clustering algorithm. Our method uses a kd-tree to estimate the density of the data at various locations. We then use a modification of Katsavounidis’ algorithm, which incorporates this density information, to choose K seeds for the K-means algorithm. We test our algorithm on 36 synthetic datasets and 2 datasets from the UCI Machine Learning Repository, and compare it with 15 runs of Forgy’s random initialisation method, Katsavounidis’ algorithm, and Bradley and Fayyad’s method.
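
The abstract does not give the details of the density estimate or of the modified seeding rule, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes that the kd-tree is built by median splits along the widest dimension, that the density of a leaf bucket is its point count divided by the volume of its bounding box, and that each new seed is the leaf centroid maximising density times the distance to the nearest seed already chosen (a density-weighted variant of maximin seeding). The names `build_leaves`, `leaf_summary`, `choose_seeds`, and the parameter `leaf_size` are hypothetical.

```python
import math


def build_leaves(points, leaf_size):
    """Recursively split at the median of the widest dimension.

    Returns the leaf buckets of the (implicit) kd-tree as lists of points.
    """
    if len(points) <= leaf_size:
        return [points]
    dims = len(points[0])
    widths = [max(p[d] for p in points) - min(p[d] for p in points)
              for d in range(dims)]
    axis = widths.index(max(widths))
    if widths[axis] == 0.0:
        # All points coincide, so no further split is possible.
        return [points]
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return (build_leaves(ordered[:mid], leaf_size)
            + build_leaves(ordered[mid:], leaf_size))


def leaf_summary(bucket, min_side=1e-9):
    """Return (centroid, density) for one leaf bucket.

    Assumption: density = point count / bounding-box volume. Sides are
    floored at min_side so that a degenerate box does not divide by zero.
    """
    dims = len(bucket[0])
    centroid = tuple(sum(p[d] for p in bucket) / len(bucket)
                     for d in range(dims))
    volume = 1.0
    for d in range(dims):
        side = max(p[d] for p in bucket) - min(p[d] for p in bucket)
        volume *= max(side, min_side)
    return centroid, len(bucket) / volume


def choose_seeds(points, k, leaf_size=10):
    """Pick k seeds from leaf centroids using density-weighted maximin.

    Assumption: the first seed is the densest leaf; each later seed
    maximises density * (distance to the nearest seed chosen so far).
    """
    summaries = [leaf_summary(b) for b in build_leaves(points, leaf_size)]
    if k > len(summaries):
        raise ValueError("k exceeds the number of leaf buckets")
    first = max(range(len(summaries)), key=lambda i: summaries[i][1])
    chosen = {first}
    seeds = [summaries[first][0]]
    while len(seeds) < k:
        candidates = [i for i in range(len(summaries)) if i not in chosen]
        best = max(
            candidates,
            key=lambda i: summaries[i][1]
            * min(math.dist(summaries[i][0], s) for s in seeds),
        )
        chosen.add(best)
        seeds.append(summaries[best][0])
    return seeds
```

The returned seeds would then be passed to any standard K-means implementation as its initial centres. Because the sketch contains no random step, repeated calls on the same data give the same seeds.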

Keywords

kd-tree; k-means algorithm; Clustering