Fast density clustering strategies based on the k-means algorithm

Article ID	Journal	Published Year	Pages	File Type
4969542	Pattern Recognition	2017	12 Pages	PDF

Abstract

Clustering by fast search and find of density peaks (CFSFDP) is a state-of-the-art density-based clustering algorithm that can effectively find clusters with arbitrary shapes. However, it requires calculating the distances between all pairs of points in a data set to determine the density and separation of each point. Consequently, its computational cost is extremely high for large-scale data sets. In this study, we investigate the application of the k-means algorithm, which is a fast clustering technique, to enhance the scalability of the CFSFDP algorithm while preserving its clustering results as far as possible. Toward this end, we propose two strategies. First, based on concept approximation, an acceleration algorithm (CFSFDP+A) involving fewer distance calculations is proposed to obtain the same clustering results as those of the original algorithm. Second, to further improve the scalability of the original algorithm, an approximate algorithm (CFSFDP+DE) based on exemplar clustering is proposed to rapidly obtain approximate clustering results of the original algorithm. Finally, experiments on several synthetic and real data sets are conducted to illustrate the effectiveness and scalability of the proposed algorithms.
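
To make the cost of the all-pairs step concrete, the following is a minimal sketch of the density and separation computation in the original CFSFDP formulation. It is not the paper's CFSFDP+A or CFSFDP+DE. The function name `density_and_separation`, the cutoff parameter `d_c`, and the simple cutoff-count density are assumptions for illustration, since the abstract does not specify them.

```python
import numpy as np

def density_and_separation(points, d_c):
    """Return (rho, delta) per point from the full pairwise distance matrix.

    Illustrative sketch only: `d_c` (cutoff distance) and the cutoff-count
    density are assumptions, not details taken from the abstract.
    """
    points = np.asarray(points, dtype=float)
    n = len(points)

    # Full n x n Euclidean distance matrix: this is the O(n^2) step that
    # makes the original algorithm expensive on large data sets.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Density: number of other points closer than d_c (subtract 1 for self).
    rho = (dist < d_c).sum(axis=1) - 1

    # Separation: distance to the nearest point of strictly higher density.
    # A point with no denser neighbor gets its largest distance instead.
    delta = np.empty(n)
    for i in range(n):
        higher = rho > rho[i]
        if higher.any():
            delta[i] = dist[i, higher].min()
        else:
            delta[i] = dist[i].max()
    return rho, delta

rho, delta = density_and_separation([[0.0], [1.0], [2.0], [10.0]], d_c=1.5)
```

Points with both high density and high separation are the candidate cluster centers. Because the distance matrix grows quadratically with the number of points, both time and memory become prohibitive at scale, which is the bottleneck the two proposed strategies target.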

Keywords

Cluster analysis; Approximate algorithm; Density-based clustering; k-Means