Article ID: 4944453
Journal: Information Sciences
Published Year: 2017
Pages: 46
File Type: PDF
Abstract
A centroid-based clustering algorithm is proposed that works in a fully unsupervised fashion and is significantly faster and more accurate than existing algorithms. The algorithm, named CLUBS+ (CLustering Using Binary Splitting), achieves these results by combining features of hierarchical and partition-based algorithms. CLUBS+ consists of two major phases, a divisive phase and an agglomerative phase, each followed by a refinement phase. Each major phase consists of successive steps in which the samples are repartitioned using a criterion based on least quadratic distance. This criterion has unique analytical properties, which the paper elucidates and the algorithm exploits to achieve very fast computation. The paper presents the results of extensive experiments, which confirm that the new algorithm is fast, impervious to noise, and produces higher-quality results than other algorithms such as BOOL, BIRCH, and k-means++, even when the analyst can determine the correct number of clusters (a very difficult task from which CLUBS+ spares its users).
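The abstract does not spell out the quadratic-distance criterion, so the following is only a minimal sketch of one well-known property of least-squares (sum of squared errors, SSE) partitioning that makes split and merge decisions cheap: for disjoint groups of sizes n1 and n2 with means mu1 and mu2, SSE(union) - SSE(group 1) - SSE(group 2) = n1*n2/(n1+n2) * ||mu1 - mu2||^2. A binary split gains exactly this amount and a merge costs exactly this amount, so both can be evaluated from counts and means alone. The helper names `ward_delta` and `best_axis_split`, the restriction to axis-aligned cuts, and the exhaustive scan over cut points are assumptions for illustration, not the procedure published for CLUBS+.

```python
import numpy as np


def ward_delta(n1, mu1, n2, mu2):
    """SSE change between the union of two disjoint groups and the groups themselves.

    SSE(A union B) - SSE(A) - SSE(B) = n1*n2/(n1+n2) * ||mu1 - mu2||^2.
    Splitting the union into A and B reduces SSE by this amount;
    merging A and B increases SSE by this amount.
    """
    diff = np.asarray(mu1, dtype=float) - np.asarray(mu2, dtype=float)
    return n1 * n2 / (n1 + n2) * float(diff @ diff)


def best_axis_split(points):
    """Hypothetical helper: find the axis-aligned binary split that most reduces SSE.

    Returns (dim, threshold, gain); dim and threshold are None if no cut exists.
    Assumes axis-aligned cuts only, which is an illustrative simplification.
    """
    X = np.asarray(points, dtype=float)
    n, d = X.shape
    best = (None, None, 0.0)
    for dim in range(d):
        order = np.argsort(X[:, dim], kind="stable")
        Xs = X[order]
        prefix = np.cumsum(Xs, axis=0)  # prefix[k-1] = sum of the first k sorted points
        total = prefix[-1]
        for k in range(1, n):  # left part = first k points, right part = the rest
            if Xs[k - 1, dim] == Xs[k, dim]:
                continue  # no threshold separates equal coordinates
            mu_left = prefix[k - 1] / k
            mu_right = (total - prefix[k - 1]) / (n - k)
            gain = ward_delta(k, mu_left, n - k, mu_right)
            if gain > best[2]:
                threshold = (Xs[k - 1, dim] + Xs[k, dim]) / 2
                best = (dim, threshold, gain)
    return best


if __name__ == "__main__":
    pts = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
    print(best_axis_split(pts))
```

For the two well-separated groups in the usage example, the best cut falls on the first coordinate at 5.5 with an SSE reduction of about 300. Prefix sums let every candidate cut along one axis be scored in constant time after sorting, which illustrates (under the stated assumptions) why a criterion with this closed form can be evaluated quickly.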