Article ID: 397340
Journal: International Journal of Approximate Reasoning
Published Year: 2014
Pages: 15
File Type: PDF
Abstract

• We study the problem of automatically determining the number of clusters.
• We extend the decision-theoretic rough set model to clustering.
• A clustering validity evaluation function based on risk is proposed.
• Changes in this function are proved to identify the correct number of clusters.
• The fast clustering algorithm requires no parameters defined in advance.

Clustering provides a common means of identifying structure in complex data, and there is renewed interest in clustering as a tool for analyzing large data sets in many fields. Determining the number of clusters in a data set is one of the most challenging problems in cluster analysis. To address this problem, this paper proposes an efficient automatic method that extends the decision-theoretic rough set (DTRS) model to clustering. A new clustering validity evaluation function is designed based on the risk calculated from loss functions and probabilities. A hierarchical clustering algorithm, ACA-DTRS, is then proposed and proved to stop automatically at the appropriate number of clusters without manual intervention. Furthermore, a fast algorithm, FACA-DTRS, is devised from a conclusion obtained while validating ACA-DTRS. The performance of both algorithms is evaluated on synthetic and real-world data sets. The algorithm analysis and comparative experiments show that the new method, which requires no manually specified parameters, determines the number of clusters more effectively and at lower time cost.
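The abstract does not give the exact form of the risk-based validity function. As background only, the sketch below shows the standard DTRS building block it rests on: given a conditional probability p = Pr(X | [x]) and a loss function over three actions (accept into the positive region, defer to the boundary region, reject into the negative region), each object takes the action with minimum expected risk, which is equivalent to comparing p against thresholds α and β. The `Losses` class, function names, and loss values are hypothetical illustrations, not the paper's ACA-DTRS or FACA-DTRS code.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Losses:
    """Hypothetical DTRS loss values lambda_(action, state).

    State P: the object belongs to X. State N: it does not.
    Actions: accept (POS), defer (BND), reject (NEG).
    """
    pp: float   # accept when x is in X
    bp: float   # defer when x is in X
    np_: float  # reject when x is in X
    pn: float   # accept when x is not in X
    bn: float   # defer when x is not in X
    nn: float   # reject when x is not in X


def expected_risks(p: float, lam: Losses) -> dict[str, float]:
    """Conditional risk of each action, given p = Pr(X | [x])."""
    q = 1.0 - p
    return {
        "POS": lam.pp * p + lam.pn * q,
        "BND": lam.bp * p + lam.bn * q,
        "NEG": lam.np_ * p + lam.nn * q,
    }


def decide(p: float, lam: Losses) -> tuple[str, float]:
    """Bayesian decision: pick the action with minimum expected risk."""
    risks = expected_risks(p, lam)
    action = min(risks, key=risks.get)
    return action, risks[action]


def thresholds(lam: Losses) -> tuple[float, float]:
    """Standard DTRS thresholds (alpha, beta) derived from the losses.

    Accept when p >= alpha, reject when p <= beta, otherwise defer.
    """
    alpha = (lam.pn - lam.bn) / ((lam.pn - lam.bn) + (lam.bp - lam.pp))
    beta = (lam.bn - lam.nn) / ((lam.bn - lam.nn) + (lam.np_ - lam.bp))
    return alpha, beta


if __name__ == "__main__":
    lam = Losses(pp=0, bp=2, np_=6, pn=8, bn=3, nn=0)
    print(thresholds(lam))
    for p in (0.9, 0.6, 0.2):
        print(p, decide(p, lam))
```

With these illustrative losses, α = 5/7 and β = 3/7, so p = 0.9 is accepted, p = 0.6 falls in the boundary region, and p = 0.2 is rejected. A clustering method built on this model would aggregate such risks over a candidate partition; how the paper does so is not stated in the abstract.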
