An automatic method to determine the number of clusters using decision-theoretic rough set

International Journal of Approximate Reasoning, 2014, 15 pages

Abstract

• We study the problem of automatically determining the number of clusters.
• We extend the decision-theoretic rough set model to clustering.
• A clustering validity evaluation function based on risk is proposed.
• Changes in this function are proved to identify the correct number of clusters.
• The fast clustering algorithm requires no parameters to be specified in advance.

Clustering provides a common means of identifying structure in complex data, and there is renewed interest in clustering as a tool for analyzing large data sets in many fields. Determining the number of clusters in a data set is one of the most challenging problems in cluster analysis. To address this problem, this paper proposes an efficient automatic method that extends the decision-theoretic rough set model to clustering. A new clustering validity evaluation function is designed based on the risk calculated from loss functions and probabilities. A hierarchical clustering algorithm, ACA-DTRS, is then proposed and proved to stop automatically at the correct number of clusters without manual intervention. Furthermore, a fast algorithm, FACA-DTRS, is devised based on a conclusion drawn from the validation of ACA-DTRS. The performance of both algorithms is evaluated on synthetic and real-world data sets. The algorithmic analysis and comparative experiments show that the new method, which requires no manually specified parameters, determines the number of clusters more effectively and at lower time cost.
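The abstract does not spell out the clustering-specific validity function, so the sketch below only illustrates the standard decision-theoretic rough set (DTRS) model that the method builds on. It shows how a risk is calculated from loss functions and probabilities. For an object whose conditional probability of belonging to a set X is p, the expected loss of accepting it (POS), deferring it (BND), or rejecting it (NEG) is a loss-weighted combination of p and 1 - p. The Bayesian rule then picks the action with minimum risk, which is equivalent to comparing p against two thresholds, alpha and beta, derived from the losses. The class `Losses`, the function names, and the example loss values are hypothetical illustrations, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Losses:
    """Hypothetical loss table for the standard two-state DTRS model.

    First letter: action (P = accept into POS, B = defer to BND, N = reject to NEG).
    Second letter: true state (P = object belongs to X, N = it does not).
    """
    pp: float
    bp: float
    np: float
    pn: float
    bn: float
    nn: float


def expected_losses(lam: Losses, p: float) -> dict[str, float]:
    """Conditional risk of each action for an object with Pr(X | [x]) = p."""
    q = 1.0 - p  # Pr(not X | [x])
    # Dict order (POS, NEG, BND) makes min() break exact ties toward POS at
    # p == alpha and toward NEG at p == beta, matching the usual decision rules.
    return {
        "POS": lam.pp * p + lam.pn * q,
        "NEG": lam.np * p + lam.nn * q,
        "BND": lam.bp * p + lam.bn * q,
    }


def thresholds(lam: Losses) -> tuple[float, float]:
    """Return (alpha, beta) implied by the loss table, checking DTRS conditions."""
    if not (lam.pp <= lam.bp < lam.np and lam.nn <= lam.bn < lam.pn):
        raise ValueError("losses must satisfy pp <= bp < np and nn <= bn < pn")
    if (lam.pn - lam.bn) * (lam.np - lam.bp) <= (lam.bp - lam.pp) * (lam.bn - lam.nn):
        raise ValueError("losses do not yield beta < alpha")
    alpha = (lam.pn - lam.bn) / ((lam.pn - lam.bn) + (lam.bp - lam.pp))
    beta = (lam.bn - lam.nn) / ((lam.bn - lam.nn) + (lam.np - lam.bp))
    return alpha, beta


def decide(lam: Losses, p: float) -> tuple[str, float]:
    """Bayesian minimum-risk action for probability p, together with its risk."""
    risks = expected_losses(lam, p)
    action = min(risks, key=risks.get)
    return action, risks[action]


if __name__ == "__main__":
    # Hypothetical example losses; a real application would derive them elsewhere.
    lam = Losses(pp=0, bp=2, np=6, pn=8, bn=3, nn=0)
    alpha, beta = thresholds(lam)
    print(f"alpha={alpha:.3f}, beta={beta:.3f}")
    for p in (0.9, 0.6, 0.2):
        print(p, decide(lam, p))
```

With these example losses, alpha = 5/7 and beta = 3/7, so an object with p = 0.6 falls between the thresholds and is deferred to the boundary region, while p = 0.9 is accepted and p = 0.2 is rejected. Minimizing risk directly and comparing p against alpha and beta give the same decisions. The thresholds form is what lets DTRS replace hand-picked cut-offs with values derived from the losses.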