Article ID: 530219
Journal: Pattern Recognition
Published Year: 2015
Pages: 12 Pages
File Type: PDF
Abstract

• A parameterized model for the clustering error is introduced.
• The model parameter is a measure of the data dimension and homogeneity.
• A new cost criterion is derived from the properties of the model.
• The method demonstrates good results for numerical data sets.

In this paper, we consider the problem of unsupervised clustering (vector quantization) of multidimensional numerical data. We propose a new method for determining the optimal number of clusters in a data set. The method is based on parametric modeling of the quantization error, and the model parameter can be interpreted as the effective dimensionality of the data set. The method was tested on artificial and real numerical data sets. The experiments demonstrate empirically not only its effectiveness but also its ability to cope with difficult cases where other known methods fail.
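The abstract does not give the parametric model itself, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes the classical high-resolution scaling law for vector quantization, in which the mean squared quantization error falls off roughly as k^(-2/d) for k clusters in d effective dimensions, and estimates d from the log-log slope of the k-means error curve. The function names (`kmeans_distortion`, `effective_dimension`), the choice of cluster counts, and the plain Lloyd iteration are all illustrative assumptions.

```python
import math
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_distortion(points, k, rng, iters=20, restarts=2):
    """Mean squared quantization error, best of a few Lloyd (k-means) runs."""
    dim = len(points[0])
    best = float("inf")
    for _ in range(restarts):
        centers = rng.sample(points, k)  # initialize from random data points
        for _ in range(iters):
            sums = [[0.0] * dim for _ in range(k)]
            counts = [0] * k
            for p in points:
                j = min(range(k), key=lambda c: sq_dist(p, centers[c]))
                counts[j] += 1
                for t in range(dim):
                    sums[j][t] += p[t]
            # Move each center to its cluster mean; keep it if the cluster is empty.
            centers = [
                tuple(s / counts[j] for s in sums[j]) if counts[j] else centers[j]
                for j in range(k)
            ]
        err = sum(min(sq_dist(p, c) for c in centers) for p in points) / len(points)
        best = min(best, err)
    return best


def effective_dimension(points, ks=(2, 4, 8, 16), seed=0):
    """Estimate d from a least-squares fit of log(error) against log(k).

    Assumption (not taken from the paper): error(k) ~ C * k**(-2/d),
    so the fitted slope is approximately -2/d.
    """
    rng = random.Random(seed)
    xs = [math.log(k) for k in ks]
    ys = [math.log(kmeans_distortion(points, k, rng)) for k in ks]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return -2.0 / slope


# Synthetic demo data: a filled square (2-D) and a line segment embedded in 2-D (1-D).
_rng = random.Random(1)
square = [(_rng.random(), _rng.random()) for _ in range(200)]
line = [(t, 2.0 * t) for t in (_rng.random() for _ in range(200))]

d_square = effective_dimension(square)
d_line = effective_dimension(line)
print(f"square: d ~ {d_square:.2f}, line: d ~ {d_line:.2f}")
```

Under these assumptions, points spread over a square should yield a noticeably larger estimate than points lying on a line, even though both sets are stored as two-coordinate vectors. A criterion for choosing the number of clusters would be built on top of such a model; how the paper does that is not stated in the abstract.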

Related Topics
Physical Sciences and Engineering > Computer Science > Computer Vision and Pattern Recognition