Article code | Journal code | Publication year | English article | Full-text version |
---|---|---|---|---|
530219 | 869750 | 2015 | 12-page PDF | Free download |
• A parameterized model for the clustering error is introduced.
• The model parameter is a measure of the data dimension and homogeneity.
• A new cost criterion is derived from the properties of the model.
• The method demonstrates good results for numerical data sets.
In this paper, we consider the problem of unsupervised clustering (vector quantization) of multidimensional numerical data. We propose a new method for determining the optimal number of clusters in a data set. The method is based on parametric modeling of the quantization error, and the model parameter can be treated as the effective dimensionality of the data set. The proposed method was tested on artificial and real numerical data sets. The experimental results demonstrate empirically not only the effectiveness of the method but also its ability to cope with difficult cases where other known methods fail.
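As a rough illustration of how a parameter of a quantization-error model can behave like an effective dimensionality, the sketch below fits a power law to the k-means distortion curve. It is not the model or the cost criterion from the paper, which the abstract does not spell out. It assumes the classical high-resolution quantization scaling, in which the mean squared error of a k-point quantizer in d dimensions decays roughly as k^(-2/d). The function names `kmeans_distortion` and `fit_effective_dimension` are hypothetical, chosen only for this example.

```python
import math
import random


def kmeans_distortion(points, k, iters=50, seed=0):
    """Mean squared quantization error after a basic Lloyd's k-means run.

    Assumption: plain Lloyd iterations with random initial centers drawn
    from the data; this is a stand-in, not the paper's clustering procedure.
    """
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    dim = len(points[0])
    for _ in range(iters):
        sums = [[0.0] * dim for _ in range(k)]
        counts = [0] * k
        for p in points:
            # Assign the point to its nearest center (squared Euclidean distance).
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            counts[j] += 1
            for i in range(dim):
                sums[j][i] += p[i]
        # Move each center to the mean of its points; empty clusters stay put.
        centers = [[s / counts[j] for s in sums[j]] if counts[j] else centers[j]
                   for j in range(k)]
    return sum(min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
               for p in points) / len(points)


def fit_effective_dimension(ks, errors):
    """Least-squares fit of log E = a - (2/d) * log k; returns the estimate of d.

    Assumption: the error follows the power law E(k) ~ C * k**(-2/d).
    """
    xs = [math.log(k) for k in ks]
    ys = [math.log(e) for e in errors]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -2.0 / slope


if __name__ == "__main__":
    # Synthetic 2-D uniform data; the estimate should be in the vicinity of 2,
    # though finite samples and local minima of k-means will perturb it.
    rng = random.Random(1)
    data = [(rng.random(), rng.random()) for _ in range(500)]
    ks = [2, 4, 8, 16]
    errors = [kmeans_distortion(data, k) for k in ks]
    print("estimated effective dimension:", fit_effective_dimension(ks, errors))
```

Under this assumed power law, a break in the log-log distortion curve, where the error falls faster than the fitted trend predicts, is one plausible sign of genuine cluster structure. Turning that observation into a criterion for the number of clusters is what the paper's own cost function addresses.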
Journal: Pattern Recognition - Volume 48, Issue 3, March 2015, Pages 941–952