Article code | Journal code | Publication year | English article | Full-text version |
---|---|---|---|---|
535323 | 870340 | 2006 | 13-page PDF | Free download |
This paper deals with the problem of clustering categorical datasets. Categorical data typically suffer from limited measurement levels and are sparse in a space of very high dimension. Conventional dissimilarity measures are, therefore, inadequate. We propose a new clustering algorithm based on projected clustering. The proposed algorithm, although hierarchical in essence, avoids the characteristic error propagation through reassignment and deletion of bad clusters. We also propose new indices for cluster validation in categorical datasets, an area that is almost unexplored. We present techniques for finding the optimal number of clusters and for initializing cluster centers. Experimental results demonstrate the effectiveness of the proposed clustering algorithm. The cluster validation for categorical datasets is also shown to be quite efficient.
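As a rough illustration of why conventional dissimilarity measures can struggle on categorical data, the sketch below computes the simple matching (Hamming-style) dissimilarity between a few toy records. This is a generic textbook measure, not the algorithm or the validation indices proposed in the paper; the function name `simple_matching_dissimilarity` and the toy records are assumptions made only for this example.

```python
from itertools import combinations


def simple_matching_dissimilarity(a, b):
    """Fraction of attributes on which two categorical records differ."""
    if len(a) != len(b):
        raise ValueError("records must have the same number of attributes")
    return sum(x != y for x, y in zip(a, b)) / len(a)


# Hypothetical toy records with four categorical attributes each.
records = [
    ("red", "small", "yes", "a"),
    ("red", "small", "no", "b"),
    ("blue", "large", "no", "b"),
    ("blue", "large", "yes", "a"),
]

# Pairwise dissimilarities over all record pairs.
distances = {
    (i, j): simple_matching_dissimilarity(records[i], records[j])
    for i, j in combinations(range(len(records)), 2)
}

# With few attributes and few levels, distances fall on a coarse grid,
# so many pairs tie and a plain distance-based merge order is ambiguous.
distinct_values = sorted(set(distances.values()))
print(distances)
print(distinct_values)
```

In this toy case the six pairwise distances take only two distinct values, which hints at how little a plain attribute-mismatch count can discriminate between record pairs.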
Journal: Pattern Recognition Letters - Volume 27, Issue 12, September 2006, Pages 1405–1417