Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
379409 | Data & Knowledge Engineering | 2007 | 25 Pages |
Abstract
Traditional k-mean type algorithms are limited to numeric data. This paper presents a clustering algorithm based on the k-mean paradigm that works well for data with mixed numeric and categorical features. We propose a new cost function and distance measure based on co-occurrence of values. The measures also take into account the significance of an attribute towards the clustering process. We present a modified description of the cluster center to overcome the numeric-data-only limitation of the k-mean algorithm and provide a better characterization of clusters. The performance of this algorithm has been studied on real-world data sets. Comparisons with other clustering algorithms illustrate the effectiveness of this approach.
Related Topics
Physical Sciences and Engineering
Computer Science
Artificial Intelligence
Authors
Amir Ahmad, Lipika Dey