Article code | Journal code | Publication year | English article | Full-text version |
---|---|---|---|---|
489838 | 704634 | 2015 | 6-page PDF | Free download |

Clustering is the process of organizing a dataset into isolated groups such that data points in the same group are more similar to one another and data points in different groups are more dissimilar. The k-modes algorithm, well known for its simplicity, is a popular partitioning algorithm for clustering categorical data. In this paper, we discuss the limitations of the distance function used in this algorithm with an illustrative example, and we then propose a similarity coefficient based on information entropy. We analyze the time complexity of the k-modes algorithm with the proposed similarity coefficient. The main advantage of this coefficient is that it improves clustering accuracy while retaining the scalability of the k-modes algorithm. We perform scalability tests on synthetic datasets.
Journal: Procedia Computer Science - Volume 50, 2015, Pages 93-98
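
For readers unfamiliar with k-modes, the following is a minimal sketch of the conventional building blocks the abstract refers to: the simple matching dissimilarity (a count of mismatched attributes), the per-attribute mode of a cluster, and the nearest-mode assignment step. It is not the entropy-based similarity coefficient proposed in the paper, whose definition is not given in this abstract. The function names and the toy records are hypothetical and chosen only for illustration.

```python
from collections import Counter


def matching_distance(x, y):
    """Simple matching dissimilarity: number of attributes on which x and y differ."""
    if len(x) != len(y):
        raise ValueError("records must have the same number of attributes")
    return sum(a != b for a, b in zip(x, y))


def cluster_mode(records):
    """Mode of a cluster: the most frequent category of each attribute."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*records))


def assign(records, modes):
    """Assign each record to the index of its nearest mode (first mode wins ties)."""
    return [
        min(range(len(modes)), key=lambda j: matching_distance(record, modes[j]))
        for record in records
    ]


# Hypothetical toy data: (colour, size, shape)
records = [
    ("red", "small", "round"),
    ("red", "small", "square"),
    ("blue", "large", "round"),
    ("blue", "large", "square"),
]
modes = [("red", "small", "round"), ("blue", "large", "square")]

print(assign(records, modes))
print(cluster_mode(records[:2] + [("red", "large", "round")]))
```

Because every mismatch counts as exactly 1 regardless of how informative the attribute is, a record can be equally distant from several modes. This is the kind of situation in which a weighted or entropy-based coefficient could behave differently from the matching distance.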