Article code: 383595
Journal code: 660827
Publication year: 2013
English article: 13-page PDF
Full-text version: free download
English title of the ISI article
Cluster center initialization algorithm for K-modes clustering
Related topics
Engineering and Basic Sciences > Computer Engineering > Artificial Intelligence
English abstract


• We address the problem of random initialization of the cluster centers in the K-modes algorithm.
• We propose an algorithm that performs multiple clusterings of the data based on attribute values.
• The proposed algorithm yields fixed initial centers and has log-linear time complexity.
• The algorithm is compared with three initialization methods and is shown to perform better.

Partitional clustering of categorical data is normally performed using the K-modes clustering algorithm, which works well for large datasets. Although the design and implementation of the K-modes algorithm are simple and efficient, it has the pitfall of randomly choosing the initial cluster centers on every new execution, which may lead to non-repeatable clustering results. This paper addresses the random center initialization problem of the K-modes algorithm by proposing a cluster center initialization algorithm. The proposed algorithm performs multiple clusterings of the data based on the attribute values in different attributes and yields deterministic modes that are used as initial cluster centers. We propose a new method for selecting the most relevant attributes, namely Prominent attributes, compare it with an existing method for finding Significant attributes for unsupervised learning, and perform multiple clusterings of the data to find initial cluster centers. The proposed algorithm ensures fixed initial cluster centers and thus repeatable clustering results. The worst-case time complexity of the proposed algorithm is log-linear in the number of data objects. We evaluate the proposed algorithm on several categorical datasets, compare it against random initialization and two other initialization methods, and show that the proposed method performs better in terms of accuracy and time complexity. The initial cluster centers computed by the proposed approach are close to the actual cluster centers of the datasets we tested, which leads to faster convergence of the K-modes clustering algorithm in conjunction with better clustering results.
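To illustrate why the choice of initial centers matters, here is a minimal Python sketch of a basic K-modes loop that takes its initial modes as an argument. It is not the initialization algorithm proposed in the paper. The function names (`k_modes`, `column_modes`, `random_init`), the simple matching dissimilarity, the tie-breaking rule, and the toy data are all assumptions made for illustration.

```python
from collections import Counter
import random


def hamming(a, b):
    """Simple matching dissimilarity: number of attributes that differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(rows):
    """Most frequent value per attribute; ties go to the smallest value
    (an assumed rule, chosen only to keep the result deterministic)."""
    modes = []
    for col in zip(*rows):
        counts = Counter(col)
        best = max(counts.values())
        modes.append(min(v for v, c in counts.items() if c == best))
    return tuple(modes)


def k_modes(data, init_modes, max_iter=100):
    """Basic K-modes: assign each object to its nearest mode, then
    recompute the modes, until the assignment stops changing."""
    modes = [tuple(m) for m in init_modes]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(modes)), key=lambda j: hamming(row, modes[j]))
            for row in data
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(len(modes)):
            members = [row for row, lab in zip(data, labels) if lab == j]
            if members:  # keep the old mode if a cluster becomes empty
                modes[j] = column_modes(members)
    return labels, modes


def random_init(data, k, seed):
    """Random initialization: the starting modes depend on the seed."""
    return random.Random(seed).sample(data, k)


# Hypothetical toy dataset with three categorical attributes.
data = [
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("red", "medium", "round"),
    ("blue", "large", "square"),
    ("blue", "large", "oval"),
    ("blue", "medium", "square"),
]

# Fixed initial centers: every run starts from the same modes,
# so every run returns the same clustering.
fixed_init = [data[0], data[3]]
labels_a, modes_a = k_modes(data, fixed_init)
labels_b, modes_b = k_modes(data, fixed_init)
```

With `fixed_init`, repeated calls return identical labels and modes. With `random_init(data, 2, seed)`, the starting modes change with the seed, so the final clustering may differ from run to run; that is the non-repeatability a deterministic initialization is meant to avoid.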

Publisher
Database: Elsevier - ScienceDirect
Journal: Expert Systems with Applications - Volume 40, Issue 18, 15 December 2013, Pages 7444–7456