Article code: 378713, Journal code: 659209, Publication year: 2014, English article: 13 pages (PDF)
English title of the ISI article
WB-index: A sum-of-squares based index for cluster validity
Related topics
Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence
Abstract


• Revisiting the sum-of-squares based WB-index
• Analysis of three sum-of-squares based indices
• A systematic comparison of 12 internal indices
• Applying three sum-of-squares based indices to automatic keyword categorization

Determining the number of clusters is an important part of cluster validity and has been widely studied in cluster analysis. Sum-of-squares based indices show promising properties for determining the number of clusters. However, knee point detection is often required because most indices change monotonically as the number of clusters increases. Indices with a clear minimum or maximum value are therefore preferred. The aim of this paper is to revisit a sum-of-squares based index, the WB-index, which attains its minimum at the determined number of clusters. We shed light on the relation between the WB-index and two popular indices, the Calinski–Harabasz index and the Xu-index. According to a theoretical comparison, the Calinski–Harabasz index is shown to be affected by the data size and the level of data overlap. The Xu-index is theoretically close to the WB-index; however, it does not work well when the dimension of the data is greater than two. Here, we conduct a more thorough comparison of 12 internal indices and provide a summary of their experimental performance. Furthermore, we introduce the sum-of-squares based indices into automatic keyword categorization, where the indices are specially defined for determining the number of clusters.
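
The abstract does not spell out the formulas, so the following is only a minimal sketch under stated assumptions. It uses the commonly cited definitions WB = M · SSW / SSB and CH = (SSB / (M − 1)) / (SSW / (N − M)), where M is the number of clusters, N the number of points, SSW the within-cluster sum of squares, and SSB the between-cluster sum of squares. The function names (sum_of_squares, wb_index, calinski_harabasz) are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def _mean(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def _sqdist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sum_of_squares(points, labels):
    """Return (SSW, SSB, M) for a labelled partition of the points."""
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)

    overall = _mean(points)
    ssw = 0.0  # within-cluster sum of squares
    ssb = 0.0  # between-cluster sum of squares
    for members in clusters.values():
        centroid = _mean(members)
        ssw += sum(_sqdist(p, centroid) for p in members)
        ssb += len(members) * _sqdist(centroid, overall)
    return ssw, ssb, len(clusters)


def wb_index(points, labels):
    """WB = M * SSW / SSB (assumed definition); lower is better.

    Undefined when SSB is zero, e.g. for a single cluster.
    """
    ssw, ssb, m = sum_of_squares(points, labels)
    return m * ssw / ssb


def calinski_harabasz(points, labels):
    """CH = (SSB / (M - 1)) / (SSW / (N - M)); higher is better.

    Undefined for M = 1, M = N, or SSW = 0.
    """
    ssw, ssb, m = sum_of_squares(points, labels)
    n = len(points)
    return (ssb / (m - 1)) / (ssw / (n - m))


if __name__ == "__main__":
    # Two well-separated pairs of points, labelled as two clusters.
    pts = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
    labels = [0, 0, 1, 1]
    print(wb_index(pts, labels), calinski_harabasz(pts, labels))
```

For this toy partition SSW = 4 and SSB = 100, so the sketch gives a WB value of about 0.08 and a CH value of 50. To choose the number of clusters in this spirit, one would cluster the data for a range of M values and keep the M with the smallest WB value; the clustering step itself is outside this sketch.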

Publisher
Database: Elsevier - ScienceDirect
Journal: Data & Knowledge Engineering - Volume 92, July 2014, Pages 77–89