Article code: 390129 | Journal code: 661217 | Publication year: 2011 | English article, full text: 21 pages, PDF
English title of the ISI article
Use of a fuzzy granulation–degranulation criterion for assessing cluster validity
Related subjects
Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence
Abstract

The identification of a suitable clustering algorithm to partition data, and the assessment of the validity of the resulting partitioning, are ongoing quests in unsupervised learning. In this study, a fuzzy granulation–degranulation criterion is proposed to evaluate the goodness of a fuzzy partitioning of the data. This criterion is, in turn, used to determine the clustering algorithm best suited to a particular data set. In general, the quality of a partitioning is measured by computing its within-cluster variance, which is a measure of the compactness of the partitioning. Here, a new error function, which reflects how well the computed cluster centers represent the whole data set, is used instead as the goodness measure of the partitioning. Thus the clustering algorithm that yields the set of cluster centers best approximating the whole data set is considered the most suitable. This new fuzzy granulation–degranulation criterion is then used to develop six new cluster validity indices. These indices mimic the definitions of existing, well-known cluster validity indices, namely the PBM-index, XB-index, PS-index, FS-index, K-index and SV-index, but use the new fuzzy granulation–degranulation based error function in place of cluster compactness. To evaluate how effectively the proposed error function identifies the appropriate clustering algorithm for a particular data set, eight clustering algorithms are evaluated on six artificially generated and six real-life data sets: K-means, Fuzzy C-means, GAK-means (genetic algorithm based K-means), a newly developed genetic point symmetry based clustering technique (GAPS-clustering), the Average Linkage clustering algorithm, the Expectation Maximization (EM) clustering algorithm, the Self-Organizing Map (SOM) and the Spectral clustering technique. Results show that GAK-means is the most appropriate algorithm for most of the data sets used in the experiments.
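The error-function idea can be illustrated with a small Python sketch. The abstract does not give the formulas, so this code assumes the usual fuzzy granulation–degranulation scheme: granulation computes Fuzzy C-means memberships of each point with respect to fixed cluster centers, degranulation rebuilds the point as the u^m-weighted average of the centers, and the error is the summed squared reconstruction error, with m = 2 as an assumed default. The function names and the two candidate center sets are hypothetical stand-ins for the outputs of different clustering algorithms, not results from the paper.

```python
from typing import Dict, List, Sequence

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def granulate(x: Point, centers: List[Point], m: float = 2.0) -> List[float]:
    """Granulation (assumed form): FCM membership of x in each cluster, centers fixed."""
    d = [sq_dist(x, v) for v in centers]
    for i, di in enumerate(d):
        if di == 0.0:  # x coincides with a center: crisp membership
            return [1.0 if j == i else 0.0 for j in range(len(centers))]
    p = 1.0 / (m - 1.0)
    return [1.0 / sum((d[i] / d[j]) ** p for j in range(len(centers)))
            for i in range(len(centers))]


def degranulate(u: List[float], centers: List[Point], m: float = 2.0) -> List[float]:
    """Degranulation (assumed form): rebuild a point as the u^m-weighted mean of the centers."""
    w = [ui ** m for ui in u]
    total = sum(w)
    return [sum(w[i] * centers[i][t] for i in range(len(centers))) / total
            for t in range(len(centers[0]))]


def gd_error(data: List[Point], centers: List[Point], m: float = 2.0) -> float:
    """Sum of squared reconstruction errors over the whole data set."""
    return sum(sq_dist(x, degranulate(granulate(x, centers, m), centers, m))
               for x in data)


def pick_algorithm(data: List[Point], candidates: Dict[str, List[Point]]) -> str:
    """Hypothetical helper: name of the center set with the lowest reconstruction error."""
    return min(candidates, key=lambda name: gd_error(data, candidates[name]))


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    candidates = {
        "candidate_a": [(0.5, 0.5), (10.5, 10.5)],  # one center per blob
        "candidate_b": [(0.0, 0.0), (5.0, 5.0)],    # poorly placed centers
    }
    print(pick_algorithm(data, candidates))  # candidate_a
```

In this sketch each point is rebuilt from the centers alone, so the score favors center sets from which the whole data set can be reconstructed, which is the property the abstract uses to rank algorithms.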
The effectiveness of the proposed cluster validity indices in automatically identifying the appropriate number of clusters is then demonstrated on the same 12 data sets. For comparison, results obtained with the original versions of these cluster validity indices and with a density based clustering technique are also presented.
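Choosing the number of clusters with one of the modified indices can be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's definition: it takes the Xie-Beni (XB) ratio, where lower values indicate a better partitioning, and swaps its compactness numerator for a granulation–degranulation reconstruction error (FCM memberships, u^m-weighted center average, assumed fuzzifier m = 2). Centers come from a plain K-means with farthest-first initialization, chosen only to keep the example deterministic. The names `gd_xb_index` and `choose_k` are hypothetical, and the paper's exact normalization may differ.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def gd_error(data: List[Point], centers: List[Point], m: float = 2.0) -> float:
    """Assumed granulation-degranulation error: FCM memberships (granulation),
    then each point rebuilt as the u^m-weighted mean of the centers."""
    total = 0.0
    for x in data:
        d = [sq_dist(x, v) for v in centers]
        if 0.0 in d:
            continue  # x lies on a center and is reconstructed exactly
        inv = [(1.0 / di) ** (1.0 / (m - 1.0)) for di in d]
        s = sum(inv)
        w = [(v / s) ** m for v in inv]
        x_hat = [sum(w[i] * centers[i][t] for i in range(len(centers))) / sum(w)
                 for t in range(len(x))]
        total += sq_dist(x, x_hat)
    return total


def kmeans(data: List[Point], k: int, iters: int = 100) -> List[List[float]]:
    """Plain K-means; farthest-first initialization keeps the sketch deterministic."""
    centers = [list(data[0])]
    while len(centers) < k:
        far = max(data, key=lambda x: min(sq_dist(x, c) for c in centers))
        centers.append(list(far))
    for _ in range(iters):
        groups: List[List[Point]] = [[] for _ in range(k)]
        for x in data:
            groups[min(range(k), key=lambda i: sq_dist(x, centers[i]))].append(x)
        new = [[sum(col) / len(g) for col in zip(*g)] if g else centers[i]
               for i, g in enumerate(groups)]
        if new == centers:
            break
        centers = new
    return centers


def gd_xb_index(data: List[Point], centers: List[Point], m: float = 2.0) -> float:
    """XB-style ratio with the GD error as numerator; lower is better (assumed form)."""
    k = len(centers)
    sep = min(sq_dist(centers[i], centers[j])
              for i in range(k) for j in range(i + 1, k))
    if sep == 0.0:
        return math.inf  # coincident centers: ratio undefined
    return gd_error(data, centers, m) / (len(data) * sep)


def choose_k(data: List[Point], k_max: int, m: float = 2.0) -> Tuple[int, Dict[int, float]]:
    """Score K = 2..k_max and return the K with the smallest index value."""
    scores = {k: gd_xb_index(data, kmeans(data, k), m) for k in range(2, k_max + 1)}
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    best, scores = choose_k(data, 4)
    print(best)  # 2
```

Splitting a blob drives the minimum center separation down much faster than it lowers the reconstruction error, which is why the ratio in this sketch bottoms out at the two-cluster solution on the toy data.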

Publisher
Database: Elsevier - ScienceDirect
Journal: Fuzzy Sets and Systems - Volume 170, Issue 1, 1 May 2011, Pages 22-42