Article code: 533282
Journal code: 870092
Publication year: 2014
English article: 15 pages, PDF
Full-text version: free download
English title of the ISI article
A size-insensitive integrity-based fuzzy c-means method for data clustering
Related topics
Engineering and Basic Sciences > Computer Engineering > Computer Vision and Pattern Recognition
English abstract


• We propose a new conditional FCM method based on the “integrity” and size ratio of clusters.
• The proposed method significantly alleviates the “cluster-size sensitivity” problem.
• The proposed method has a much greater tolerance for the “distance” between clusters.
• The proposed method allows more flexibility in selecting the initial cluster centers while still clustering successfully.
• Our method achieves much higher clustering accuracy than FCM and csiFCM on datasets containing unbalanced clusters.

Fuzzy c-means (FCM) is one of the most popular techniques for data clustering. Because FCM tends to balance the number of data points in each cluster, the centers of smaller clusters are forced to drift toward larger adjacent clusters. For datasets with unbalanced clusters, the partition results of FCM are therefore usually unsatisfactory. Cluster-size-insensitive FCM (csiFCM) addresses this “cluster-size sensitivity” problem by dynamically adjusting the condition value for the membership of each data point, based on cluster size, after the defuzzification step in each iteration. However, the performance of csiFCM is sensitive to both the initial positions of the cluster centers and the “distance” between adjacent clusters. In this paper, we present a cluster-size-insensitive, integrity-based FCM method, called siibFCM, to remedy these deficiencies of csiFCM. The siibFCM method determines the membership contribution of every data point to each individual cluster by considering the cluster's integrity, which is a combination of compactness and purity. “Compactness” represents the distribution of data points within a cluster, while “purity” represents how far a cluster is from its adjacent cluster. We tested siibFCM extensively and compared it with traditional FCM and csiFCM using artificially generated datasets with different shapes and data distributions, synthetic images, real images, and the Escherichia coli dataset. Experimental results showed that siibFCM is superior to both traditional FCM and csiFCM in its tolerance for the “distance” between adjacent clusters and in the flexibility of selecting initial cluster centers when dealing with datasets with unbalanced clusters.
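
For readers unfamiliar with the baseline, the following is a minimal sketch of standard FCM only, assuming Euclidean distance and a fuzzifier of m = 2. It is not the authors' siibFCM or csiFCM, whose update rules are not given in this abstract. The function names `memberships` and `fcm`, their parameters, and the toy unbalanced dataset (1000 points versus 50 points) are illustrative assumptions, chosen so that the positions of the fitted centers can be inspected for the drift the abstract describes.

```python
import numpy as np

def memberships(X, centers, m=2.0):
    """Standard FCM membership update: u_ik is proportional to d_ik^(-2/(m-1))."""
    # Squared Euclidean distances between every point and every center, shape (n, c).
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    d2 = np.maximum(d2, 1e-12)  # avoid division by zero when a point sits on a center
    inv = d2 ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)  # each row sums to 1

def fcm(X, c, m=2.0, n_iter=100, tol=1e-6, init_centers=None, seed=0):
    """Plain fuzzy c-means (baseline only; hypothetical helper, not siibFCM)."""
    X = np.asarray(X, dtype=float)
    if init_centers is None:
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), size=c, replace=False)]
    else:
        centers = np.asarray(init_centers, dtype=float)
    for _ in range(n_iter):
        Um = memberships(X, centers, m) ** m
        # Center update: mean of the data weighted by u_ik^m.
        new_centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        shift = np.linalg.norm(new_centers - centers)
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships(X, centers, m)

# Toy unbalanced data (an assumption for illustration): a large and a small Gaussian blob.
rng = np.random.default_rng(42)
big = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(1000, 2))
small = rng.normal(loc=[4.0, 0.0], scale=1.0, size=(50, 2))
X = np.vstack([big, small])

centers, U = fcm(X, c=2, init_centers=[[0.0, 0.0], [4.0, 0.0]])
labels = U.argmax(axis=1)  # defuzzification: assign each point to its highest membership
print("centers:\n", centers)
print("cluster sizes after defuzzification:", np.bincount(labels, minlength=2))
```

Comparing the printed centers with the true blob means (0, 0) and (4, 0), and the printed cluster sizes with 1000 and 50, gives a rough sense of how strongly plain FCM is affected by the size imbalance on this particular toy dataset.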

Publisher
Database: Elsevier - ScienceDirect
Journal: Pattern Recognition - Volume 47, Issue 5, May 2014, Pages 2042–2056