Article code: 532417
Journal code: 869947
Publication year: 2012
English article: 10 pages, PDF
Full-text version: free download
English title of the ISI article
CPCQ: Contrast pattern based clustering quality index for categorical data
Related subjects
Engineering and Basic Sciences, Computer Engineering, Computer Vision and Pattern Recognition
Abstract

Clustering validation is concerned with assessing the quality of clustering solutions. Since clustering is unsupervised and highly exploratory, clustering validation has been an important and long-standing research problem. Existing validity measures, including entropy-based and distance-based indices, have significant shortcomings. Indeed, for many datasets from the UCI repository, they fail to recognize that the expert-determined classes are the best clusters, and they frequently prefer clusterings with a larger number of clusters. This weakness reflects their inability to accurately capture intra-cluster coherence and inter-cluster separation. This paper proposes a novel Contrast Pattern based Clustering Quality index (CPCQ) for categorical data, which uses the quality and diversity of the contrast patterns that distinguish the clusters in a given clustering. High-quality contrast patterns can serve to characterize the clusters and to discriminate one cluster from the others. The CPCQ index is based on the rationale that a high-quality clustering should have many diversified high-quality contrast patterns among its clusters. The quality of an individual contrast pattern is defined in terms of its length, its support, and the length of its corresponding closed pattern. The quality measure concerning “many diversified” contrast patterns is defined in terms of the quality and diversity of selected groups of contrast patterns, chosen so that patterns and groups overlap minimally in their items and matching transactions. Experiments show that the CPCQ index (1) does not require a user to provide a distance function; (2) does not give inappropriate preference to a larger number of clusters; and (3) can recognize that expert-determined classes are the best clusters for many datasets from the UCI repository.


► We propose a Contrast Pattern based Clustering Quality index for categorical data.
► A high-quality clustering should have many diversified high-quality contrast patterns.
► The CPCQ index does not give any preference to large or small numbers of clusters.
► The CPCQ index does not require the user to define a distance function.
► The CPCQ index is objective and scalable for clustering validation.
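
To make the notion of a contrast pattern more concrete, the following Python sketch lists item sets that are frequent in one cluster and absent from all others. It is an illustration only and does not compute the CPCQ index: the function name `contrast_patterns`, the parameters `max_len` and `min_support`, the toy data, and the strict "zero support outside the cluster" criterion are all assumptions made for this example, and the closed-pattern length and diversity terms mentioned in the abstract are not modeled.

```python
from itertools import combinations


def contrast_patterns(rows, labels, max_len=2, min_support=0.5):
    """Illustrative helper (not the paper's algorithm).

    Returns {cluster: [(pattern, support), ...]} for item sets that occur
    in at least `min_support` of one cluster and in no row of any other.
    """
    # Encode each categorical row as a set of (attribute_index, value) items.
    transactions = [frozenset(enumerate(row)) for row in rows]
    clusters = {}
    for t, lab in zip(transactions, labels):
        clusters.setdefault(lab, []).append(t)

    result = {}
    for lab, members in clusters.items():
        others = [t for l, ts in clusters.items() if l != lab for t in ts]
        # Candidate item sets are drawn only from rows of this cluster.
        candidates = set()
        for t in members:
            for k in range(1, max_len + 1):
                candidates.update(
                    frozenset(c) for c in combinations(sorted(t), k)
                )
        found = []
        for p in candidates:
            support = sum(p <= t for t in members) / len(members)
            # Assumed criterion: frequent here, never seen elsewhere.
            if support >= min_support and not any(p <= t for t in others):
                found.append((p, support))
        # Highest support first; ties broken by the items themselves.
        result[lab] = sorted(found, key=lambda ps: (-ps[1], sorted(ps[0])))
    return result


# Hypothetical toy data: (colour, shape) for six objects.
rows = [
    ("red", "round"), ("red", "round"), ("red", "square"),
    ("blue", "square"), ("blue", "square"), ("blue", "round"),
]
by_colour = contrast_patterns(rows, [0, 0, 0, 1, 1, 1])
interleaved = contrast_patterns(rows, [0, 1, 0, 1, 0, 1])

n_by_colour = sum(len(v) for v in by_colour.values())
n_interleaved = sum(len(v) for v in interleaved.values())
```

On this toy data the colour-aligned clustering yields contrast patterns in both clusters, while the interleaved clustering yields none under the same thresholds, which matches the rationale that a good clustering should be rich in such patterns.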

Publisher
Database: Elsevier - ScienceDirect
Journal: Pattern Recognition - Volume 45, Issue 4, April 2012, Pages 1739–1748
Authors
, ,