Article code: 410538
Journal code: 679149
Publication year: 2009
English article: 12 pages, PDF
Full-text version: free download
English title of the ISI article
Towards understanding hierarchical clustering: A data distribution perspective
Related topics
Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence
English abstract

Hierarchical clustering is a very important category of clustering methods, and considerable research effort has focused on algorithm-level improvements of the hierarchical clustering process. In this paper, our goal is to provide a systematic understanding of hierarchical clustering from a data distribution perspective. Specifically, we investigate how the “true” cluster distribution affects clustering performance, and how hierarchical clustering schemes and validation measures relate to each other under different data distributions. To this end, we provide an organized study to illustrate these issues. One of our key findings is that hierarchical clustering tends to produce clusters with high variation in cluster sizes regardless of the “true” cluster distribution. Our results also show that the F-measure, an external clustering validation measure, is biased towards hierarchical clustering algorithms that tend to increase the variation in cluster sizes. In light of this, we propose Fnorm, the normalized version of the F-measure, to solve the cluster validation problem for hierarchical clustering. Experimental results show that Fnorm is indeed more suitable than the unnormalized F-measure for evaluating hierarchical clustering results across data sets with different data distributions.
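
The abstract does not spell out how the F-measure or the variation in cluster sizes is computed. As a rough illustration only, the sketch below assumes the commonly used class-weighted, best-match F-measure for clusterings and uses the coefficient of variation (sample standard deviation divided by the mean) as the size-variation statistic. The function names `size_cv` and `f_measure` are hypothetical, and Fnorm is not reproduced because its definition is not given here.

```python
from collections import Counter
from statistics import mean, stdev


def size_cv(labels):
    """Coefficient of variation (sample std / mean) of the group sizes.

    Assumption: this is one plausible way to quantify "variation in
    cluster sizes"; the abstract does not name the statistic.
    """
    sizes = list(Counter(labels).values())
    if len(sizes) < 2:
        return 0.0  # a single group has no size variation
    return stdev(sizes) / mean(sizes)


def f_measure(classes, clusters):
    """Class-weighted best-match F-measure of a clustering.

    Assumption: for each true class, take the best F score over all
    clusters, then average those scores weighted by class size.
    """
    n = len(classes)
    class_sizes = Counter(classes)
    cluster_sizes = Counter(clusters)
    overlap = Counter(zip(classes, clusters))

    total = 0.0
    for c, n_c in class_sizes.items():
        best = 0.0
        for k, n_k in cluster_sizes.items():
            n_ck = overlap[(c, k)]
            if n_ck == 0:
                continue  # no shared objects, F score is zero
            precision = n_ck / n_k
            recall = n_ck / n_c
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_c / n) * best
    return total


# Toy data: two true classes of equal size.
classes = ["a"] * 4 + ["b"] * 4
balanced = [0, 0, 0, 0, 1, 1, 1, 1]  # recovers the true classes
skewed = [0, 0, 0, 0, 0, 0, 0, 1]    # one large and one tiny cluster

for name, clustering in [("balanced", balanced), ("skewed", skewed)]:
    print(name, round(f_measure(classes, clustering), 3),
          round(size_cv(clustering), 3))
```

Comparing the two statistics side by side on the same clustering is the kind of check the abstract describes: how a validation score behaves as cluster sizes become more uneven than the true class sizes.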

Publisher
Database: Elsevier - ScienceDirect
Journal: Neurocomputing - Volume 72, Issues 10–12, June 2009, Pages 2319–2330