کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
419939 683877 2013 15 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
Optimising sum-of-squares measures for clustering multisets defined over a metric space
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر نظریه محاسباتی و ریاضیات
پیش نمایش صفحه اول مقاله
Optimising sum-of-squares measures for clustering multisets defined over a metric space
چکیده انگلیسی

Clustering is the problem of dividing a dataset into subsets, called clusters, which are both homogeneous and well-separated. Many criteria have been devised which simultaneously measure both of these properties. Two such criteria are centroid-distance, used by the popular kk-means algorithm, and the complete sum of all intra-cluster distances squared, which we call all-squares. This paper compares these two criteria in the context of clustering multisets which are defined over a metric space.We show that optimal clusterings according to both criteria can be consistent, meaning identical elements belong to the same cluster, but while centroid-distance always produces linearly separable solutions, all-squares does not.It has recently been shown that finding optimal clusterings according to centroid-distance in Euclidean space is NP-hard. We show that the decision problems associated with both optimisation problems are NP-complete in a simple, three-valued, metric space, and that the all-squares decision problem remains NP-complete in Euclidean space. We then show that if the metric is the simple 0/1 metric then both problems are in PP.We then introduce a new metric on clusterings based on the earth mover’s distance called the assignment metric and use this to show that optimal clusterings according to the two criteria can be as different as two clusterings can possibly be under both our metric and the well-known variation of information metric.

ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Discrete Applied Mathematics - Volume 161, Issues 16–17, November 2013, Pages 2499–2513
نویسندگان
, ,