Article code: 514984
Journal code: 866931
Publication year: 2012
English full text: PDF, 15 pages
English title of the ISI article
An experimental study of constrained clustering effectiveness in presence of erroneous constraints
Related subjects
Engineering and Basic Sciences > Computer Engineering > Computer Science Software
English abstract

Recently, a new family of semi-supervised clustering algorithms, known as constrained clustering, has emerged. These algorithms can incorporate a priori domain knowledge into the clustering process, allowing the user to guide the method. The vast majority of studies on the effectiveness of these approaches have used information, in the form of constraints, that was completely accurate. This would be the ideal case, but it is unattainable in most realistic settings because of errors in the constraint creation process, misjudgements by the user, inconsistent information, and so on. Hence, the robustness of constrained clustering algorithms to erroneous constraints is bound to play an important role in their final effectiveness.

In this paper we study the behaviour of four constrained clustering algorithms (Constrained k-Means, Soft Constrained k-Means, Constrained Normalised Cut and Normalised Cut with Imposed Constraints) when not all the information supplied to them is accurate. Experiments on text and numeric datasets using two different noise models, one of them an original similarity-based approach, highlighted the strengths and weaknesses of each method when working with positive and negative constraints, indicating the scenarios in which each algorithm is most appropriate.
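
To illustrate why erroneous constraints matter for hard-constraint methods, here is a minimal sketch of the assignment step of a COP-KMeans-style constrained k-means. It assumes that is the kind of algorithm meant by "Constrained k-Means" and that positive and negative constraints correspond to must-link and cannot-link pairs; it is not the paper's implementation, and `violates` and `assign_step` are hypothetical names. Each point goes to the nearest centroid that breaks no constraint with points already placed; if no centroid is feasible, the pass fails.

```python
from __future__ import annotations

import math


def violates(i, cluster, assignment, must_link, cannot_link):
    """Return True if placing point i in `cluster` breaks a constraint
    with a point that has already been assigned."""
    for a, b in must_link:
        if i in (a, b):
            other = b if a == i else a
            # A must-link partner already placed in another cluster is a violation.
            if other in assignment and assignment[other] != cluster:
                return True
    for a, b in cannot_link:
        if i in (a, b):
            other = b if a == i else a
            # A cannot-link partner already in this cluster is a violation.
            if assignment.get(other) == cluster:
                return True
    return False


def assign_step(points, centroids, must_link, cannot_link) -> dict[int, int] | None:
    """One hard-constraint assignment pass: each point goes to the nearest
    centroid that violates no constraint; None if some point has no option."""
    assignment: dict[int, int] = {}
    for i, p in enumerate(points):
        # Candidate clusters, closest first.
        by_distance = sorted(range(len(centroids)),
                             key=lambda c: math.dist(p, centroids[c]))
        for c in by_distance:
            if not violates(i, c, assignment, must_link, cannot_link):
                assignment[i] = c
                break
        else:
            return None  # infeasible under the given constraints
    return assignment


points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
centroids = [(0.0, 0.0), (5.0, 5.0)]
print(assign_step(points, centroids, [], []))              # {0: 0, 1: 0, 2: 1, 3: 1}
print(assign_step(points, centroids, [], [(0, 1)]))        # {0: 0, 1: 1, 2: 1, 3: 1}
print(assign_step(points, centroids, [(0, 1)], [(0, 1)]))  # None
```

In this toy run, a single wrong cannot-link constraint forces point 1 into the distant cluster, and contradictory constraints on the same pair make the pass infeasible. Soft-constraint variants instead trade constraint violations against distance.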


► We test how four constrained clustering algorithms behave with inaccurate constraints.
► We use two different noise models, introducing a new similarity-based approach.
► The results suggest the most appropriate use for each algorithm in noisy environments.
► The possible noise in the constraints should be considered when tuning the algorithms.
► “Realistic” inaccurate constraints have less effect on the algorithms than expected.
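
The highlights mention two noise models without describing them. For illustration only, the sketch below shows a generic random-noise model, assumed here and not claimed to be either of the paper's models. The similarity-based model is not reproduced. Constraints are derived from ground-truth labels (must-link for same class, cannot-link otherwise), and then a fixed fraction of them is flipped uniformly at random. `make_constraints` and `add_random_noise` are hypothetical names.

```python
import random
from itertools import combinations


def make_constraints(labels, n_pairs, rng):
    """Sample distinct pairs; mark each must-link (True) if both points share
    a class label, cannot-link (False) otherwise."""
    pairs = rng.sample(list(combinations(range(len(labels)), 2)), n_pairs)
    return [(a, b, labels[a] == labels[b]) for a, b in pairs]


def add_random_noise(constraints, noise_rate, rng):
    """Flip the type of a uniformly chosen fraction of the constraints."""
    n_flip = round(noise_rate * len(constraints))
    flipped = set(rng.sample(range(len(constraints)), n_flip))
    return [(a, b, (not ml) if k in flipped else ml)
            for k, (a, b, ml) in enumerate(constraints)]


labels = [0, 0, 0, 1, 1, 1]
rng = random.Random(0)  # fixed seed for reproducibility
clean = make_constraints(labels, 10, rng)
noisy = add_random_noise(clean, 0.2, rng)
wrong = sum(ml != (labels[a] == labels[b]) for a, b, ml in noisy)
print(wrong)  # 2
```

Flipping uniformly at random ignores how hard each pair is to judge. A "realistic" model would instead concentrate errors on ambiguous pairs, which is the kind of distinction the highlights refer to.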

Publisher
Database: Elsevier - ScienceDirect
Journal: Information Processing & Management - Volume 48, Issue 3, May 2012, Pages 537–551