Article code: 534545
Journal code: 870265
Publication year: 2014
English article: 7 pages, PDF
Title of the ISI article
Semi-supervised clustering of large data sets with kernel methods
Keywords
Semi-supervised clustering, large data sets, kernel methods
Related topics
Engineering and Basic Sciences, Computer Engineering, Computer Vision and Pattern Recognition
Abstract

Labelling real-world data sets is difficult. Often the human expert is unsure about the class label of a specific sample point, or, for very large data sets, labelling every point manually is impractical. In semi-supervised clustering, the available sample labels serve as external information for finding cluster partitions that match the classes better. Kernel-based clustering methods can additionally partition the data with nonlinear boundaries in feature space. While these methods improve the clustering results, their computation time is quadratic in the number of samples. In this paper, we propose a meta-algorithm that processes small subsets of a large data set, clusters each subset using the sample labels, and merges the points close to the resulting prototypes with the next subset, until the whole data set has been processed. Its computation time is linear in the number of samples. The error function that this meta-algorithm minimizes is presented. Although we applied the meta-algorithm to Kernel Fuzzy C-Means, Relational Neural Gas and Kernel K-Means, it can be applied to a broad range of kernel-based clustering methods. The proposed method has been empirically evaluated on two real-world benchmark data sets.


► A meta-algorithm, SKPC, is proposed for semi-supervised clustering of large data sets.
► The EM-style algorithm with sample weights converges to a local minimum.
► With external information, i.e. the sample labels, the clustering results are improved.
► SKPC outperforms SKC and unsupervised kernel-based methods on two real-life data sets.
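
The chunk-and-merge idea described in the abstract can be illustrated with a rough Python sketch. This is not the authors' SKPC: it is a plain chunk-wise weighted kernel k-means, and it leaves out the sample labels entirely, because the abstract does not say how they enter the error function. The function names, the RBF kernel, the `keep` parameter and the rule for spreading a cluster's weight over the retained points are all assumptions made for illustration.

```python
import numpy as np


def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def weighted_kernel_kmeans(K, w, k, n_iter=20, rng=None):
    """Kernel k-means on a kernel matrix K with per-sample weights w."""
    rng = np.random.default_rng(rng)
    n = K.shape[0]
    assign = rng.integers(0, k, size=n)  # random initial partition
    for _ in range(n_iter):
        dist = np.full((n, k), np.inf)
        for c in range(k):
            m = assign == c
            wc = w[m]
            s = wc.sum()
            if s == 0:
                continue  # empty cluster: distance stays infinite
            # ||phi(x_i) - mu_c||^2 expanded with the kernel trick
            dist[:, c] = (np.diag(K)
                          - 2.0 * (K[:, m] @ wc) / s
                          + (wc @ K[np.ix_(m, m)] @ wc) / s ** 2)
        new = dist.argmin(1)
        if np.array_equal(new, assign):
            break
        assign = new
    return assign, dist


def chunked_kernel_kmeans(X, k, chunk_size=200, keep=10, gamma=1.0, seed=0):
    """Cluster X chunk by chunk, carrying weighted representatives forward."""
    rng = np.random.default_rng(seed)
    carry_X = np.empty((0, X.shape[1]))
    carry_w = np.empty(0)
    for start in range(0, len(X), chunk_size):
        chunk = X[start:start + chunk_size]
        pts = np.vstack([carry_X, chunk])
        w = np.concatenate([carry_w, np.ones(len(chunk))])
        K = rbf_kernel(pts, pts, gamma)  # only (chunk + carry)^2 entries
        assign, dist = weighted_kernel_kmeans(K, w, k, rng=rng)
        new_X, new_w = [], []
        for c in range(k):
            idx = np.flatnonzero(assign == c)
            if idx.size == 0:
                continue
            # keep the points closest to the prototype of cluster c
            nearest = idx[np.argsort(dist[idx, c])[:keep]]
            new_X.append(pts[nearest])
            # assumption: spread the cluster's total weight evenly over them
            new_w.append(np.full(nearest.size, w[idx].sum() / nearest.size))
        carry_X = np.vstack(new_X)
        carry_w = np.concatenate(new_w)
    return carry_X, carry_w


# Usage: two well-separated Gaussian blobs, processed in chunks of 100 points.
data_rng = np.random.default_rng(1)
X = np.vstack([data_rng.normal(0.0, 0.3, (300, 2)),
               data_rng.normal(5.0, 0.3, (300, 2))])
reps, weights = chunked_kernel_kmeans(X, k=2, chunk_size=100, keep=5)
```

Each pass builds a kernel matrix only over one chunk plus at most `k * keep` carried points, so for a fixed chunk size the total cost grows linearly with the number of samples rather than quadratically. The weights let the retained points stand in for all the samples already processed.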

Publisher
Database: Elsevier - ScienceDirect
Journal: Pattern Recognition Letters - Volume 37, 1 February 2014, Pages 78–84