Document clustering using synthetic cluster prototypes

کد مقاله	کد نشریه	سال انتشار	مقاله انگلیسی	نسخه تمام متن
379013	659251	2011	23 صفحه PDF	دانلود رایگان

عنوان انگلیسی مقاله ISI

دانلود مقاله + سفارش ترجمه

دانلود مقاله ISI انگلیسی

رایگان برای ایرانیان

کلمات کلیدی

Term selection Subspace clustering - خوشه بندی فضای مجازی Document clustering - خوشه بندی مستند Clustering methods - روش خوشه بندی Text mining - متن‌کاوی

موضوعات مرتبط

مهندسی و علوم پایه مهندسی کامپیوتر هوش مصنوعی

پیش نمایش صفحه اول مقاله

Document clustering using synthetic cluster prototypes

چکیده انگلیسی

The use of centroids as prototypes for clustering text documents with the k-means family of methods is not always the best choice for representing text clusters due to the high dimensionality, sparsity, and low quality of text data. Especially for the cases where we seek clusters with small number of objects, the use of centroids may lead to poor solutions near the bad initial conditions. To overcome this problem, we propose the idea of synthetic cluster prototype that is computed by first selecting a subset of cluster objects (instances), then computing the representative of these objects and finally selecting important features. In this spirit, we introduce the MedoidKNN synthetic prototype that favors the representation of the dominant class in a cluster. These synthetic cluster prototypes are incorporated into the generic spherical k-means procedure leading to a robust clustering method called k-synthetic prototypes (k-sp). Comparative experimental evaluation demonstrates the robustness of the approach especially for small datasets and clusters overlapping in many dimensions and its superior performance against traditional and subspace clustering methods.

ناشر

Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Data & Knowledge Engineering - Volume 70, Issue 3, March 2011, Pages 284–306

نویسندگان

Argyris Kalogeratos, Aristidis Likas,

علوم انسانی و هنر

فنی، مهندسی و علوم پایه

پزشکی و سلامت

بیو تکنولوژی

پذیرش سفارش ترجمه

Document clustering using synthetic cluster prototypes

دسترسی سریع

ارتباط

English Website