Article code: 10321903
Journal code: 660776
Publication year: 2015
English article: 14 pages
Full-text version: PDF
English title of the ISI article
Categorical data clustering: What similarity measure to recommend?
Keywords
Categorical data, clustering, clustering criterion, clustering objective, similarity
Related topics
Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence
English abstract
Choosing the most adequate similarity measure is a central challenge in clustering categorical data. The existing literature presents several similarity measures, ranging from simple ones based on matching to more complex ones based on entropy. This raises the following question: is there a similarity measure whose characteristics offer more stability while also providing satisfactory results on databases involving categorical variables? To answer it, this work compared nine different similarity measures using the TaxMap clustering mechanism and evaluated the resulting clusterings with four quality measures: NCC, Entropy, Compactness and Silhouette Index. Tests were performed on 15 databases of distinct sizes and contexts, containing categorical data extracted from public repositories. Analyzing the test results by means of a pairwise ranking, it was observed that the Gower coefficient, the simplest similarity measure presented in this work, obtained the best overall performance. It was therefore considered the ideal measure, since it provided satisfactory results for the databases considered.
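The abstract names the Gower coefficient as the simplest of the nine measures but does not state its formula. The sketch below assumes the standard form of Gower's coefficient restricted to categorical attributes with unit weights. Under that assumption, the similarity between two records is the fraction of comparable attributes on which they agree. An attribute missing in either record is excluded from the comparison. The function name `gower_similarity` and the use of `None` for missing values are illustrative assumptions, not the paper's implementation. The TaxMap clustering step is not shown.

```python
from typing import Optional, Sequence


def gower_similarity(x: Sequence[Optional[str]], y: Sequence[Optional[str]]) -> float:
    """Gower similarity for purely categorical records (hypothetical helper).

    Assumes unit weights: each attribute scores 1 if both values are present
    and equal, 0 if both are present and differ. Attributes where either
    value is None are skipped, so they count in neither numerator nor denominator.
    """
    if len(x) != len(y):
        raise ValueError("records must have the same number of attributes")
    matches = 0   # attributes where both values are present and equal
    compared = 0  # attributes where both values are present
    for a, b in zip(x, y):
        if a is None or b is None:
            continue  # missing value: attribute is not comparable
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable attributes between the two records")
    return matches / compared
```

For example, `gower_similarity(["red", "small", "round"], ["red", "large", "round"])` returns 2/3, because two of the three attributes match. When all attributes are categorical and no values are missing, this reduces to the simple matching coefficient. That fits the abstract's description of Gower as the simplest measure compared.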
Publisher
Database: Elsevier - ScienceDirect
Journal: Expert Systems with Applications - Volume 42, Issue 3, 15 February 2015, Pages 1247-1260