دانلود رایگان مقاله: جن: یک الگوریتم خوشه بندی سلسله مراتبی جدید، سریع و غیر قابل انکار

کد مقاله	کد نشریه	سال انتشار	مقاله انگلیسی	نسخه تمام متن
391666	661920	2016	16 صفحه PDF	دانلود رایگان

عنوان انگلیسی مقاله ISI

Genie: A new, fast, and outlier-resistant hierarchical clustering algorithm

ترجمه فارسی عنوان

جن: یک الگوریتم خوشه بندی سلسله مراتبی جدید، سریع و غیر قابل انکار

دانلود مقاله + سفارش ترجمه

دانلود مقاله ISI انگلیسی

رایگان برای ایرانیان

کلمات کلیدی

خوشه بندی سلسله مراتبی، پیوند تک، اقدامات نابرابر، شاخص جینی

Hierarchical clustering - خوشه بندی سلسله مراتبی

موضوعات مرتبط

مهندسی و علوم پایه مهندسی کامپیوتر هوش مصنوعی

پیش نمایش مقاله

جن: یک الگوریتم خوشه بندی سلسله مراتبی جدید، سریع و غیر قابل انکار

چکیده انگلیسی

• A new hierarchical clustering linkage criterion, called Genie, is proposed.
• The criterion is based on the notion of an inequity measure, e.g., the Gini index.
• The new algorithm has been tested on 29 benchmark sets.
• The clustering quality outperforms the Ward and average linkage criteria.
• The algorithm is fast, parallelizable, and runs in O(n) memory.

The time needed to apply a hierarchical clustering algorithm is most often dominated by the number of computations of a pairwise dissimilarity measure. Such a constraint, for larger data sets, puts at a disadvantage the use of all the classical linkage criteria but the single linkage one. However, it is known that the single linkage clustering algorithm is very sensitive to outliers, produces highly skewed dendrograms, and therefore usually does not reflect the true underlying data structure – unless the clusters are well-separated. To overcome its limitations, we propose a new hierarchical clustering linkage criterion called Genie. Namely, our algorithm links two clusters in such a way that a chosen economic inequity measure (e.g., the Gini- or Bonferroni-index) of the cluster sizes does not increase drastically above a given threshold. The presented benchmarks indicate a high practical usefulness of the introduced method: it most often outperforms the Ward or average linkage in terms of the clustering quality while retaining the single linkage speed. The Genie algorithm is easily parallelizable and thus may be run on multiple threads to speed up its execution further on. Its memory overhead is small: there is no need to precompute the complete distance matrix to perform the computations in order to obtain a desired clustering. It can be applied on arbitrary spaces equipped with a dissimilarity measure, e.g., on real vectors, DNA or protein sequences, images, rankings, informetric data, etc. A reference implementation of the algorithm has been included in the open source genie package for R.

ناشر

Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Information Sciences - Volume 363, 1 October 2016, Pages 8–23

نویسندگان

Marek Gagolewski, Maciej Bartoszuk, Anna Cena,

علوم انسانی و هنر

فنی، مهندسی و علوم پایه

پزشکی و سلامت

بیو تکنولوژی

پذیرش سفارش ترجمه

دانلود رایگان مقاله ISI : جن: یک الگوریتم خوشه بندی سلسله مراتبی جدید، سریع و غیر قابل انکار

دسترسی سریع

ارتباط

English Website