Improving k-means through distributed scalable metaheuristics

Article code	Journal code	Publication year	English article	Full-text version
4947435	1439581	2017	19-page PDF	Free download


Keywords

Distributed algorithms; Evolutionary algorithms; Metaheuristics; Optimization; Scalability; k-Means; MapReduce

Related topics

Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence


Abstract

The recent growth in dataset sizes requires data mining algorithms, such as clustering algorithms, to be scalable. The MapReduce programming model provides the needed scalability, along with portability and automatic data safety and management. k-means is one of the most popular algorithms in data mining and can be easily adapted to the MapReduce model. Nevertheless, k-means has drawbacks, such as the need to provide the number of clusters (k) in advance and the sensitivity of the algorithm to the initial cluster prototypes. This paper presents two scalable evolutionary metaheuristics in MapReduce that automatically seek the solution with the optimal number of clusters and the best clustering structure for scalable datasets. The first is an algorithm that iteratively enhances k-means clusterings through evolutionary operators designed to handle distributed data. The second applies evolutionary k-means to cluster each distributed portion of a dataset independently and then combines the obtained results into an ensemble. The proposed techniques are compared asymptotically and experimentally with other state-of-the-art clustering algorithms also developed in MapReduce. The results are analyzed with statistical tests and show that the first proposed metaheuristic yielded the best-quality results, while the second achieved the best computing times.
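The abstract notes that k-means adapts easily to the MapReduce model. The following is a minimal single-process sketch of that idea only; it is not the paper's evolutionary metaheuristics, and it does not use a real MapReduce framework. The function names (`kmeans_map`, `kmeans_reduce`, `kmeans_iteration`, `kmeans`) are hypothetical, and the in-memory grouping merely stands in for the shuffle phase.

```python
from collections import defaultdict
from math import dist


def kmeans_map(point, centroids):
    """Map step: emit (index of the nearest centroid, (point, 1))."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, (point, 1)


def kmeans_reduce(values):
    """Reduce step: average all points assigned to one centroid."""
    total = None
    count = 0
    for point, n in values:
        if total is None:
            total = list(point)
        else:
            total = [a + b for a, b in zip(total, point)]
        count += n
    return tuple(s / count for s in total)


def kmeans_iteration(points, centroids):
    """One k-means iteration expressed as map, group-by-key, reduce."""
    groups = defaultdict(list)
    for p in points:
        key, value = kmeans_map(p, centroids)
        groups[key].append(value)  # assumption: stands in for the shuffle phase
    # A centroid that received no points keeps its previous position.
    return [
        kmeans_reduce(groups[i]) if i in groups else centroids[i]
        for i in range(len(centroids))
    ]


def kmeans(points, centroids, max_iter=100):
    """Repeat iterations until the centroids stop changing."""
    for _ in range(max_iter):
        new_centroids = kmeans_iteration(points, centroids)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids
```

Note that both drawbacks named in the abstract are visible here: the caller must supply the initial `centroids`, which fixes k in advance, and different starting prototypes can lead to different final clusterings.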

Publisher

Database: Elsevier - ScienceDirect
Journal: Neurocomputing - Volume 246, 12 July 2017, Pages 45-57

Authors

G.V. Oliveira, F.P. Coutinho, R.J.G.B. Campello, M.C. Naldi
