Free article download: Distributed nearest neighbor classification for large-scale multi-label data on Spark

Article ID	Journal ID	Publication year	English article	Full text
6872902	1440626	2018	49-page PDF	Free download

English title of the ISI article

Distributed nearest neighbor classification for large-scale multi-label data on spark



Keywords

Apache Spark; Distributed computing; Multi-label classification; Nearest neighbors; MapReduce; Big Data

Related topics

Engineering and Basic Sciences > Computer Engineering > Computational Theory and Mathematics


Abstract

Modern data is characterized by its ever-increasing volume and complexity, particularly when data instances belong to many categories simultaneously. This learning paradigm is known as multi-label classification, and one of its most renowned methods is the multi-label k nearest neighbor (ML-kNN). Traditional implementations of this method are not feasible for large-scale multi-label data because of their computational complexity and memory restrictions. We propose a distributed ML-kNN implementation based on the MapReduce programming model, implemented on Apache Spark. We compare three strategies for distributed nearest neighbor search: 1) iteratively broadcasting instances, 2) using a distributed tree-based index structure, and 3) building hash tables to group instances. The experimental study evaluates the trade-off between the quality of the predictions and runtimes on 22 benchmark datasets, and compares the scalability using different sizes of data. The results indicate that the tree-based index strategy outperforms the other approaches, with a speedup of up to 266x for the largest dataset, while achieving an accuracy equivalent to the exact methods. This strategy enables ML-kNN to scale efficiently with respect to the size of the problem.
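
For orientation, the following is a minimal single-machine sketch of the commonly described ML-kNN procedure: per-label priors and neighbor-count likelihoods are estimated from the training set, and each label is then assigned by a maximum a posteriori rule. It is not the paper's Spark implementation. The function names, the brute-force Euclidean search, and the smoothing parameter `s` are assumptions made for illustration. The `knn_indices` step is the part that the three strategies named in the abstract would replace with a distributed search.

```python
import math

def knn_indices(train_X, x, k, skip=None):
    """Brute-force k nearest neighbors by Euclidean distance (illustrative only)."""
    dists = [(math.dist(x, t), i) for i, t in enumerate(train_X) if i != skip]
    dists.sort()
    return [i for _, i in dists[:k]]

def fit_mlknn(X, Y, k=3, s=1.0):
    """Estimate label priors and neighbor-count tables with Laplace smoothing s."""
    n, q = len(X), len(Y[0])
    # Prior probability that an instance carries label l.
    priors = [(s + sum(y[l] for y in Y)) / (2 * s + n) for l in range(q)]
    # c1[l][j]: training instances WITH label l that have exactly j neighbors with label l.
    # c0[l][j]: the same count for instances WITHOUT label l.
    c1 = [[0] * (k + 1) for _ in range(q)]
    c0 = [[0] * (k + 1) for _ in range(q)]
    for i, x in enumerate(X):
        nbrs = knn_indices(X, x, k, skip=i)  # exclude the instance itself
        for l in range(q):
            j = sum(Y[t][l] for t in nbrs)
            (c1 if Y[i][l] else c0)[l][j] += 1
    return priors, c1, c0

def predict_mlknn(X, Y, model, x, k=3, s=1.0):
    """Assign each label by comparing the two smoothed posteriors."""
    priors, c1, c0 = model
    nbrs = knn_indices(X, x, k)
    labels = []
    for l, prior in enumerate(priors):
        j = sum(Y[t][l] for t in nbrs)
        p1 = (s + c1[l][j]) / (s * (k + 1) + sum(c1[l]))
        p0 = (s + c0[l][j]) / (s * (k + 1) + sum(c0[l]))
        labels.append(1 if prior * p1 > (1 - prior) * p0 else 0)
    return labels

# Toy data (made up for this sketch): two clusters, each with its own label.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
Y = [[1, 0]] * 4 + [[0, 1]] * 4
model = fit_mlknn(X, Y, k=3)
print(predict_mlknn(X, Y, model, (0.5, 0.5), k=3))
```

The brute-force search above costs one distance computation per training instance for every query, which is the kind of cost the abstract cites as the reason traditional implementations do not scale.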

Publisher

Database: Elsevier - ScienceDirect
Journal: Future Generation Computer Systems - Volume 87, October 2018, Pages 66-82

Authors

Jorge Gonzalez-Lopez, Sebastián Ventura, Alberto Cano
