kNN-IS: An Iterative Spark-based design of the k-Nearest Neighbors classifier for big data

Article ID	Journal ID	Publication year	English article
4946321	1439284	2017	38-page PDF


Keywords

Apache Hadoop, k-Nearest Neighbors, Apache Spark, MapReduce, Big Data

Related topics

Engineering and Basic Sciences; Computer Engineering; Artificial Intelligence


Abstract

In this work we provide a new solution to perform exact k-nearest neighbor classification based on Spark. We take advantage of its in-memory operations to classify large numbers of unseen cases against a big training dataset. The map phase computes the k nearest neighbors in different splits of the training data. Afterwards, multiple reducers determine the definitive neighbors from the lists obtained in the map phase. The key point of this proposal lies in the management of the test set, which is kept in memory when possible. Otherwise, it is split into the minimum number of pieces and a MapReduce is applied per chunk, using Spark's caching capabilities to reuse the previously partitioned training set. In our experiments we study the differences between the Hadoop and Spark implementations with datasets of up to 11 million instances, showing the scaling-up capabilities of the proposed approach. As a result of this work, an open-source Spark package is available.
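The map/reduce design summarised in the abstract can be illustrated with a small single-process simulation. This is a hedged, plain-Python sketch, not the authors' Spark package and not the Spark API: the list of training splits stands in for a partitioned, cached RDD, squared Euclidean distance and majority voting are assumed, and `max_chunk` is a hypothetical stand-in for whatever memory budget decides how the test set is divided. All function names are illustrative.

```python
import heapq
import math
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical single-process simulation of the kNN-IS idea; NOT the Spark API.
Instance = Tuple[Sequence[float], str]   # (feature vector, class label)
Neighbour = Tuple[float, str]            # (squared distance, class label)


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance (assumed metric; same ordering as Euclidean)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def map_split(train_split: List[Instance], test_chunk: List[Sequence[float]],
              k: int) -> List[Tuple[int, List[Neighbour]]]:
    """Map phase: k nearest neighbours of each test instance within ONE training split."""
    out = []
    for t_id, t in enumerate(test_chunk):
        candidates = [(squared_distance(t, x), label) for x, label in train_split]
        out.append((t_id, heapq.nsmallest(k, candidates)))
    return out


def reduce_neighbours(partials: List[List[Neighbour]], k: int) -> List[Neighbour]:
    """Reduce phase: merge the per-split lists; the k smallest overall are the exact kNN."""
    return heapq.nsmallest(k, (n for part in partials for n in part))


def majority_vote(neighbours: List[Neighbour]) -> str:
    """Most frequent label; ties go to the label seen first, i.e. the nearer one."""
    return Counter(label for _, label in neighbours).most_common(1)[0][0]


def split_test_set(test: List[Sequence[float]],
                   max_chunk: int) -> List[List[Sequence[float]]]:
    """Split the test set into the minimum number of chunks of at most max_chunk items."""
    if not test:
        return []
    n_chunks = math.ceil(len(test) / max_chunk)
    size = math.ceil(len(test) / n_chunks)  # balance sizes across the chunks
    return [test[i:i + size] for i in range(0, len(test), size)]


def knn_is_sketch(train_splits: List[List[Instance]], test: List[Sequence[float]],
                  k: int, max_chunk: int) -> List[str]:
    predictions: List[str] = []
    # One MapReduce pass per test chunk; in Spark the training splits would stay
    # cached in memory and be reused by every pass.
    for chunk in split_test_set(test, max_chunk):
        grouped: Dict[int, List[List[Neighbour]]] = defaultdict(list)
        for split in train_splits:                        # map phase
            for t_id, neighbours in map_split(split, chunk, k):
                grouped[t_id].append(neighbours)          # shuffle: group by test id
        for t_id in range(len(chunk)):                    # reduce phase
            predictions.append(majority_vote(reduce_neighbours(grouped[t_id], k)))
    return predictions


if __name__ == "__main__":
    train_splits = [
        [([0.0, 0.0], "a"), ([0.1, 0.0], "a")],
        [([5.0, 5.0], "b"), ([5.1, 5.0], "b")],
        [([0.0, 0.2], "a"), ([5.0, 5.2], "b")],
    ]
    test = [[0.05, 0.05], [5.05, 5.05], [0.0, 0.1]]
    print(knn_is_sketch(train_splits, test, k=3, max_chunk=2))  # ['a', 'b', 'a']
```

Because every split contributes its own k nearest candidates, the k smallest entries of the merged lists are exactly the global k nearest neighbours, so the reduce step introduces no approximation. Splitting the test set only changes how many passes are made over the (reused) training splits, not the result.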

Publisher

Database: Elsevier - ScienceDirect
Journal: Knowledge-Based Systems - Volume 117, 1 February 2017, Pages 3-15

Authors

Jesus Maillo, Sergio Ramírez, Isaac Triguero, Francisco Herrera
