کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
6857311 661905 2016 26 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
Voting-based instance selection from large data sets with MapReduce and random weight networks
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر هوش مصنوعی
پیش نمایش صفحه اول مقاله
Voting-based instance selection from large data sets with MapReduce and random weight networks
چکیده انگلیسی
Instance selection is an important preprocessing step in machine learning. By choosing a subset of a data set, it achieves the same performance of a machine learning algorithm as if the whole data set is used, and it enables a machine learning algorithm to be feasible for and to work effectively with large data sets. Based on voting mechanism, this paper proposes a large data sets instance selection algorithm with MapReduce and random weight networks (RWNs). Firstly, the proposed algorithm employs the Map of MapReduce to partition the large data sets into some small subsets, and deploys them to different cloud computing nodes. Secondly, the informative instances are selected in parallel with an instance selection algorithm. Thirdly, the Reduce of MapReduce is used to collect the selected instances from different cloud computing nodes and a selected instance subset is obtained. The previous three processes are repeated p times (p is a parameter defined by the user), and p instance subsets are obtained. Finally, the voting method is used to select the most informative instances from the p subsets. The random weight network classifier is trained with the selected instance subset, and the testing accuracy is verified on the testing set. The proposed algorithm is experimentally compared with three state-of-the-art approaches which are CNN, ENN and RNN. The experimental results show that the proposed algorithm is effective and efficient.
ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Information Sciences - Volumes 367–368, 1 November 2016, Pages 1066-1077
نویسندگان
, , ,