RMHC-MR: Instance selection by random mutation hill climbing algorithm with MapReduce in big data

Procedia Computer Science, 2017 (8 pages)

Abstract

Instance selection reduces the size of a training set by removing redundant, erroneous and noisy instances, and is an important pre-processing step in KDD (knowledge discovery in databases). To process very large data sets, several recent methods divide the training set into disjoint subsets and apply an instance selection algorithm to each subset independently. In this paper, we analyze the limitations of these methods and present our view of how to apply "divide and conquer" in the instance selection procedure. Furthermore, we propose an instance selection method based on the random mutation hill climbing (RMHC) algorithm within the MapReduce framework, called RMHC-MR. Experimental results show that RMHC-MR performs well in terms of both classification accuracy and reduction rate.
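The abstract names the ingredients of RMHC-MR without giving its details, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes a keep/drop bit mask over the training instances, a random mutation that flips one bit, and a fitness that blends leave-one-out 1-NN accuracy with the reduction rate through a hypothetical weight `alpha`. The MapReduce part is also an assumption: instead of selecting instances independently per partition, the fitness evaluation of each global candidate subset is split into a map step over data partitions and a reduce step that sums the counts (simulated sequentially here, not run on Hadoop). The names `rmhc_select`, `map_partition`, `make_chunks` and `fitness` are illustrative.

```python
import random
from typing import List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance (monotone in true distance, so fine for 1-NN)."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def make_chunks(n: int, n_chunks: int) -> List[List[int]]:
    """Split instance indices into disjoint partitions (stand-ins for input splits)."""
    return [list(range(k, n, n_chunks)) for k in range(n_chunks)]


def map_partition(chunk, X, y, mask) -> int:
    """Hypothetical map step: count instances in `chunk` that the currently
    selected prototypes classify correctly with leave-one-out 1-NN."""
    selected = [j for j, keep in enumerate(mask) if keep]
    correct = 0
    for i in chunk:
        # An instance may not act as its own nearest neighbour.
        candidates = [j for j in selected if j != i]
        if not candidates:
            continue
        nn = min(candidates, key=lambda j: sq_dist(X[i], X[j]))
        if y[nn] == y[i]:
            correct += 1
    return correct


def fitness(X, y, mask, chunks, alpha) -> float:
    """Hypothetical reduce step: sum per-partition counts, then blend accuracy
    and reduction rate using the assumed weight `alpha`."""
    correct = sum(map_partition(c, X, y, mask) for c in chunks)
    accuracy = correct / len(X)
    reduction = 1.0 - sum(mask) / len(mask)
    return alpha * accuracy + (1.0 - alpha) * reduction


def rmhc_select(X, y, n_iter=200, n_chunks=4, alpha=0.8, seed=0):
    """Random mutation hill climbing over a keep/drop bit mask."""
    rng = random.Random(seed)
    n = len(X)
    chunks = make_chunks(n, n_chunks)
    mask = [rng.random() < 0.5 for _ in range(n)]  # random initial subset
    best = fitness(X, y, mask, chunks, alpha)
    for _ in range(n_iter):
        i = rng.randrange(n)
        mask[i] = not mask[i]          # mutate: flip one instance in or out
        score = fitness(X, y, mask, chunks, alpha)
        if score >= best:              # accept ties to drift across plateaus
            best = score
        else:
            mask[i] = not mask[i]      # reject: undo the flip
    return mask, best


if __name__ == "__main__":
    X = [(0, 0), (0, 1), (1, 0), (1, 1), (9, 9), (9, 10), (10, 9), (10, 10)]
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    mask, score = rmhc_select(X, y)
    print("kept:", [i for i, k in enumerate(mask) if k], "fitness:", round(score, 3))
```

Splitting the evaluation rather than the selection keeps every candidate subset global, which is one way to parallelize without committing each partition to its own independent selection, the pattern the abstract critiques. Because the map outputs are plain counts, the result does not depend on how the data are partitioned.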

Keywords

Instance selection; Classification; Nearest neighbor; MapReduce; Big data