Article ID: 4960895 | Journal: Procedia Computer Science | Published Year: 2017 | Pages: 8 | File Type: PDF
Abstract

Instance selection reduces the size of a training set by removing redundant, erroneous, and noisy instances, and it is an important pre-processing step in knowledge discovery in databases (KDD). To process very large data sets, several recent methods divide the training set into disjoint subsets and apply an instance selection algorithm to each subset independently. In this paper, we analyze the limitations of these methods and give our view on how to “divide and conquer” in the instance selection procedure. Furthermore, we propose RMHC-MR, an instance selection method based on the random mutation hill climbing (RMHC) algorithm with the MapReduce framework. The experimental results show that RMHC-MR performs well in terms of classification accuracy and reduction rate.
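
The abstract does not give the details of RMHC-MR, so the following is only a minimal sketch of the generic ideas it mentions, not the authors' method. It assumes a 1-nearest-neighbour leave-one-out accuracy as the fitness function, a fixed number `k` of kept instances, and a plain loop standing in for the map and reduce phases. All function names and parameters (`nn_accuracy`, `rmhc_select`, `select_per_partition`, `k`, `iterations`, `n_parts`) are hypothetical.

```python
import random


def nn_accuracy(selected, data):
    """Leave-one-out 1-NN accuracy on `data` using only the `selected` indices.

    `data` is a list of (feature_tuple, label) pairs.
    """
    correct = 0
    for i, (x, y) in enumerate(data):
        best_label, best_dist = None, float("inf")
        for j in selected:
            if j == i:  # do not let an instance vote for itself
                continue
            xj, yj = data[j]
            dist = sum((a - b) ** 2 for a, b in zip(x, xj))
            if dist < best_dist:
                best_dist, best_label = dist, yj
        correct += best_label == y
    return correct / len(data)


def rmhc_select(data, k, iterations, seed=0):
    """Random mutation hill climbing over subsets of exactly k instances."""
    rng = random.Random(seed)
    selected = rng.sample(range(len(data)), k)
    best = nn_accuracy(selected, data)
    for _ in range(iterations):
        # Mutation: swap one kept instance for a randomly chosen other one.
        replacement = rng.randrange(len(data))
        if replacement in selected:
            continue
        candidate = list(selected)
        candidate[rng.randrange(k)] = replacement
        score = nn_accuracy(candidate, data)
        # Keep the mutation only if the fitness does not get worse.
        if score >= best:
            selected, best = candidate, score
    return sorted(selected), best


def select_per_partition(data, n_parts, k, iterations, seed=0):
    """Split into disjoint subsets, select within each, and merge the results.

    This mirrors the independent per-subset scheme described above (assumed
    here to use random partitioning); the loop plays the role of the mappers
    and the final concatenation plays the role of the reducer.
    """
    rng = random.Random(seed)
    order = list(range(len(data)))
    rng.shuffle(order)
    parts = [order[p::n_parts] for p in range(n_parts)]
    kept = []
    for p, part in enumerate(parts):
        subset = [data[i] for i in part]
        local, _ = rmhc_select(subset, k, iterations, seed=seed + p)
        kept.extend(part[j] for j in local)  # map local indices back to global
    return sorted(kept)
```

Under these assumptions, `select_per_partition(data, n_parts=4, k=3, iterations=30)` returns the global indices of the kept instances, and the reduction rate under its usual definition would be `1 - len(kept) / len(data)`. Because each partition is scored only against its own instances, the sketch also makes visible the kind of limitation that independent per-subset selection can have.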
