Article code | Journal code | Publication year | English article | Full-text version |
---|---|---|---|---|
4960895 | 1446504 | 2017 | 8-page PDF | Free download |
Instance selection reduces the size of the training set by removing redundant, erroneous, and noisy instances, and is an important pre-processing step in KDD (knowledge discovery in databases). Recently, to process very large data sets, several methods have divided the training set into disjoint subsets and applied instance selection algorithms to each subset independently. In this paper, we analyze the limitations of these methods and give our viewpoint on how to “divide and conquer” in the instance selection procedure. Furthermore, we propose an instance selection method based on the random mutation hill climbing (RMHC) algorithm with the MapReduce framework, called RMHC-MR. The experimental results show that RMHC-MR performs well in terms of classification accuracy and reduction rate.
Journal: Procedia Computer Science - Volume 111, 2017, Pages 252-259
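
To make the ideas in the abstract concrete, the following is a minimal sketch of generic RMHC-style instance selection with a 1-nearest-neighbour fitness, plus the disjoint-subset strategy the abstract describes as existing practice. It is not the paper's RMHC-MR method and uses no MapReduce framework; the function names (`nn_accuracy`, `rmhc_select`, `select_by_partition`), the parameters, the squared Euclidean distance, and the acceptance rule (keep a mutation if accuracy does not drop) are all assumptions made for illustration.

```python
import random


def nn_accuracy(selected, X, y):
    """Fraction of training points whose nearest selected instance shares their label."""
    correct = 0
    for i, x in enumerate(X):
        # Nearest selected instance by squared Euclidean distance (assumed metric).
        nearest = min(selected,
                      key=lambda j: sum((a - b) ** 2 for a, b in zip(x, X[j])))
        correct += y[nearest] == y[i]
    return correct / len(X)


def rmhc_select(X, y, m, iterations=200, seed=0):
    """Hill-climb over subsets of m instance indices by random single swaps."""
    n = len(X)
    if not 0 < m < n:
        raise ValueError("m must satisfy 0 < m < len(X)")
    rng = random.Random(seed)
    selected = rng.sample(range(n), m)
    best_acc = nn_accuracy(selected, X, y)
    for _ in range(iterations):
        # Mutation: replace one selected index with a currently unselected one.
        candidate = list(selected)
        pos = rng.randrange(m)
        candidate[pos] = rng.choice([i for i in range(n) if i not in selected])
        acc = nn_accuracy(candidate, X, y)
        # Keep the mutation only if accuracy does not decrease.
        if acc >= best_acc:
            selected, best_acc = candidate, acc
    return sorted(selected), best_acc


def select_by_partition(X, y, parts, m_per_part, seed=0):
    """Disjoint-subset strategy: run the selector on each subset independently."""
    order = list(range(len(X)))
    random.Random(seed).shuffle(order)
    chosen = []
    for p in range(parts):
        idx = order[p::parts]  # subset p; the subsets are pairwise disjoint
        sub_sel, _ = rmhc_select([X[i] for i in idx], [y[i] for i in idx],
                                 m_per_part, seed=seed + p)
        chosen.extend(idx[j] for j in sub_sel)  # map back to original indices
    return sorted(chosen)
```

With `selected, acc = rmhc_select(X, y, m)`, the reduction rate mentioned in the abstract corresponds to `1 - len(selected) / len(X)`. In `select_by_partition`, each subset is evaluated only against its own points, which is the kind of independence the abstract identifies as a limitation.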