Article ID: 6864356
Journal: Neurocomputing
Published Year: 2018
Pages: 29
File Type: PDF
Abstract
The growing amount of available data has become a serious challenge for data mining and machine learning techniques. Well-known classification methods that have been widely applied so far are no longer feasible in Big Data environments. For this reason, prototype reduction techniques (both selection and generation) have emerged as a candidate solution: they build a reduced version of the dataset that speeds up the execution of algorithms such as k-Nearest Neighbors and overcomes their memory constraints. However, these solutions generally have quadratic O(N^2) time complexity and share the time and memory limitations of the data mining and machine learning algorithms they are meant to help. To overcome these limitations, we introduce a new distributed MapReduce prototype generation method called CHI-PG, which has linear O(N) time complexity and ensures constant accuracy regardless of the degree of parallelism. This approach builds prototypes by applying a simple scheme based on the rule generation process of the Chi et al. Fuzzy Rule-Based Classification System, and it takes advantage of the suitability of this classifier for the MapReduce paradigm. The empirical study shows that our new approach significantly improves on the execution time of a state-of-the-art distributed prototype reduction algorithm (MRPR) without decreasing (and in some cases improving) classification accuracy and reduction rates. Moreover, CHI-PG is shown to be a candidate solution to the time and memory constraints of k-Nearest Neighbors when tackling large-scale datasets.
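The abstract does not spell out the prototype-building scheme, so the following Python sketch is only one plausible reading of it, not the authors' implementation. It assumes attributes normalized to [0, 1], uniformly spaced triangular fuzzy labels (so the highest-membership label is the nearest peak), and one prototype per (antecedent, class) pair computed as the mean of the examples that fall into it. All function names (`fuzzy_label`, `map_phase`, `reduce_phase`, `generate_prototypes`) and the toy partitioning are hypothetical.

```python
def fuzzy_label(value, num_labels=3):
    """Index of the highest-membership triangular label for a value in [0, 1].

    Assumption: labels are uniformly spaced with peaks at i / (num_labels - 1),
    so the winning label is the one whose peak is nearest to the value.
    """
    clipped = min(max(value, 0.0), 1.0)
    return int(clipped * (num_labels - 1) + 0.5)


def map_phase(chunk, num_labels=3):
    """One pass over a data partition: accumulate feature sums and counts
    per (antecedent, class) key, as in Chi-style rule generation."""
    partial = {}
    for features, label in chunk:
        key = (tuple(fuzzy_label(v, num_labels) for v in features), label)
        sums, count = partial.get(key, ([0.0] * len(features), 0))
        partial[key] = ([s + v for s, v in zip(sums, features)], count + 1)
    return partial


def reduce_phase(partials):
    """Merge per-partition sums and counts, then emit one mean prototype per key."""
    merged = {}
    for partial in partials:
        for key, (sums, count) in partial.items():
            if key in merged:
                old_sums, old_count = merged[key]
                sums = [a + b for a, b in zip(old_sums, sums)]
                count += old_count
            merged[key] = (sums, count)
    return sorted(([s / n for s in sums], key[1]) for key, (sums, n) in merged.items())


def generate_prototypes(data, num_partitions=1, num_labels=3):
    """Simulate the distributed run locally by splitting data into partitions."""
    chunks = [data[i::num_partitions] for i in range(num_partitions)]
    return reduce_phase(map_phase(chunk, num_labels) for chunk in chunks)
```

Under these assumptions each example is visited once, which gives the linear cost in N, and because sums and counts merge associatively the resulting prototypes do not depend on how the data is partitioned (up to floating-point rounding). That is one way to read the claim of constant accuracy regardless of the degree of parallelism.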