Article ID | Journal | Published Year | Pages |
---|---|---|---|
4946326 | Knowledge-Based Systems | 2017 | 12 |
Abstract
This paper presents MEFASD-BD, a new algorithm for the subgroup discovery task. The algorithm is implemented in Apache Spark following the MapReduce paradigm, and it handles high-dimensional datasets efficiently. It is the first approach to big data within evolutionary fuzzy systems for subgroup discovery. MEFASD-BD implements novel MapReduce functions that analyse the quality of the subgroups obtained in each map with respect to the original dataset in order to improve the quality of these subgroups. In addition, the final reduce function employs the token competition operator to select the best rules extracted in the different maps. An experimental study with high-dimensional datasets shows the advantages of this algorithm for this type of problem. Specifically, the results show a significant reduction in runtime while maintaining the values of the standard quality measures for subgroup discovery.
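The pipeline described in the abstract (rules learned independently in each map, each map's subgroups re-evaluated against the full dataset, and a final reduce that filters rules with token competition) can be illustrated with a small sketch. It is plain Python, not Spark, and rests on several assumptions: rules are crisp conjunctions rather than fuzzy rules; the evolutionary search run in each map is replaced by a hypothetical exhaustive enumeration of single-condition rules; quality is measured with weighted relative accuracy (WRAcc); and keeping only rules with positive WRAcc on the full data is an assumed stand-in for the paper's quality-improvement step. All function names (`map_partition`, `token_competition`, `mefasd_bd_sketch`) are hypothetical.

```python
from collections import namedtuple

# A rule is a crisp conjunction of (attribute_index, value) conditions that
# predicts target_class. Simplification: MEFASD-BD itself learns fuzzy rules.
Rule = namedtuple("Rule", ["conditions", "target_class"])


def covers(rule, features):
    """True if every condition of the rule holds for this example."""
    return all(features[i] == v for i, v in rule.conditions)


def wracc(rule, data):
    """Weighted relative accuracy: p(Cond) * (p(Class|Cond) - p(Class))."""
    covered = [label for feats, label in data if covers(rule, feats)]
    if not covered:
        return 0.0
    n = len(data)
    n_class = sum(1 for _, label in data if label == rule.target_class)
    n_tp = sum(1 for label in covered if label == rule.target_class)
    return (len(covered) / n) * (n_tp / len(covered) - n_class / n)


def map_partition(partition, top_k=3):
    """Hypothetical stand-in for the evolutionary search run in each map:
    enumerate all single-condition rules and keep the top_k by local WRAcc."""
    classes = {label for _, label in partition}
    candidates = {
        Rule(((i, feats[i]),), c)
        for feats, _ in partition
        for i in range(len(feats))
        for c in classes
    }
    # Ties are broken by the rule itself so the result is deterministic.
    return sorted(candidates, key=lambda r: (-wracc(r, partition), r))[:top_k]


def token_competition(rules, data):
    """Rules, strongest first, seize the target-class examples (tokens) they
    cover; a rule that seizes no token left by stronger rules is discarded."""
    seized, kept = set(), []
    for rule in sorted(rules, key=lambda r: (-wracc(r, data), r)):
        tokens = {
            idx
            for idx, (feats, label) in enumerate(data)
            if label == rule.target_class and covers(rule, feats)
        }
        if tokens - seized:
            kept.append(rule)
            seized |= tokens
    return kept


def mefasd_bd_sketch(data, n_partitions=2, top_k=3):
    """Hypothetical map -> global evaluation -> reduce pipeline."""
    # "Map": learn rules independently on each partition of the data.
    partitions = [data[i::n_partitions] for i in range(n_partitions)]
    local_rules = [map_partition(p, top_k) for p in partitions]
    # Re-evaluate every local rule on the full dataset; keeping only rules with
    # positive global WRAcc is an assumed filter, not the paper's exact step.
    candidates = {r for rules in local_rules for r in rules if wracc(r, data) > 0}
    # "Reduce": merge the rules of all maps with token competition.
    return token_competition(candidates, data)


# Toy dataset: (features, class) pairs with categorical features.
data = [
    (("sunny", "hot"), "no"),
    (("sunny", "mild"), "no"),
    (("rainy", "mild"), "yes"),
    (("rainy", "cool"), "yes"),
    (("overcast", "hot"), "yes"),
    (("sunny", "cool"), "no"),
]

for rule in mefasd_bd_sketch(data):
    print(rule.conditions, "->", rule.target_class, round(wracc(rule, data), 3))
```

Running it prints the rules that survive token competition together with their WRAcc on the full data. Sorting by global quality before the competition lets strong, general rules claim examples first, so a narrower rule that only covers already-claimed examples is discarded as redundant.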
Authors
F. Pulgar-Rubio, A.J. Rivera-Rivas, M.D. Pérez-Godoy, P. González, C.J. Carmona, M.J. del Jesus