MEFASD-BD: Multi-objective evolutionary fuzzy algorithm for subgroup discovery in big data environments

Article ID	Journal	Published Year	Pages	File Type
4946326	Knowledge-Based Systems	2017	12 Pages	PDF

Abstract

This paper presents MEFASD-BD, a new algorithm for the subgroup discovery task. The algorithm is implemented in Apache Spark following the MapReduce paradigm, and it handles high-dimensional datasets efficiently. It is the first approach to big data within evolutionary fuzzy systems for subgroup discovery. MEFASD-BD implements novel MapReduce functions that evaluate the subgroups obtained in each map against the original dataset in order to improve their quality. In addition, the final reduce function applies the token competition operator to select the best rules extracted in the different maps. An experimental study with high-dimensional datasets shows the advantages of the algorithm for this type of problem. Specifically, the results show a substantial reduction in runtime while maintaining the values of the standard quality measures for subgroup discovery.
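
The reduce step described above can be illustrated with a minimal sketch of token competition in plain Python. This is not the MEFASD-BD implementation: it uses crisp (non-fuzzy) rules, no Spark, and weighted relative accuracy (unusualness) as the ranking measure, all of which are assumptions made for illustration. The names `covers`, `unusualness`, and `token_competition` are hypothetical, and the sketch assumes that a rule may seize the token of any example it covers.

```python
from typing import Callable, Dict, List, Tuple

Example = Dict[str, str]
# A rule is (antecedent conditions, target class value); crisp rules are an
# assumption of this sketch, the paper works with fuzzy rules.
Rule = Tuple[Dict[str, str], str]


def covers(rule: Rule, example: Example) -> bool:
    """True if every condition of the antecedent matches the example."""
    antecedent, _ = rule
    return all(example.get(attr) == value for attr, value in antecedent.items())


def unusualness(rule: Rule, data: List[Example], class_attr: str = "class") -> float:
    """Weighted relative accuracy: p(cond) * (p(class | cond) - p(class))."""
    covered = [e for e in data if covers(rule, e)]
    if not data or not covered:
        return 0.0
    target = rule[1]
    p_cond = len(covered) / len(data)
    p_class_given_cond = sum(e[class_attr] == target for e in covered) / len(covered)
    p_class = sum(e[class_attr] == target for e in data) / len(data)
    return p_cond * (p_class_given_cond - p_class)


def token_competition(
    rules: List[Rule],
    data: List[Example],
    quality: Callable[[Rule, List[Example]], float],
) -> List[Rule]:
    """Keep only the rules that seize at least one example (token).

    Rules are ranked by quality on the full dataset; each example can be
    seized once, by the best-ranked rule that covers it.
    """
    ranked = sorted(rules, key=lambda r: quality(r, data), reverse=True)
    seized = [False] * len(data)
    kept: List[Rule] = []
    for rule in ranked:
        won_token = False
        for i, example in enumerate(data):
            if not seized[i] and covers(rule, example):
                seized[i] = True
                won_token = True
        if won_token:
            kept.append(rule)
    return kept


# Toy dataset and candidate rules, standing in for rules gathered from all maps.
data = [
    {"x": "a", "y": "p", "class": "pos"},
    {"x": "a", "y": "q", "class": "pos"},
    {"x": "b", "y": "p", "class": "neg"},
    {"x": "b", "y": "q", "class": "neg"},
]
r1: Rule = ({"x": "a"}, "pos")
r2: Rule = ({"x": "a", "y": "p"}, "pos")  # more specific, redundant with r1
r3: Rule = ({"x": "b"}, "neg")

selected = token_competition([r1, r2, r3], data, unusualness)
print(selected)
```

In this toy run, `r2` is discarded because the only example it covers has already been seized by the higher-ranked `r1`, which is the kind of redundancy the operator is meant to remove when rules from different maps overlap.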

Keywords

Multi-Objective Evolutionary Fuzzy Systems; Apache Spark; MapReduce; Subgroup discovery; Big Data