Parallel attribute reduction algorithms using MapReduce

Article ID	Journal	Published Year	Pages	File Type
6857912	Information Sciences	2014	20 Pages	PDF

Abstract

Attribute reduction is a key technique for knowledge acquisition in rough set theory. However, performing attribute reduction on massive data remains a challenging task. In this setting, reduction efficiency depends chiefly on computing equivalence classes and attribute significance effectively. To address this problem, we propose several parallel attribute reduction algorithms in this paper. Specifically, we design a novel ⟨key, value⟩ pair structure to speed up the computation of equivalence classes and attribute significance, and we parallelize the traditional attribute reduction process using the MapReduce mechanism. The different parallelization strategies for attribute reduction are also compared and analyzed from a theoretical perspective. Extensive experimental results demonstrate that the proposed parallel attribute reduction algorithms perform efficiently and scale well on massive data.
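
To make the idea of key/value-based equivalence class computation concrete, the following is a minimal single-process sketch in Python. It is not the paper's algorithm: the abstract does not spell out the key/value structure, so the layout used here (key = the tuple of a record's values on the chosen attributes, value = its decision label) is an assumption, as are the function names `map_phase`, `reduce_phase`, `dependency` and `significance` and the toy decision table. Attribute significance is taken to be the classical gain in positive-region dependency, which may differ from the measures used in the paper.

```python
from collections import defaultdict

def map_phase(records, attrs):
    """Emit one (key, value) pair per record.

    Assumed layout: key = values of the record on `attrs`,
    value = the record's decision label (last field).
    """
    for row in records:
        *conditions, decision = row
        yield tuple(conditions[a] for a in attrs), decision

def reduce_phase(pairs):
    """Group pairs by key (standing in for the shuffle step) and count
    decision labels inside each equivalence class."""
    classes = defaultdict(lambda: defaultdict(int))
    for key, decision in pairs:
        classes[key][decision] += 1
    return classes

def dependency(records, attrs):
    """Share of records lying in equivalence classes that carry a single
    decision label, i.e. |positive region| / |U|."""
    classes = reduce_phase(map_phase(records, attrs))
    consistent = sum(sum(c.values()) for c in classes.values() if len(c) == 1)
    return consistent / len(records)

def significance(records, selected, candidate):
    """Gain in dependency from adding `candidate` to `selected`."""
    return dependency(records, selected + [candidate]) - dependency(records, selected)

# Hypothetical decision table: two condition attributes, then the decision.
table = [
    (0, 0, "no"),
    (0, 1, "yes"),
    (1, 0, "yes"),
    (1, 1, "yes"),
    (0, 0, "no"),
]

if __name__ == "__main__":
    print(dependency(table, [0]))
    print(dependency(table, [0, 1]))
    print(significance(table, [0], 1))
```

In a real MapReduce job the map calls would run on separate data splits and the framework would handle the grouping by key; here both phases run sequentially only to show how equivalence classes fall out of the grouping step.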

Keywords

Data mining; Rough set; MapReduce