Article ID: 7562209
Journal: Chemometrics and Intelligent Laboratory Systems
Published Year: 2018
Pages: 14
File Type: PDF
Abstract
Feature selection has been a problem of interest for many years. Almost all existing feature selection approaches use all training samples and features at once to select salient features. These approaches are called centralized methods. Other approaches, intended for ultra-big data, split the training data along its dimensions so that each batch can be processed on a different machine in a cluster. In this paper, a novel distributed feature selection approach based on hesitant fuzzy sets is proposed. First, datasets are divided horizontally (that is, by their features) into subsets according to the information energies of hesitant fuzzy sets, combined with shuffling. Then HCPF (Hesitant fuzzy set based feature selection algorithm using Correlation coefficients for Partitioning Features) is applied to each subset independently. Finally, a merging procedure updates the final feature subset according to improvements in classification accuracy. The effectiveness of the proposed method was evaluated against twenty-two state-of-the-art distributed and centralized algorithms on eight well-known high-dimensional microarray datasets. The experimental results show that the proposed method achieves significantly better results than the other approaches according to the non-parametric Wilcoxon signed-rank test. Our experiments confirm that the proposed method is effective at tackling the feature selection problem, in terms of both classification accuracy and dimensionality reduction, on ultra-high-dimensional datasets.
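The three-stage pipeline described in the abstract (partition the features, select features within each partition, then merge partial results only when classification accuracy improves) can be sketched in Python as below. This is a minimal, hypothetical sketch under stated assumptions, not the authors' HCPF: the hesitant-fuzzy-set machinery is not reproduced. Random shuffling stands in for the information-energy-based partitioning, plain Pearson correlation (with class labels encoded as numbers, e.g. 0/1) stands in for the hesitant fuzzy correlation coefficients, and leave-one-out 1-nearest-neighbour accuracy is an assumed evaluation classifier. All function names and the `max_redundancy` threshold are illustrative.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def partition_features(n_features, n_parts, seed=0):
    """Shuffle feature indices and split them into n_parts disjoint subsets.
    Stand-in (assumption) for the paper's information-energy-based partitioning."""
    idx = list(range(n_features))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_parts] for i in range(n_parts)]


def select_in_partition(X, y, features, max_redundancy=0.9):
    """Hypothetical local selector (not HCPF): rank features by |corr(feature, label)|,
    skip uninformative ones, and skip any feature highly correlated with one already kept."""
    def column(j):
        return [row[j] for row in X]

    relevance = {j: abs(pearson(column(j), y)) for j in features}
    ranked = sorted((j for j in features if relevance[j] > 0.0),
                    key=lambda j: -relevance[j])
    chosen = []
    for j in ranked:
        if all(abs(pearson(column(j), column(k))) < max_redundancy for k in chosen):
            chosen.append(j)
    return chosen


def loo_1nn_accuracy(X, y, features):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on the given features."""
    if not features:
        return 0.0
    correct = 0
    for i, xi in enumerate(X):
        best_dist, predicted = float("inf"), None
        for k, xk in enumerate(X):
            if k == i:
                continue
            d = sum((xi[j] - xk[j]) ** 2 for j in features)
            if d < best_dist:
                best_dist, predicted = d, y[k]
        correct += predicted == y[i]
    return correct / len(X)


def distributed_select(X, y, n_parts, seed=0):
    """Partition features, select locally in each partition (each call could run on a
    separate machine), then merge a partition's picks only if accuracy strictly improves."""
    final, best_acc = [], 0.0
    for part in partition_features(len(X[0]), n_parts, seed):
        local = select_in_partition(X, y, part)
        candidate = final + [j for j in local if j not in final]
        acc = loo_1nn_accuracy(X, y, candidate)
        if acc > best_acc:
            final, best_acc = candidate, acc
    return sorted(final), best_acc
```

For example, on a toy dataset where feature 0 separates the two classes and the other features are constant, `distributed_select(X, y, n_parts=2)` keeps only feature 0 with an accuracy of 1.0. Requiring a strict accuracy gain before merging is what keeps redundant features from different partitions out of the final subset.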