Distributed feature selection: An application to microarray data classification

Article ID	Journal	Published Year	Pages	File Type
495053	Applied Soft Computing	2015	15 Pages	PDF

Abstract

• Feature selection is indispensable when dealing with microarray data.
• A new method for distributing the filtering process is proposed.
• The data is distributed by features and then merged in a final subset.
• The method is tested on 8 microarray datasets.
• The classification accuracy is maintained and the time considerably shortened.

Feature selection is often required as a preliminary step in many pattern recognition problems. However, most existing algorithms work only in a centralized fashion, i.e. using the whole dataset at once. In this research, a new method for distributing the feature selection process is proposed. It distributes the data by features, i.e. according to a vertical distribution, and then performs a merging procedure which updates the feature subset according to improvements in the classification accuracy. The effectiveness of our proposal is tested on microarray data, which poses a difficult challenge for researchers due to the large number of gene expression values involved and the small sample size. The results on eight microarray datasets show that the execution time is considerably shortened, whereas the performance is maintained or even improved compared to the standard algorithms applied to the non-partitioned datasets.
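
The vertical distribution and accuracy-driven merge described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the filter (gap between class means), the classifier (nearest centroid with leave-one-out evaluation), the binary 0/1 labels, the synthetic data, and all function names and parameters (`partition_features`, `select_in_chunk`, `n_parts`, `k`, ...) are hypothetical stand-ins chosen to keep the example dependency-free.

```python
import random
from statistics import mean

def partition_features(n_features, n_parts, seed=0):
    """Vertical partition: split feature indices into disjoint random chunks."""
    idx = list(range(n_features))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_parts] for i in range(n_parts)]

def filter_score(X, y, j):
    """Stand-in univariate filter (assumes labels 0/1): gap between class means."""
    a = [row[j] for row, label in zip(X, y) if label == 0]
    b = [row[j] for row, label in zip(X, y) if label == 1]
    return abs(mean(a) - mean(b))

def select_in_chunk(X, y, chunk, k):
    """Keep the k best-scoring features of a single chunk."""
    return sorted(chunk, key=lambda j: filter_score(X, y, j), reverse=True)[:k]

def loo_accuracy(X, y, feats):
    """Leave-one-out accuracy of a nearest-centroid classifier on feats."""
    if not feats:
        return 0.0
    hits = 0
    for i in range(len(X)):
        dist = {}
        for label in set(y):
            rows = [X[r] for r in range(len(X)) if r != i and y[r] == label]
            if rows:
                centroid = [mean(row[j] for row in rows) for j in feats]
                dist[label] = sum((X[i][j] - c) ** 2
                                  for j, c in zip(feats, centroid))
        hits += min(dist, key=dist.get) == y[i]
    return hits / len(X)

def distributed_selection(X, y, n_parts=4, k=2, seed=0):
    """Filter each chunk, then merge a chunk's picks only if accuracy improves."""
    selected, best = [], 0.0
    # Chunks are processed sequentially here; in a distributed setting the
    # per-chunk filtering step could run on separate workers.
    for chunk in partition_features(len(X[0]), n_parts, seed):
        candidate = selected + select_in_chunk(X, y, chunk, k)
        acc = loo_accuracy(X, y, candidate)
        if acc > best:
            selected, best = candidate, acc
    return sorted(selected), best

# Synthetic "many features, few samples" data: 20 samples, 40 features,
# of which only features 0 and 1 carry class information.
rng = random.Random(1)
y = [0] * 10 + [1] * 10
X = [[rng.gauss(0, 1) + (5 if j < 2 and label == 1 else 0) for j in range(40)]
     for label in y]

selected, best = distributed_selection(X, y)
print(selected, best)
```

Each chunk is filtered independently, so the cost of the filter scales with the chunk size rather than with the full feature set. The merge step is the only part that needs the classifier, and it evaluates one candidate subset per chunk.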

Keywords

Distributed learning; Feature selection; Microarray data