Article code | Journal code | Publication year | English article | Full text |
---|---|---|---|---|
391921 | 664567 | 2016 | 17-page PDF | Free download |
• We introduce the concept of a predominant group, based on the idea of the Markov blanket, to identify groups of correlated features.
• We propose a greedy strategy (GreedyPGG) that groups features based on the concept of predominant groups.
• We propose a VNS metaheuristic that uses the GreedyPGG strategy to reduce the dimensionality in high-dimensional data.
• Results show that VNS finds smaller subsets of features without degrading the predictive performance of the model.
In recent years, advances in technology have led to increasingly high-dimensional datasets. This increase in dimensionality, along with the presence of irrelevant and redundant features, makes the feature selection process challenging with respect to both efficiency and effectiveness. In this context, approximate algorithms are typically applied, since they provide good solutions in a reasonable time. Feature grouping, in turn, has arisen as a powerful approach to reducing dimensionality in high-dimensional data. Recently, some authors have focused on developing methods that combine feature grouping and feature selection to improve the model. In this paper, we propose a feature selection strategy that uses feature grouping to increase the effectiveness of the search. As the feature selection strategy, we propose a Variable Neighborhood Search (VNS) metaheuristic. We then propose to group the input space into subsets of features by using the concept of Markov blankets. To the best of our knowledge, this is the first time the Markov blanket has been used for grouping features. We test the performance of VNS by conducting experiments on several high-dimensional datasets from two different domains: microarray and text mining. We compare VNS with popular and competitive techniques. Results show that VNS is a competitive strategy, capable of finding a small subset of features with predictive power similar to that obtained with the other algorithms used in this study.
Journal: Information Sciences - Volume 326, 1 January 2016, Pages 102–118
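
For readers unfamiliar with Variable Neighborhood Search, the following is a minimal, generic sketch of how a VNS loop can drive feature subset selection. It is not the authors' method: the abstract does not specify the neighborhood structures or how GreedyPGG is used inside the search, so the names `shake`, `local_search`, `vns`, `toy_score`, and `RELEVANT` are all hypothetical, and the toy scoring function stands in for what would normally be a classifier-based evaluation of a feature subset.

```python
import random
from typing import Callable, FrozenSet

Subset = FrozenSet[int]

# Hypothetical toy problem: features 0, 3 and 7 are "relevant".
RELEVANT = frozenset({0, 3, 7})


def toy_score(subset: Subset) -> int:
    """Assumed stand-in for model evaluation: reward relevant
    features, penalise subset size. Higher is better."""
    return 10 * len(subset & RELEVANT) - len(subset)


def shake(solution: Subset, n_features: int, k: int, rng: random.Random) -> Subset:
    """k-th neighborhood: flip the membership of k random features."""
    flipped = frozenset(rng.sample(range(n_features), k))
    return solution ^ flipped


def local_search(solution: Subset, n_features: int,
                 score: Callable[[Subset], int]) -> Subset:
    """Hill climbing over single-feature flips until no flip improves."""
    best, best_score = solution, score(solution)
    improved = True
    while improved:
        improved = False
        for f in range(n_features):
            candidate = best ^ frozenset({f})
            candidate_score = score(candidate)
            if candidate_score > best_score:
                best, best_score = candidate, candidate_score
                improved = True
    return best


def vns(n_features: int, score: Callable[[Subset], int],
        k_max: int = 3, max_iters: int = 50, seed: int = 0) -> Subset:
    """Basic VNS: shake in neighborhood k, run local search, then either
    accept and restart at k = 1 or move on to a larger neighborhood."""
    rng = random.Random(seed)
    current = local_search(frozenset(), n_features, score)
    current_score = score(current)
    for _ in range(max_iters):
        k = 1
        while k <= k_max:
            candidate = local_search(
                shake(current, n_features, k, rng), n_features, score)
            candidate_score = score(candidate)
            if candidate_score > current_score:
                current, current_score = candidate, candidate_score
                k = 1
            else:
                k += 1
    return current


selected = vns(n_features=10, score=toy_score)
```

In a realistic setting, `score` would wrap a cross-validated classifier, and the neighborhoods could operate on groups of correlated features rather than on individual features, which is roughly the role the abstract attributes to feature grouping.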