Article code | Journal code | Publication year | English article | Full-text version |
---|---|---|---|---|
383337 | 660816 | 2013 | 12-page PDF | Free download |

• Low quality data (LQD) appear in real problems. We incorporate the processing of LQD into feature selection.
• We propose a feature selection method that works within the framework of the fuzzy logic theory.
• Our approach is a hybrid method that combines filter and wrapper methods.
• The experiments evaluate the performance of the approach on low quality and microarray datasets.
• The results indicate that the approach achieves good classification performance with or without LQD.
Feature selection is an active research area in machine learning. Its main idea is to choose a subset of the available features by eliminating those with little or no predictive information, as well as redundant features that are strongly correlated with others. Many feature selection approaches exist, but most of them work only with crisp data; few can work directly with both crisp and low quality (imprecise and uncertain) data. We therefore propose a new feature selection method that can handle both. The proposed approach is based on a Fuzzy Random Forest and integrates filter and wrapper methods into a sequential search procedure that improves the classification accuracy obtained with the selected features. It consists of the following main steps: (1) scaling and discretization of the feature set, with feature pre-selection based on the discretization process (filter); (2) ranking of the pre-selected features using the Fuzzy Decision Trees of a Fuzzy Random Forest ensemble; and (3) wrapper feature selection using a Fuzzy Random Forest ensemble based on cross-validation. The efficiency and effectiveness of the approach are demonstrated through several experiments on both high dimensional and low quality datasets. The approach performs well, in terms of both classification accuracy and the number of features selected, and behaves well with both high dimensional (microarray) datasets and low quality datasets.
Journal: Expert Systems with Applications - Volume 40, Issue 16, 15 November 2013, Pages 6241–6252
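
The three-step structure described in the abstract (filter, ranking, wrapper) can be illustrated with a small sketch. This is not the authors' implementation: it handles crisp data only, and it swaps the Fuzzy Random Forest for much simpler stand-ins that are assumptions made purely for illustration. A single best cut point plays the role of the discretization step, the accuracy of that cut serves as the ranking score, and a nearest-centroid classifier replaces the ensemble in the cross-validated wrapper. All function names are hypothetical.

```python
from statistics import mean

def min_max_scale(X):
    """Scale every column to [0, 1]; constant columns become 0.0."""
    cols = list(zip(*X))
    lo, hi = [min(c) for c in cols], [max(c) for c in cols]
    return [[(v - a) / (b - a) if b > a else 0.0
             for v, a, b in zip(row, lo, hi)] for row in X]

def best_split_accuracy(col, y):
    """Accuracy of the best single cut point on one feature.

    Stand-in (assumption) for the paper's discretization step.
    """
    best = 0.0
    for cut in sorted(set(col)):
        left = [lab for v, lab in zip(col, y) if v <= cut]
        right = [lab for v, lab in zip(col, y) if v > cut]
        # Each non-empty side predicts its majority class.
        hits = sum(max(side.count(c) for c in set(side))
                   for side in (left, right) if side)
        best = max(best, hits / len(y))
    return best

def fit_centroids(X, y):
    """Per-class mean vector (stand-in for the Fuzzy Random Forest)."""
    return {lab: [mean(c) for c in zip(*[r for r, l in zip(X, y) if l == lab])]
            for lab in set(y)}

def predict(centroids, x):
    """Label of the closest centroid by squared Euclidean distance."""
    return min(centroids,
               key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])))

def cv_accuracy(X, y, feats, k=5):
    """k-fold cross-validated accuracy using only the columns in feats."""
    sub = [[row[j] for j in feats] for row in X]
    hits = 0
    for fold in range(k):
        train = [i for i in range(len(y)) if i % k != fold]
        test = [i for i in range(len(y)) if i % k == fold]
        model = fit_centroids([sub[i] for i in train], [y[i] for i in train])
        hits += sum(predict(model, sub[i]) == y[i] for i in test)
    return hits / len(y)

def select_features(X, y, k=5):
    """Hybrid filter + ranking + wrapper selection on crisp data."""
    X = min_max_scale(X)
    baseline = max(y.count(c) for c in set(y)) / len(y)
    scores = {j: best_split_accuracy(col, y) for j, col in enumerate(zip(*X))}
    # Step 1 (filter): drop features whose best cut beats nothing.
    kept = [j for j, s in scores.items() if s > baseline]
    # Step 2 (ranking): order the pre-selected features by score.
    ranking = sorted(kept, key=lambda j: scores[j], reverse=True)
    # Step 3 (wrapper): add a feature only if CV accuracy improves.
    selected, best = [], 0.0
    for j in ranking:
        acc = cv_accuracy(X, y, selected + [j], k)
        if acc > best:
            selected, best = selected + [j], acc
    return selected, best
```

Calling `select_features(X, y)` with `X` as a list of numeric rows and `y` as a list of class labels returns the indices of the selected columns and their cross-validated accuracy. Handling low quality data, which is the point of the paper, would require replacing the crisp cut points and the classifier with their fuzzy counterparts.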