Article ID: 4948139
Journal: Neurocomputing
Published Year: 2016
Pages: 12 Pages
File Type: PDF
Abstract
Variable selection plays a key role in explanatory modeling, and its aim is to identify the variables that are truly important to the outcome. Recently, ensemble learning techniques have shown great potential for improving the performance of some traditional methods such as the lasso, genetic algorithms, and stepwise search. Following the main principle of building a variable selection ensemble, we propose in this paper a novel approach that randomizes outputs (i.e., adds some random noise to the response) in order to maximize variable selection accuracy. To generate multiple but slightly different importance measures for each variable, Gaussian noise is artificially added to the response. The new training set (i.e., the original design matrix together with the new response vector) is then fed into a genetic algorithm to perform variable selection. By repeating this process for a number of trials and fusing the results by simple averaging, a more reliable importance measure is obtained for each candidate variable. The variables are then ranked, and each is determined to be important or not by a thresholding rule. The performance of the proposed method is studied on simulated and real-world data in the framework of linear and logistic regression models. The results demonstrate that it compares favorably with several existing methods.
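The following is a minimal sketch of the "randomized outputs" ensemble idea described in the abstract. The paper uses a genetic algorithm as the base selector; here a lasso fit (scikit-learn) stands in as a hypothetical substitute, and the noise level sigma, the number of trials, and the threshold are illustrative choices rather than the paper's settings.

```python
# Sketch of a randomized-outputs variable selection ensemble.
# Assumptions: lasso replaces the paper's genetic algorithm as the base
# selector; sigma, n_trials, alpha and threshold are illustrative values.
import numpy as np
from sklearn.linear_model import Lasso

def randomized_output_ensemble(X, y, n_trials=50, sigma=0.5,
                               alpha=0.1, threshold=0.5):
    rng = np.random.default_rng(0)
    n, p = X.shape
    importance = np.zeros(p)
    for _ in range(n_trials):
        # Perturb the response with Gaussian noise so each trial differs slightly.
        y_noisy = y + rng.normal(scale=sigma * y.std(), size=n)
        # Base selector on (X, y_noisy); the paper feeds this set to a GA instead.
        coef = Lasso(alpha=alpha).fit(X, y_noisy).coef_
        # Record a 0/1 selection indicator as this trial's importance measure.
        importance += (np.abs(coef) > 1e-8).astype(float)
    importance /= n_trials                  # fuse the trials by simple averaging
    selected = np.where(importance > threshold)[0]  # thresholding rule
    return importance, selected

# Toy example: 5 relevant variables out of 30.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 30))
y = X[:, :5] @ np.array([3.0, -2.0, 1.5, 2.5, -1.0]) + rng.normal(size=200)
imp, sel = randomized_output_ensemble(X, y)
print("selected variables:", sel)
```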
Related Topics
Physical Sciences and Engineering › Computer Science › Artificial Intelligence