Article ID: 416606
Journal: Computational Statistics & Data Analysis
Published Year: 2007
Pages: 9
File Type: PDF
Abstract

The problem of selecting variables or features in a regression model in the presence of both additive (vertical) and leverage outliers is addressed. Since variable selection and the detection of anomalous data are not separable problems, the focus is on methods that select variables and outliers simultaneously. For selection, the fast forward selection algorithm, least angle regression (LARS), is used, but it is not robust. To achieve robustness to additive outliers, a dummy variable identity matrix is appended to the design matrix allowing both real variables and additive outliers to be in the selection set. For leverage outliers, these selection methods are used on samples of elemental sets in a manner similar to that used in high breakdown robust estimation. These results are compared to several other selection methods of varying computational complexity and robustness. The extension of these methods to situations where the number of variables exceeds the number of observations is discussed.
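
The dummy-variable idea for additive outliers can be illustrated with a minimal sketch. It is not the authors' implementation: a plain greedy forward selection (refitting least squares at each step) stands in for LARS, the real columns are scaled to unit norm so they are comparable to the dummy columns (an assumption, not something the abstract specifies), and the data, the function name `forward_select`, and the step count are all hypothetical. An n x n identity matrix is appended to the design matrix so that each observation has its own dummy column, and a selected dummy column flags that observation as an additive outlier.

```python
import numpy as np


def forward_select(Z, y, n_steps):
    """Greedy forward selection (a simple stand-in for LARS, not LARS itself).

    At each step, add the column most correlated with the current residual,
    then refit least squares on all selected columns.
    """
    active = []
    residual = y.copy()
    for _ in range(n_steps):
        corr = np.abs(Z.T @ residual)
        if active:
            corr[active] = -np.inf  # never pick the same column twice
        active.append(int(np.argmax(corr)))
        coef, *_ = np.linalg.lstsq(Z[:, active], y, rcond=None)
        residual = y - Z[:, active] @ coef
    return active


# Synthetic data (hypothetical): only variables 0 and 1 matter.
rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.1 * rng.normal(size=n)

# Two additive (vertical) outliers: shift the response, leave X untouched.
y[3] += 25.0
y[17] -= 25.0

# Assumption: scale real columns to unit norm to match the dummy columns.
X_scaled = X / np.linalg.norm(X, axis=0)

# Augmented design matrix [X | I]: one dummy column per observation.
Z = np.hstack([X_scaled, np.eye(n)])

selected = forward_select(Z, y, n_steps=4)
variables = sorted(j for j in selected if j < p)       # real variables chosen
flagged = sorted(j - p for j in selected if j >= p)    # observations flagged
print("variables:", variables)
print("flagged observations:", flagged)
```

Once a dummy column enters the active set, the refit gives that observation a zero residual, so it no longer influences the coefficients of the real variables. This sketch does not address leverage outliers, which the abstract handles separately through sampling of elemental sets.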

Related Topics
Physical Sciences and Engineering > Computer Science > Computational Theory and Mathematics