Article ID: 561059
Journal: Signal Processing
Published Year: 2017
Pages: 7
File Type: PDF
Abstract

• A fast method for high-dimensional regression and classification with missing data is proposed.
• The proposed method combines matrix completion and adaptive lasso.
• It provides promising empirical results.

Variable selection for high-dimensional data problems, including both regression and classification, has been a subject of intense research activity in recent years, and many promising solutions have been proposed. However, less attention has been given to the case where some of the data are missing. This paper proposes a general approach to high-dimensional variable selection in the presence of missing data, including cases where the missing fraction is relatively large (e.g., 50%). Both regression and classification are considered. The proposed approach iterates between two major steps: the first step uses matrix completion to impute the missing data, while the second step applies adaptive lasso to the imputed data to select the significant variables. Methods are provided for choosing all the tuning parameters involved. Because fast algorithms and software are widely available for both matrix completion and adaptive lasso, the proposed approach is fast and straightforward to implement. Results from numerical experiments and applications to two real data sets are presented to demonstrate the efficiency and effectiveness of the approach.
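The two-step iteration described in the abstract can be illustrated with a minimal NumPy sketch for the regression case. This is not the authors' implementation. The abstract does not specify the matrix completion algorithm, the adaptive lasso solver, the stopping rule, or how the tuning parameters are chosen, so everything below is an assumption: the names `soft_impute`, `adaptive_lasso`, and `select_variables` are hypothetical, matrix completion is done by iterative soft-thresholded SVD, the adaptive weights come from a ridge fit, the loop re-imputes only the currently selected columns, and `lam_mc` and `lam_al` are fixed by hand rather than chosen by the paper's methods. Classification is not covered.

```python
import numpy as np

def soft_impute(X, mask, lam, n_iter=100):
    """Fill entries where mask is False using iterative soft-thresholded SVD."""
    Z = np.where(mask, X, 0.0)  # start with zeros in the missing positions
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        low_rank = (U * np.maximum(s - lam, 0.0)) @ Vt  # shrink singular values
        Z = np.where(mask, X, low_rank)  # keep observed entries, update the rest
    return Z

def adaptive_lasso(X, y, lam, n_iter=200, eps=1e-6):
    """Weighted lasso by coordinate descent; weights come from a ridge fit."""
    n, p = X.shape
    # Assumption: a ridge estimate serves as the initial estimator for the weights.
    beta_init = np.linalg.solve(X.T @ X + 1e-2 * n * np.eye(p), X.T @ y)
    w = 1.0 / (np.abs(beta_init) + eps)
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = np.asarray(y, dtype=float).copy()  # residual for beta = 0
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            r += X[:, j] * beta[j]  # remove coordinate j from the fit
            rho = X[:, j] @ r / n
            beta[j] = np.sign(rho) * max(abs(rho) - lam * w[j], 0.0) / col_sq[j]
            r -= X[:, j] * beta[j]  # add the updated coordinate back
    return beta

def select_variables(X, y, mask, lam_mc=1.0, lam_al=0.1, max_rounds=5):
    """Alternate imputation and selection until the selected set stops changing."""
    p = X.shape[1]
    active = np.arange(p)
    X_imp = soft_impute(X, mask, lam_mc)
    beta = np.zeros(p)
    for _ in range(max_rounds):
        beta = np.zeros(p)
        beta[active] = adaptive_lasso(X_imp[:, active], y, lam_al)
        new_active = np.flatnonzero(beta)
        if new_active.size == 0 or np.array_equal(new_active, active):
            break
        active = new_active
        # Assumption: re-impute using only the currently selected columns.
        X_imp[:, active] = soft_impute(X[:, active], mask[:, active], lam_mc)
    return beta, X_imp
```

Here `mask` is a boolean array that is `True` where an entry of `X` is observed, and the nonzero entries of the returned `beta` mark the selected variables. In practice both penalties would need to be tuned, for example by cross-validation or an information criterion.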

Related Topics
Physical Sciences and Engineering > Computer Science > Signal Processing