Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
6868650 | Computational Statistics & Data Analysis | 2018 | 14 Pages |
Abstract
Noisy and missing data are often encountered in real applications, so that the observed covariates contain measurement errors. Despite the rapid progress of model selection with contaminated covariates in high dimensions, methodology that enjoys virtues in all aspects of prediction, variable selection, and computation remains largely unexplored. In this paper, we propose a new method called balanced estimation for high-dimensional error-in-variables regression to achieve an ideal balance between prediction and variable selection under both additive and multiplicative measurement errors. It combines the strengths of the nearest positive semi-definite projection and the combined L1 and concave regularization, and thus can be efficiently solved through the coordinate optimization algorithm. We also provide theoretical guarantees for the proposed methodology by establishing oracle prediction and estimation error bounds equivalent to those for the Lasso with the clean data set, as well as an explicit and asymptotically vanishing bound on the false sign rate that controls overfitting, a serious problem under measurement errors. Our numerical studies show that improving variable selection in turn improves prediction and estimation performance under measurement errors.
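The two ingredients named in the abstract can be illustrated with a minimal sketch under assumptions that are not taken from the paper. It assumes additive measurement error with a known variance `tau2`, replaces the paper's nearest positive semi-definite projection with a simple eigenvalue-clipping (Frobenius-norm) projection, and uses only an L1 penalty rather than the combined L1 and concave penalty. All function and parameter names are hypothetical.

```python
import numpy as np


def corrected_moments(W, y, tau2):
    """Bias-corrected moments for additive error W = X + A, Var(A_ij) = tau2.

    Assumption: tau2 is known. The corrected Gram matrix may be indefinite.
    """
    n, p = W.shape
    sigma_hat = W.T @ W / n - tau2 * np.eye(p)
    rho_hat = W.T @ y / n
    return sigma_hat, rho_hat


def project_psd(S, eps=1e-8):
    """Project a symmetric matrix onto the PSD cone by clipping eigenvalues.

    This is the Frobenius-norm projection, used here as a stand-in for the
    projection described in the abstract.
    """
    S = (S + S.T) / 2.0
    vals, vecs = np.linalg.eigh(S)
    return (vecs * np.maximum(vals, eps)) @ vecs.T


def soft_threshold(z, t):
    """Scalar soft-thresholding operator."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def coordinate_descent_l1(Sigma, rho, lam, n_iter=200):
    """Minimise 0.5 * b' Sigma b - rho' b + lam * ||b||_1 coordinate-wise.

    Requires Sigma to be PSD with a positive diagonal, which project_psd
    guarantees when eps > 0.
    """
    p = len(rho)
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            # Gradient information for coordinate j with beta[j] held out.
            r_j = rho[j] - Sigma[j] @ beta + Sigma[j, j] * beta[j]
            beta[j] = soft_threshold(r_j, lam) / Sigma[j, j]
    return beta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p, tau2 = 100, 20, 0.25
    X = rng.standard_normal((n, p))
    beta_true = np.zeros(p)
    beta_true[:3] = [2.0, -1.5, 1.0]
    y = X @ beta_true + 0.1 * rng.standard_normal(n)
    W = X + np.sqrt(tau2) * rng.standard_normal((n, p))  # noisy covariates

    sigma_hat, rho_hat = corrected_moments(W, y, tau2)
    beta_hat = coordinate_descent_l1(project_psd(sigma_hat), rho_hat, lam=0.2)
    print(np.nonzero(beta_hat)[0])
```

The projection step matters because the corrected Gram matrix can have negative eigenvalues, which would make the penalized objective unbounded below; once it is positive semi-definite, each coordinate update has a closed form.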
Related Topics
Physical Sciences and Engineering
Computer Science
Computational Theory and Mathematics
Authors
Zemin Zheng, Yang Li, Chongxiu Yu, Gaorong Li