Article ID: 6869908
Journal: Computational Statistics & Data Analysis
Published Year: 2014
Pages: 16
File Type: PDF
Abstract
The important problems of variable selection and estimation in nonparametric additive regression models for high-dimensional data are addressed. Several methods that use spline basis functions and group penalties have been proposed to model nonlinear relationships when the number of covariates exceeds the number of observations. Nonlinear monotone effects on the response play a central role in many situations, particularly in medicine and biology. The monotone splines lasso (MS-lasso) is constructed to select variables and estimate effects using monotone splines (I-splines). Each additive component in the model is represented by its I-spline basis function expansion, so component selection becomes the selection of groups of coefficients in that expansion. A recent procedure, called the cooperative lasso, is used to select sign-coherent groups, i.e., groups whose coefficients are either all non-negative or all non-positive. This leads to the selection of important covariates that have a nonlinear, monotonically increasing or decreasing effect on the response. An adaptive version of the MS-lasso considerably reduces both the bias and the number of false positive selections. The MS-lasso and the adaptive MS-lasso are compared by simulation with other existing methods for variable selection in high dimensions, and the methods are applied to two relevant genomic data sets. Results indicate that the (adaptive) MS-lasso has excellent properties compared to the other methods in terms of both estimation and selection, and can be recommended for high-dimensional monotone regression.
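To illustrate why a sign-coherent penalty favours monotone components, the following is a minimal sketch, not the authors' implementation. It assumes the cooperative lasso penalty takes the form usually given in the cooperative lasso literature: for each group, the Euclidean norm of the positive part of the coefficients plus the Euclidean norm of the negative part. The abstract does not state this formula. The function names `coop_penalty` and `group_lasso_penalty`, the parameter `lam`, and the toy coefficients are hypothetical and chosen only for illustration.

```python
import math

def coop_penalty(beta, groups, lam=1.0):
    """Assumed cooperative-lasso-style penalty.

    For each group of indices, add the Euclidean norm of the positive
    parts and the Euclidean norm of the negative parts of the coefficients.
    """
    total = 0.0
    for idx in groups:
        pos = math.sqrt(sum(max(beta[j], 0.0) ** 2 for j in idx))
        neg = math.sqrt(sum(max(-beta[j], 0.0) ** 2 for j in idx))
        total += pos + neg
    return lam * total

def group_lasso_penalty(beta, groups, lam=1.0):
    """Ordinary group lasso penalty (sum of group Euclidean norms), for comparison."""
    return lam * sum(math.sqrt(sum(beta[j] ** 2 for j in idx)) for idx in groups)

# One group of two basis coefficients (hypothetical values).
groups = [[0, 1]]
coherent = [3.0, 4.0]    # both non-negative, consistent with a monotone increasing fit
mixed = [3.0, -4.0]      # mixed signs, not sign-coherent

print(coop_penalty(coherent, groups), group_lasso_penalty(coherent, groups))
print(coop_penalty(mixed, groups), group_lasso_penalty(mixed, groups))
```

Under this assumed form, the two penalties agree on the sign-coherent group, while the mixed-sign group is charged more by the cooperative penalty than by the group lasso. Because I-spline basis functions are non-decreasing, a group of same-signed coefficients corresponds to a monotone component, which is the link the abstract draws between sign coherence and monotone effects.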