Prior influence in linear regression when the number of covariates increases to infinity

Article ID	Journal	Published Year	Pages	File Type
1154882	Statistics & Probability Letters	2012	8 Pages	PDF

Abstract

It is becoming more typical in regression problems today to have the situation where “p>n”, that is, where the number of covariates is greater than the number of observations. Approaches to this problem include such strategies as model selection and dimension reduction, and, of course, a Bayesian approach. However, the discrepancy between p and n can be so large, especially in genomic data, that examining the limiting case where p→∞ can be a relevant calculation. Here we look at the effect of a prior distribution on the coefficients, and in particular characterize the conditions under which, as p→∞, the prior does not overwhelm the data. Specifically, we find that the prior variance on the growing number of covariates must approach zero at rate 1/p; otherwise the prior will overwhelm the data and the posterior distribution of the regression coefficient will equal the prior distribution.
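
The 1/p rate can be illustrated numerically. The abstract does not state the model, so the following is only a minimal sketch under assumed conditions: a single coefficient of interest beta1 with prior N(0, 1), a block of p additional covariates whose coefficients have independent N(0, tau2) priors, Gaussian noise with known variance sigma2, and simulated standard normal covariates. The function name posterior_variance_beta1 and all parameter names are hypothetical. In this conjugate setup the posterior variance of beta1 does not depend on the response, so none is simulated.

```python
import numpy as np


def posterior_variance_beta1(n, p, tau2, sigma2=1.0, prior_var=1.0, seed=0):
    """Posterior variance of beta1 in the assumed model
    y = x1 * beta1 + X2 @ beta2 + noise, with beta1 ~ N(0, prior_var),
    beta2 ~ N(0, tau2 * I_p), and noise ~ N(0, sigma2 * I_n)."""
    rng = np.random.default_rng(seed)
    x1 = rng.standard_normal(n)        # covariate of interest
    X2 = rng.standard_normal((n, p))   # growing block of covariates
    # Integrating out beta2 leaves an effective noise covariance
    # sigma2 * I + tau2 * X2 X2', whose size grows like p * tau2.
    V = sigma2 * np.eye(n) + tau2 * (X2 @ X2.T)
    # Posterior precision = prior precision + x1' V^{-1} x1.
    data_precision = x1 @ np.linalg.solve(V, x1)
    return 1.0 / (1.0 / prior_var + data_precision)


n = 20
for p in (10, 100, 1000, 10000):
    fixed = posterior_variance_beta1(n, p, tau2=1.0)       # variance held fixed
    shrinking = posterior_variance_beta1(n, p, tau2=1.0 / p)  # variance ~ 1/p
    print(f"p={p:6d}  fixed tau2: {fixed:.4f}  tau2=1/p: {shrinking:.4f}")
```

Under these assumptions, the fixed-variance column should drift toward the prior variance of 1 as p grows, meaning the data contribute almost nothing, while the 1/p column should stay well below 1.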

Keywords

Model selection; Bayes; Regression; Linear models