Article ID | Journal ID | Publication year | English article |
---|---|---|---|
416195 | 681296 | 2007 | 13-page PDF |
We propose a Bayesian model for clustered outliers in multiple regression. In the literature, outliers are frequently modeled as coming from a subgroup in which the error variance is much larger than in the rest of the data. By contrast, we show that when the outliers form a cluster, it can be more informative to model them as coming from a subgroup in which different regression coefficients hold. We model the clustering phenomenon explicitly by assuming that the probability of an observation being an outlier is a function of the explanatory variables. Fitting proceeds via the Gibbs sampler, with Metropolis–Hastings steps used to draw variates from the less standard conditional distributions. Initialization uses a least median of squares (LMS) fit, and in some ways the method can be viewed as a Bayesian counterpart of the many algorithms that use this fit as the starting point for a more efficient estimator. The method performs well on a variety of test data sets. We illustrate its use on a data set of sailboat prices, where it yields information on the identity of the outliers, on their location and spread, and on the regression coefficients within the minority subgroup.
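The fitting strategy described above (LMS start, Gibbs sampling, Metropolis–Hastings for the awkward conditional) can be illustrated with a minimal Metropolis-within-Gibbs sketch. It is not the paper's implementation and rests on simplifying assumptions not taken from the source: a single explanatory variable; a two-group normal mixture regression in which group 1 (the outlier subgroup) has its own intercept, slope and variance; a logistic model for P(outlier | x); flat priors on the regression coefficients; a 1/σ² prior on each variance; and an assumed N(0, 10²) prior on the logistic coefficients. The LMS fit is approximated by searching lines through random pairs of points, and the 2.5 × robust-scale cutoff for the initial labels is likewise an assumption. All function names (`lms_line`, `draw_group`, `gibbs`) are hypothetical.

```python
import math
import random

def log1pexp(t):
    # Numerically stable log(1 + e^t).
    return t + math.log1p(math.exp(-t)) if t > 0 else math.log1p(math.exp(t))

def sigmoid(t):
    # Numerically stable logistic function 1 / (1 + e^-t).
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def norm_logpdf(r, s):
    # Log density of a N(0, s^2) error evaluated at residual r.
    return -math.log(s) - 0.5 * math.log(2.0 * math.pi) - 0.5 * (r / s) ** 2

def lms_line(x, y, rng, n_pairs=500):
    """Approximate LMS fit of y = a + b*x: try lines through random pairs of
    points and keep the one with the smallest median squared residual."""
    n, best = len(x), None
    for _ in range(n_pairs):
        i, j = rng.sample(range(n), 2)
        if x[i] == x[j]:
            continue
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        med = sorted((yk - a - b * xk) ** 2 for xk, yk in zip(x, y))[n // 2]
        if best is None or med < best[0]:
            best = (med, a, b)
    return best[1], best[2], 1.4826 * math.sqrt(best[0])  # a, b, robust scale

def draw_group(x, y, idx, state, rng):
    """One Gibbs update of (a, b, sigma^2) for the points in idx, assuming a flat
    prior on (a, b) and a 1/sigma^2 prior on sigma^2."""
    m = len(idx)
    sx = sum(x[i] for i in idx)
    sxx = sum(x[i] ** 2 for i in idx)
    det = m * sxx - sx * sx
    if m < 3 or det <= 1e-12:
        return state  # too few distinct points: keep the previous draw
    sy = sum(y[i] for i in idx)
    sxy = sum(x[i] * y[i] for i in idx)
    a_hat = (sxx * sy - sx * sxy) / det
    b_hat = (m * sxy - sx * sy) / det
    s2 = state[2]
    # (a, b) ~ N(OLS estimate, s2 * (X'X)^-1), drawn via a 2x2 Cholesky factor.
    l11 = math.sqrt(s2 * sxx / det)
    l21 = (-s2 * sx / det) / l11
    l22 = math.sqrt(max(s2 * m / det - l21 ** 2, 0.0))
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    a, b = a_hat + l11 * z1, b_hat + l21 * z1 + l22 * z2
    # sigma^2 ~ Inverse-Gamma(m/2, SSR/2) given the new (a, b).
    ssr = sum((y[i] - a - b * x[i]) ** 2 for i in idx)
    return a, b, ssr / (2.0 * rng.gammavariate(m / 2.0, 1.0))

def gibbs(x, y, iters=1000, burn=200, seed=0, step=0.5):
    rng = random.Random(seed)
    n = len(x)
    a0, b0, s0 = lms_line(x, y, rng)
    # Initial labels: 1 marks a point far from the LMS line (assumed 2.5 cutoff).
    z = [1 if abs(y[i] - a0 - b0 * x[i]) > 2.5 * s0 else 0 for i in range(n)]
    grp0 = (a0, b0, s0 ** 2)          # majority regression (a, b, sigma^2)
    grp1 = (a0, b0, (3.0 * s0) ** 2)  # outlier-subgroup regression, arbitrary start
    gam = (-2.0, 0.0)                 # logit P(outlier | x) = gam[0] + gam[1] * x
    prob = [0.0] * n
    def log_post_gam(g):
        # Bernoulli-logit log-likelihood of the labels z + assumed N(0, 10^2) prior.
        lp = -(g[0] ** 2 + g[1] ** 2) / 200.0
        for xi, zi in zip(x, z):
            eta = g[0] + g[1] * xi
            lp += zi * eta - log1pexp(eta)
        return lp
    for it in range(iters):
        grp0 = draw_group(x, y, [i for i in range(n) if z[i] == 0], grp0, rng)
        grp1 = draw_group(x, y, [i for i in range(n) if z[i] == 1], grp1, rng)
        # Random-walk Metropolis-Hastings step: this full conditional is nonstandard.
        prop = (gam[0] + step * rng.gauss(0.0, 1.0), gam[1] + step * rng.gauss(0.0, 1.0))
        if math.log(1.0 - rng.random()) < log_post_gam(prop) - log_post_gam(gam):
            gam = prop
        # Gibbs update of each label from its two-point full conditional.
        for i in range(n):
            eta = gam[0] + gam[1] * x[i]
            r1 = y[i] - grp1[0] - grp1[1] * x[i]
            r0 = y[i] - grp0[0] - grp0[1] * x[i]
            lw1 = -log1pexp(-eta) + norm_logpdf(r1, math.sqrt(grp1[2]))
            lw0 = -log1pexp(eta) + norm_logpdf(r0, math.sqrt(grp0[2]))
            z[i] = 1 if rng.random() < sigmoid(lw1 - lw0) else 0
            if it >= burn:
                prob[i] += z[i] / (iters - burn)
    return prob, grp0, grp1, gam
```

`prob` holds each observation's estimated posterior probability of belonging to the outlier subgroup. `grp1` is the last draw of that subgroup's (intercept, slope, variance), and `gam` is the last draw of the logistic coefficients linking outlier probability to x. The sketch keeps the conjugate updates as exact Gibbs draws and reserves Metropolis–Hastings for the logistic coefficients, whose full conditional has no standard form. The priors, the proposal scale `step`, and the absence of any label-switching control beyond the LMS-based start are assumptions of this illustration.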
Journal: Computational Statistics & Data Analysis - Volume 51, Issue 8, 1 May 2007, Pages 3955–3967