Article ID: 1150839
Journal: Statistical Methodology
Published Year: 2014
Pages: 12
File Type: PDF
Abstract

With greatly advanced computational resources, the scope of statistical data analysis and modeling has widened to accommodate pressing new arenas of application. In all such data settings, an important and challenging task is the identification of outliers. In particular, an outlier identification procedure must be robust against masking (an outlier goes undetected) and swamping (a nonoutlier is classified as an outlier). Here we provide general foundations and criteria for quantifying the robustness of outlier detection procedures against masking and swamping. This unifies scattered existing results confined to univariate or multivariate data and extends them to a completely general framework allowing any type of data. For any space X of objects and probability model F on X, we consider a real-valued outlyingness function O(x, F) defined over x in X and a sample version O(x, X_n) based on a sample X_n from X. In this setting, and within a coherent framework, we formulate general definitions of masking breakdown point and swamping breakdown point and develop lemmas for evaluating these robustness measures in practical applications. The application of the lemmas is briefly illustrated for univariate scaled deviation outlyingness.
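To make the notion of masking concrete, the following is a minimal sketch of a sample scaled deviation outlyingness, taken here to be O(x, X_n) = |x - m(X_n)| / s(X_n) for a location estimate m and a scale estimate s. The abstract does not specify the estimators, so the two pairings below (mean with standard deviation, median with median absolute deviation), the threshold of 3, and all function names are illustrative assumptions and not the paper's own procedure.

```python
from statistics import mean, median, pstdev


def mad(sample):
    """Median absolute deviation about the median (no consistency factor)."""
    m = median(sample)
    return median([abs(v - m) for v in sample])


def outlyingness(x, sample, location, scale):
    """Scaled deviation outlyingness |x - location| / scale (assumed form)."""
    s = scale(sample)
    if s == 0:
        raise ValueError("scale estimate is zero; outlyingness is undefined")
    return abs(x - location(sample)) / s


def flag_outliers(sample, location, scale, threshold):
    """Return the sample points whose outlyingness exceeds the threshold."""
    return [v for v in sample
            if outlyingness(v, sample, location, scale) > threshold]


# Hypothetical data: seven regular points plus three identical gross outliers.
data = [7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 100.0, 100.0, 100.0]

# Mean/SD: the outliers inflate both estimates, so none is flagged (masking).
flagged_classical = flag_outliers(data, mean, pstdev, threshold=3.0)

# Median/MAD: the estimates stay near the bulk, so all three are flagged.
flagged_robust = flag_outliers(data, median, mad, threshold=3.0)

print(flagged_classical)
print(flagged_robust)
```

In this toy sample the three contaminating points pull the mean and standard deviation far enough that their own outlyingness falls below the threshold, which is the masking effect described above. The median and MAD pairing tolerates this level of contamination and still identifies them.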

Related Topics
Physical Sciences and Engineering > Mathematics > Statistics and Probability