General foundations for studying masking and swamping robustness of outlier identifiers

Article ID	Journal	Published Year	Pages	File Type
1150839	Statistical Methodology	2014	12 Pages	PDF

Abstract

With greatly advanced computational resources, the scope of statistical data analysis and modeling has widened to accommodate pressing new arenas of application. In all such data settings, an important and challenging task is the identification of outliers. In particular, an outlier identification procedure must be robust against the possibilities of masking (an outlier goes undetected) and swamping (a nonoutlier is classified as an outlier). Here we provide general foundations and criteria for quantifying the robustness of outlier detection procedures against masking and swamping. This unifies a scattering of existing results confined to univariate or multivariate data, and extends them to a completely general framework allowing any type of data. For any space X of objects and probability model F on X, we consider a real-valued outlyingness function O(x, F) defined over x in X and a sample version O(x, X_n) based on a sample X_n from X. In this setting, and within a coherent framework, we formulate general definitions of masking breakdown point and swamping breakdown point and develop lemmas for evaluating these robustness measures in practical applications. A brief illustration of how the lemmas are applied is provided for univariate scaled deviation outlyingness.
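To make the notion of masking concrete, the following is a minimal sketch of a sample scaled deviation outlyingness O(x, X_n) = |x - center(X_n)| / spread(X_n) in the univariate case. It is an illustration only, not the article's construction: the choice of median and median absolute deviation (MAD) as the robust center and spread, the mean and standard deviation as the nonrobust alternative, the cutoff of 3, the toy data, and all function and variable names are assumptions introduced here.

```python
import statistics


def scaled_deviation_outlyingness(x, sample, robust=True):
    """Sample scaled deviation outlyingness |x - center| / spread.

    Assumption: robust=True uses median and MAD; robust=False uses
    mean and standard deviation. Both are illustrative choices.
    """
    if robust:
        center = statistics.median(sample)
        spread = statistics.median([abs(v - center) for v in sample])
    else:
        center = statistics.mean(sample)
        spread = statistics.stdev(sample)
    return abs(x - center) / spread


# Hypothetical toy data: ten regular points near 10 plus three planted outliers.
clean = [9.0, 9.5, 10.0, 10.5, 11.0, 9.8, 10.2, 10.1, 9.9, 10.0]
planted = [50.0, 50.0, 50.0]
sample = clean + planted

THRESHOLD = 3.0  # assumed cutoff: flag x when its outlyingness exceeds this

# Outlyingness of a planted outlier under each choice of center and spread.
nonrobust_score = scaled_deviation_outlyingness(50.0, sample, robust=False)
robust_score = scaled_deviation_outlyingness(50.0, sample, robust=True)

# Masking: the planted outliers inflate the mean and standard deviation,
# so the nonrobust score stays below the cutoff and they go undetected.
masked_by_nonrobust = nonrobust_score <= THRESHOLD
flagged_by_robust = robust_score > THRESHOLD

# Swamping would be the opposite error: a regular point exceeding the cutoff.
swamped_by_robust = [
    v for v in clean
    if scaled_deviation_outlyingness(v, sample, robust=True) > THRESHOLD
]

print("masked (mean/SD):", masked_by_nonrobust)
print("flagged (median/MAD):", flagged_by_robust)
print("swamped regular points (median/MAD):", swamped_by_robust)
```

In this toy sample the three planted values are masked under the mean and standard deviation version but flagged under the median and MAD version, and no regular point is swamped. Roughly speaking, a breakdown point measures how much contamination a procedure tolerates before such failures become possible; the article's formal definitions of the masking and swamping breakdown points are not reproduced here.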

Keywords

Outlier detection; Nonparametric