Article ID: 1148288
Journal: Journal of Statistical Planning and Inference
Published Year: 2015
Pages: 13 Pages
File Type: PDF
Abstract

• Advances in nonparametric outlier identification.
• Masking and swamping robustness of scaled deviation outlyingness is determined using quantitative criteria in the form of special masking and swamping breakdown points.
• Masking and swamping robustness of centered rank outlyingness is determined in the same way.
• The findings are applied to compare (median, MAD) versus (trimmed mean, trimmed standard deviation) in scaled deviation outlyingness.
• The findings are applied to explain how the boxplot acquires its strengths and to formulate a variant boxplot that offers a more appealing balance between masking robustness and swamping robustness.

In modern statistical data analysis, a key task is the identification of outliers. For any outlier identification procedure, one needs to know its robustness against masking (an “outlier” goes undetected) and swamping (a “nonoutlier” is classified as an “outlier”). Masking and swamping robustness are interrelated aspects that must be studied together. For this purpose, Serfling and Wang (2014) provide a general framework applicable in any data space. Implementing it with particular outlier identifiers in particular types of data space, however, requires additional theoretical development specialized to the chosen setting. Even the case of univariate data presents nontrivial challenges. Here we apply the framework to study the masking and swamping robustness properties of two leading types of nonparametric outlier identifiers: scaled deviation outlyingness and centered rank outlyingness. The results shed new light on the choice between (median, MAD) and (trimmed mean, trimmed standard deviation) in scaled deviation outlyingness. Our findings also explain how the boxplot, a leading descriptive tool, performs using a hybrid outlyingness function that incorporates a quantile-based component to describe the middle half of a data set and a scaled deviation outlyingness component for outlier detection. For both goals, the boxplot greatly favors swamping robustness over masking robustness. We also formulate a variant boxplot offering a more favorable trade-off between these two criteria.
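
For readers unfamiliar with the two univariate identifiers named above, the following is a minimal Python sketch and not the paper's own formulation. It assumes scaled deviation outlyingness takes the form |x − median| / MAD with the raw (unscaled) MAD, and it uses the conventional boxplot fences at 1.5 times the interquartile range. The function names, the sample data, and the cutoff of 4.0 are all hypothetical choices made for illustration.

```python
import statistics


def scaled_deviation_outlyingness(data, x):
    """Assumed form: |x - median| / MAD, with the raw (unscaled) MAD."""
    med = statistics.median(data)
    mad = statistics.median([abs(v - med) for v in data])
    if mad == 0:
        raise ValueError("MAD is zero; outlyingness is undefined for this sample")
    return abs(x - med) / mad


def boxplot_fences(data, k=1.5):
    """Conventional boxplot fences: (Q1 - k*IQR, Q3 + k*IQR)."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Hypothetical sample with one clearly extreme value.
sample = [2.1, 2.4, 2.5, 2.7, 2.9, 3.0, 3.2, 3.3, 3.6, 9.8]

# Hypothetical cutoff; the abstract does not specify a threshold.
threshold = 4.0
flagged_by_outlyingness = [
    x for x in sample if scaled_deviation_outlyingness(sample, x) > threshold
]

low, high = boxplot_fences(sample)
flagged_by_boxplot = [x for x in sample if x < low or x > high]

print(flagged_by_outlyingness)
print(flagged_by_boxplot)
```

In this sketch, masking would correspond to the extreme value escaping the flag after further contaminating points are added, and swamping to ordinary points being flagged. How many such points each rule tolerates is what the breakdown points studied in the paper quantify.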
