Article code: 1148288
Journal code: 1489761
Publication year: 2015
English article: 13-page PDF
English title of the ISI article
On masking and swamping robustness of leading nonparametric outlier identifiers for univariate data
Related topics
Engineering and Basic Sciences, Mathematics, Applied Mathematics
English abstract


• Advances in nonparametric outlier identification.
• Masking and swamping robustness of scaled deviation outlyingness is determined using quantitative criteria in the form of special masking and swamping breakdown points.
• Masking and swamping robustness of centered rank outlyingness is determined in the same way.
• The findings are applied to compare (median, MAD) versus (trimmed mean, trimmed standard deviation) in scaled deviation outlyingness.
• The findings are applied to explain how the boxplot acquires its strengths and to formulate a variant boxplot that offers a more appealing balance between masking robustness and swamping robustness.
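The two outlyingness functions named in the highlights can be illustrated with a minimal Python sketch. It assumes the common conventions |x - median| / MAD for scaled deviation outlyingness (with the raw, unnormalized MAD) and |2 F_n(x) - 1| for centered rank outlyingness, where F_n is the empirical distribution function. The function names and the sample values are hypothetical, and the article's exact definitions may differ.

```python
import statistics


def scaled_deviation_outlyingness(x, data):
    """Return |x - median| / MAD for the sample `data` (raw MAD, assumed convention)."""
    med = statistics.median(data)
    mad = statistics.median(abs(v - med) for v in data)
    if mad == 0:
        raise ValueError("MAD is zero; scaled deviation outlyingness is undefined")
    return abs(x - med) / mad


def centered_rank_outlyingness(x, data):
    """Return |2 * F_n(x) - 1|, with F_n the empirical CDF (assumed convention)."""
    f_n = sum(v <= x for v in data) / len(data)
    return abs(2 * f_n - 1)


if __name__ == "__main__":
    sample = [1, 2, 3, 4, 100]  # hypothetical data with one extreme value
    print(scaled_deviation_outlyingness(100, sample))
    print(centered_rank_outlyingness(100, sample))
```

Under these assumptions the scaled deviation score is unbounded, while the centered rank score always lies in [0, 1], so any threshold for flagging outliers has to be chosen on a different scale for each function.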

In the wide-ranging scope of modern statistical data analysis, a key task is identification of outliers. For any outlier identification procedure, one needs to know its robustness against masking (an “outlier” is undetected as such) and swamping (a “nonoutlier” is classified as an “outlier”). Masking and swamping robustness are interrelated aspects which must be studied together. For such purposes, Serfling and Wang (2014) provide a general framework applicable in any data space. Implementation with particular outlier identifiers in particular types of data space, however, requires additional theoretical development specialized to the chosen setting. Even the case of univariate data presents nontrivial challenges. Here we apply the framework to study the masking and swamping robustness properties of two leading types of nonparametric outlier identifiers, scaled deviation outlyingness and centered rank outlyingness. The results shed new light on the choice between (median, MAD) and (trimmed mean, trimmed standard deviation) in using scaled deviation outlyingness. Also, our findings explain how the boxplot, a leading descriptive tool, performs using a hybrid outlyingness function incorporating a quantile-based component to describe the middle half of a data set and a scaled deviation outlyingness component for outlier detection. For both goals, the boxplot greatly favors swamping robustness over masking robustness. We also formulate a variant boxplot offering a more favorable trade-off between these two criteria.
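For readers who want a concrete picture of the boxplot rule discussed in the abstract, the following is a minimal sketch of the conventional Tukey boxplot only, not the variant boxplot proposed in the article. It assumes the usual fences at 1.5 interquartile ranges beyond the quartiles and the "inclusive" quartile method of Python's `statistics.quantiles`. The function name `boxplot_outliers`, the parameter `k`, and the sample data are hypothetical.

```python
import statistics


def boxplot_outliers(data, k=1.5):
    """Return the points lying outside the fences Q1 - k*IQR and Q3 + k*IQR."""
    # Quartiles via the "inclusive" method; other quartile conventions
    # would shift the fences slightly.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in data if v < lower or v > upper]


if __name__ == "__main__":
    sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # hypothetical data
    print(boxplot_outliers(sample))
```

The quartiles describe the middle half of the data, and the fences apply a scaled deviation rule on top of them, which is the hybrid structure the abstract refers to.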

Publisher
Database: Elsevier - ScienceDirect
Journal: Journal of Statistical Planning and Inference - Volume 162, July 2015, Pages 62–74