Generalizing the Wilcoxon rank-sum test for interval data

Article ID	Journal	Published Year	Pages	File Type
397316	International Journal of Approximate Reasoning	2015	14 Pages	PDF

Abstract

• We propose an adaptation of Wilcoxon's two-sample rank-sum test to interval data.
• Our method is also applicable to quantized data.
• It leads to interval-valued p-values that are computed at a very low computational cost.
• Interpretation of this test is straightforward.

Here we propose an adaptation of Wilcoxon's two-sample rank-sum test to interval data. This adaptation is interval-valued: it computes the minimum and maximum values of the statistic over the set of all feasible samples (all joint samples compatible with the initial set-valued information). We prove that these bounds can be computed explicitly by an algorithm with very low computational cost. Interpreting this generalized test is straightforward: if the resulting interval-valued p-value lies entirely on one side of the significance level, a decision can be made (reject or do not reject). Otherwise, we conclude that the information is too vague to lead to a clear decision.

Our method is also applicable to quantized data. With quantized information, the joint sample may contain a high proportion of ties, which can prevent the test from drawing a clear conclusion. According to the usual convention, tied observations are each assigned the average of the ranks they would jointly occupy. This convention can lead to wrong conclusions. Here, we consider the family of all possible rank permutations, so that a sample containing ties is associated not with a single value but with a collection of values of Wilcoxon's rank-sum statistic, each of them associated with a different p-value. When the impact of quantization is too high to lead to a clear decision, our test provides an interval-valued p-value that includes the chosen significance level, indicating that there is no clear conclusion according to this test.

Two experiments illustrate the properties of the generalized test. The first shows its ability to avoid wrong decisions in the presence of quantized data. The second shows the performance of the generalized test when used with interval data.
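The idea of bounding the statistic and the p-value can be illustrated with a minimal Python sketch. It is not the algorithm from the paper, which the abstract does not spell out. It rests on assumptions made here for illustration: each observation is a closed interval `(low, high)`, ties may be broken in any order, the test is one-sided (alternative: the first sample tends to be larger), and the p-value is taken from the exact null distribution of the rank sum. All function names (`rank_sum_bounds`, `upper_tail`, `p_value_interval`, `decide`) are hypothetical. The bounds use the fact that the rank sum equals the number of pairs in which an x-value exceeds a y-value plus the constant n(n+1)/2, and that this count can only grow when x-values increase or y-values decrease.

```python
from math import comb


def rank_sum_bounds(x_intervals, y_intervals):
    """Smallest and largest rank sum of sample X over all feasible samples.

    Assumption: observations are (low, high) pairs and ties may be
    broken in any order (all rank permutations are considered).
    """
    n = len(x_intervals)
    # Pairs where x is certainly above y, even in the worst case for X.
    u_min = sum(1 for (x_low, _) in x_intervals
                for (_, y_high) in y_intervals if x_low > y_high)
    # Pairs where x can be at or above y in the best case for X.
    u_max = sum(1 for (_, x_high) in x_intervals
                for (y_low, _) in y_intervals if x_high >= y_low)
    offset = n * (n + 1) // 2  # rank sum when every x lies below every y
    return u_min + offset, u_max + offset


def upper_tail(w, n, m):
    """Exact null probability P(W >= w) for sample sizes n and m (no ties)."""
    total = n + m
    max_sum = total * (total + 1) // 2
    # counts[k][s]: number of k-subsets of the ranks seen so far with sum s.
    counts = [[0] * (max_sum + 1) for _ in range(n + 1)]
    counts[0][0] = 1
    for r in range(1, total + 1):
        for k in range(min(n, r), 0, -1):  # descending k: each rank used once
            for s in range(r, max_sum + 1):
                counts[k][s] += counts[k - 1][s - r]
    return sum(counts[n][w:]) / comb(total, n)


def p_value_interval(x_intervals, y_intervals):
    """One-sided p-value interval; a larger rank sum gives a smaller p-value."""
    n, m = len(x_intervals), len(y_intervals)
    w_min, w_max = rank_sum_bounds(x_intervals, y_intervals)
    return upper_tail(w_max, n, m), upper_tail(w_min, n, m)


def decide(p_low, p_high, alpha=0.05):
    """Three-way decision described in the abstract."""
    if p_high <= alpha:
        return "reject"
    if p_low > alpha:
        return "do not reject"
    return "inconclusive"


# Quantized data written as degenerate intervals; three values are tied at 2.
x = [(2, 2), (2, 2)]
y = [(2, 2), (1, 1)]
p_low, p_high = p_value_interval(x, y)
print(rank_sum_bounds(x, y), p_low, p_high, decide(p_low, p_high, alpha=0.5))
```

In this toy sample the rank sum ranges from 5 to 7, so the p-value is an interval rather than a single number. A significance level inside that interval gives the "inconclusive" outcome, while a level outside it yields an ordinary decision.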