The complexity of computation and approximation of the t-ratio over one-dimensional interval data

Article ID	Journal	Published Year	Pages	File Type
6869739	Computational Statistics & Data Analysis	2014	18 Pages	PDF

Abstract

The main question is how to compute the upper and lower limits of the range of possible values of a given statistic, when the data range over given intervals. Initially some well-known statistics, such as sample mean, sample variance or F-ratio, are considered in order to illustrate that in some cases the limits can be computed efficiently, while in some cases their computation is NP-hard. Subsequently, the t-ratio (variation coefficient) is considered. It is investigated when the limits for t-ratio are computable in polynomial time and a new efficient algorithm is designed for this case. Conversely, complementary NP-hardness results are proved, demonstrating the cases when the computation cannot be done efficiently. Subsequently, the NP-hardness results are strengthened: it is shown that under certain assumptions, even an approximate evaluation with an arbitrary absolute error is NP-hard. Finally, it is shown that the situation can also be (in some sense) regarded positively: aÂ new pseudopolynomial algorithm is developed. The algorithm is of practical importance, especially when the dataset to be processed is large and does not contain “excessively” large numbers.

Keywords

Interval data NP-hardness Inapproximability Computational complexity