Article ID Journal Published Year Pages File Type
6869739 Computational Statistics & Data Analysis 2014 18 Pages PDF
Abstract
The main question is how to compute the upper and lower limits of the range of possible values of a given statistic, when the data range over given intervals. Initially some well-known statistics, such as sample mean, sample variance or F-ratio, are considered in order to illustrate that in some cases the limits can be computed efficiently, while in some cases their computation is NP-hard. Subsequently, the t-ratio (variation coefficient) is considered. It is investigated when the limits for t-ratio are computable in polynomial time and a new efficient algorithm is designed for this case. Conversely, complementary NP-hardness results are proved, demonstrating the cases when the computation cannot be done efficiently. Subsequently, the NP-hardness results are strengthened: it is shown that under certain assumptions, even an approximate evaluation with an arbitrary absolute error is NP-hard. Finally, it is shown that the situation can also be (in some sense) regarded positively: a new pseudopolynomial algorithm is developed. The algorithm is of practical importance, especially when the dataset to be processed is large and does not contain “excessively” large numbers.
Related Topics
Physical Sciences and Engineering Computer Science Computational Theory and Mathematics
Authors
, ,