Article ID: 10332597
Journal: Journal of Algorithms
Published Year: 2005
Pages: 10
File Type: PDF
Abstract
Estimating the maximum of a sampled dataset is an important and daunting task. We give a sampling algorithm for general datasets which gives estimates strictly better than the largest sample for an infinite family of datasets. Our algorithm overshoots the true maximum of the worst case dataset with probability at most (1/e)+O(1/k), where k is the size of our sample, which is much smaller than the size of the dataset. Our proof is the result of a new extremal graph coloring theorem: given any red/green coloring of the edges of a complete graph of n vertices, the probability that the edges among k randomly sampled vertices have a certain property is at most (1/e)+O(1/k). In addition, we show that if an algorithm gives an estimate strictly better than the largest sample for some dataset, then the algorithm overshoots the maximum on some other dataset with probability at least (1/e)−O(1/k).
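To make the notion of "overshooting" concrete, the following is a minimal simulation sketch. It is not the algorithm from the paper, which the abstract does not describe. The names `largest_sample`, `gap_extrapolation`, and `overshoot_rate` are hypothetical, and the gap-based estimator is an assumption chosen only to show that any estimate above the largest sample risks exceeding the true maximum, whereas the largest sample itself never does.

```python
import random


def largest_sample(sample):
    """Baseline estimate: the largest sampled value.

    It can never exceed the true maximum of the dataset.
    """
    return max(sample)


def gap_extrapolation(sample):
    """Hypothetical estimator (an assumption, NOT the paper's algorithm).

    Pushes the largest sample upward by the gap between the two
    largest sampled values.
    """
    top, second = sorted(sample, reverse=True)[:2]
    return top + (top - second)


def overshoot_rate(dataset, k, estimator, trials=2000, seed=0):
    """Fraction of trials in which the estimate exceeds the true maximum."""
    rng = random.Random(seed)
    true_max = max(dataset)
    overshoots = 0
    for _ in range(trials):
        # Draw k distinct elements uniformly at random, without replacement.
        sample = rng.sample(dataset, k)
        if estimator(sample) > true_max:
            overshoots += 1
    return overshoots / trials


if __name__ == "__main__":
    dataset = list(range(1000))  # illustrative dataset, chosen arbitrarily
    k = 20                       # sample size, much smaller than the dataset
    print("largest sample:   ", overshoot_rate(dataset, k, largest_sample))
    print("gap extrapolation:", overshoot_rate(dataset, k, gap_extrapolation))
```

The baseline always reports an overshoot rate of zero, at the cost of never estimating above the largest sample. The abstract's bounds concern how small the worst-case overshoot probability can be kept once an algorithm gives up that guarantee on some dataset.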
Related Topics
Physical Sciences and Engineering > Computer Science > Computational Theory and Mathematics