کد مقاله | کد نشریه | سال انتشار | مقاله انگلیسی | نسخه تمام متن |
---|---|---|---|---|
437610 | 690164 | 2015 | 20 صفحه PDF | دانلود رایگان |
We revisit the classic basic counting problem in the distributed streaming model. In the solution for maintaining an (ϵ,δ)(ϵ,δ)-estimate, we make the following new contributions: (1) For a bit stream of size n, where each bit has a probability at least γ to be 1, we exponentially reduced the average total processing time from the best prior work's Θ(nlog(1/δ))Θ(nlog(1/δ)) to O((1/(γϵ2))(log2n)log(1/δ))O((1/(γϵ2))(log2n)log(1/δ)), thus providing the first sublinear-time streaming algorithm for this problem. (2) In addition to an overall much faster processing speed, our method provides a new tradeoff that a lower accuracy demand (a larger value for ϵ ) promises a faster processing speed, whereas the best prior work's processing speed is Θ(nlog(1/δ))Θ(nlog(1/δ)) in any case and for any ϵ . (3) The worst-case total time cost of our method matches the best prior work's Θ(nlog(1/δ))Θ(nlog(1/δ)), which is necessary but rarely occurs in our method. (4) The space usage overhead in our method is a lower order term compared with the best prior work's space usage and occurs only O(logn)O(logn) times during the stream processing and is too negligible to be detected by the OS in practice. We further validate these theoretical results with experiments on both real-world and synthetic data, showing that our method is faster than the best prior work by a factor of several to several hundreds depending on the stream size and accuracy demands, without any detectable space usage overhead. Our method is based on a faster sampling technique that we design for boosting the sampling procedure in the best prior work and we believe this technique can be of other independent interest.
Journal: Theoretical Computer Science - Volume 602, 18 October 2015, Pages 60–79