دانلود رایگان مقاله: افزایش نمونه گیری تصادفی متمایز برای شمارش پایه ای بر اتحاد جریان های توزیع شده

کد مقاله	کد نشریه	سال انتشار	مقاله انگلیسی	نسخه تمام متن
437610	690164	2015	20 صفحه PDF	دانلود رایگان

عنوان انگلیسی مقاله ISI

Boosting distinct random sampling for basic counting on the union of distributed streams

ترجمه فارسی عنوان

افزایش نمونه گیری تصادفی متمایز برای شمارش پایه ای بر اتحاد جریان های توزیع شده

دانلود مقاله + سفارش ترجمه

دانلود مقاله ISI انگلیسی

رایگان برای ایرانیان

کلمات کلیدی

شمارش پایه، جریان داده ها، جریانهای توزیع شده، نمونه گیری انطباق هماهنگ، نمونه گیری متمایز، نمونه گیری مستقیم

Data stream - جریان داده ها Direct sampling - نمونه گیری مستقیم

موضوعات مرتبط

مهندسی و علوم پایه مهندسی کامپیوتر نظریه محاسباتی و ریاضیات

پیش نمایش مقاله

افزایش نمونه گیری تصادفی متمایز برای شمارش پایه ای بر اتحاد جریان های توزیع شده

چکیده انگلیسی

We revisit the classic basic counting problem in the distributed streaming model. In the solution for maintaining an (ϵ,δ)(ϵ,δ)-estimate, we make the following new contributions: (1) For a bit stream of size n, where each bit has a probability at least γ to be 1, we exponentially reduced the average total processing time from the best prior work's Θ(nlog⁡(1/δ))Θ(nlog⁡(1/δ)) to O((1/(γϵ2))(log2⁡n)log⁡(1/δ))O((1/(γϵ2))(log2⁡n)log⁡(1/δ)), thus providing the first sublinear-time streaming algorithm for this problem. (2) In addition to an overall much faster processing speed, our method provides a new tradeoff that a lower accuracy demand (a larger value for ϵ ) promises a faster processing speed, whereas the best prior work's processing speed is Θ(nlog⁡(1/δ))Θ(nlog⁡(1/δ)) in any case and for any ϵ . (3) The worst-case total time cost of our method matches the best prior work's Θ(nlog⁡(1/δ))Θ(nlog⁡(1/δ)), which is necessary but rarely occurs in our method. (4) The space usage overhead in our method is a lower order term compared with the best prior work's space usage and occurs only O(log⁡n)O(log⁡n) times during the stream processing and is too negligible to be detected by the OS in practice. We further validate these theoretical results with experiments on both real-world and synthetic data, showing that our method is faster than the best prior work by a factor of several to several hundreds depending on the stream size and accuracy demands, without any detectable space usage overhead. Our method is based on a faster sampling technique that we design for boosting the sampling procedure in the best prior work and we believe this technique can be of other independent interest.

ناشر

Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Theoretical Computer Science - Volume 602, 18 October 2015, Pages 60–79

نویسندگان

Bojian Xu,

علوم انسانی و هنر

فنی، مهندسی و علوم پایه

پزشکی و سلامت

بیو تکنولوژی

پذیرش سفارش ترجمه

دانلود رایگان مقاله ISI : افزایش نمونه گیری تصادفی متمایز برای شمارش پایه ای بر اتحاد جریان های توزیع شده

دسترسی سریع

ارتباط

English Website