کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
437687 690174 2015 12 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
Efficient sampling of non-strict turnstile data streams
ترجمه فارسی عنوان
نمونه برداری کارآمد از جریان داده های توریستی غیر دقیق
کلمات کلیدی
نمونه برداری، توزیع معکوس، جریان داده ها
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر نظریه محاسباتی و ریاضیات
چکیده انگلیسی

We study the problem of generating a large sample from a data stream SS of elements (i,v)(i,v), where i is a positive integer key, v is an integer equal to the count of key i  , and the sample consists of pairs (i,Ci)(i,Ci) for Ci=∑(i,v)∈SvCi=∑(i,v)∈Sv. We consider strict turnstile streams and general non-strict turnstile streams, in which CiCi may be negative. Our sample is useful for approximating both forward and inverse distribution statistics, within an additive error ϵ   and provable success probability 1−δ1−δ.Our sampling method improves by an order of magnitude the known processing time of each stream element, a crucial factor in data stream applications, thereby providing a feasible solution to the sampling problem. For example, for a sample of size O(ϵ−2log⁡(1/δ))O(ϵ−2log⁡(1/δ)) in non-strict streams, our solution requires O((log⁡log⁡(1/ϵ))2+(log⁡log⁡(1/δ))2)O((log⁡log⁡(1/ϵ))2+(log⁡log⁡(1/δ))2) operations per stream element, whereas the best previous solution requires O(ϵ−2log2⁡(1/δ))O(ϵ−2log2⁡(1/δ)) evaluations of a fully independent hash function per element.We achieve this improvement by constructing an efficient K-elements recovery structure from which K   elements can be extracted with probability 1−δ1−δ. Our structure enables our sampling algorithm to run on distributed systems and extract statistics on the difference between streams.

ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Theoretical Computer Science - Volume 590, 26 July 2015, Pages 106–117
نویسندگان
, , ,