Processing top-k queries from samples

کد مقاله	کد نشریه	سال انتشار	مقاله انگلیسی	نسخه تمام متن
452557	694548	2008	18 صفحه PDF	دانلود رایگان

عنوان انگلیسی مقاله ISI

دانلود مقاله + سفارش ترجمه

دانلود مقاله ISI انگلیسی

رایگان برای ایرانیان

کلمات کلیدی

Estimation - برآورد کردن Heavy hitters - ضربه گیرهای سنگین confidence intervals - فاصله اطمینان Sampling - نمونه‌برداری

موضوعات مرتبط

مهندسی و علوم پایه مهندسی کامپیوتر شبکه های کامپیوتری و ارتباطات

پیش نمایش صفحه اول مقاله

چکیده انگلیسی

Top-k queries are desired aggregation operations on datasets. Examples of queries on network data include finding the top 100 source Autonomous Systems (AS), top 100 ports, or top domain names over IP packets or over IP flow records. Since the complete dataset is often not available or not feasible to examine, we are interested in processing top-k queries from samples.If all records can be processed, the top-k items can be obtained by counting the frequency of each item. Even when the full dataset is observed, however, resources are often insufficient for such counting so techniques were developed to overcome this issue. When we can observe only a random sample of the records, an orthogonal complication arises: The top frequencies in the sample are biased estimates of the actual top-k frequencies. This bias depends on the distribution and must be accounted for when seeking the actual value.We address this by designing and evaluating several schemes that derive rigorous confidence bounds for top-k estimates. Simulations on various datasets that include IP flows data, show that schemes exploiting more of the structure of the sample distribution produce much tighter confidence intervals with an order of magnitude fewer samples than simpler schemes that utilize only the sampled top-k frequencies. The simpler schemes, however, are more efficient in terms of computation.

ناشر

Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Computer Networks - Volume 52, Issue 14, 9 October 2008, Pages 2605–2622

نویسندگان

Edith Cohen, Nadav Grossaug, Haim Kaplan,

علوم انسانی و هنر

فنی، مهندسی و علوم پایه

پزشکی و سلامت

بیو تکنولوژی

پذیرش سفارش ترجمه

Processing top-k queries from samples

دسترسی سریع

ارتباط

English Website