کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
408299 679017 2016 10 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
CVS: Fast cardinality estimation for large-scale data streams over sliding windows
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر هوش مصنوعی
پیش نمایش صفحه اول مقاله
CVS: Fast cardinality estimation for large-scale data streams over sliding windows
چکیده انگلیسی

Estimating the cardinality of data streams over a sliding window is an important problem in many applications, such as network traffic monitoring, web access log analysis and database. The problem becomes more difficult in large-scale data streams when time and space complexity is taken into account. In this paper, we present a novel randomized data structure to address the problem. The significant contributions are as follows: (1) A space-efficient counter vector sketch (CVS) are proposed, which extends the well-known bitmap sketch to sliding window settings. (2) Based on the CVS, a random update mechanism is introduced, whereby a small fixed number of entries are randomly chosen from CVS in a step and then updated. This means that the update procedure just costs constant time. (3) Furthermore, estimating cardinality by CVS just needs one-pass scan of the data. (4) Finally, a theoretical analysis is given to show the accuracy of CVS-based estimators. Our comprehensive experiments confirm that the CVS-based schema attains high accuracy, and demonstate its time efficiency in comparison with the Timestamp Vector (TSV) and the auxiliary indexing method.

ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Neurocomputing - Volume 194, 19 June 2016, Pages 107–116
نویسندگان
, , , , ,