کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
6855214 1437610 2018 14 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
Parallel mining of time-faded heavy hitters
ترجمه فارسی عنوان
استخراج موازی از مهاجمان سنگین محسوب می شود
کلمات کلیدی
پیغام عبور گیرنده های سنگین مدل زمان محو شدن طرح ها،
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر هوش مصنوعی
چکیده انگلیسی
In this paper we present PFDCMSS (Parallel Forward Decay Count-Min Space Saving) which, to the best of our knowledge, is the world first message-passing parallel algorithm for mining time-faded heavy hitters. The algorithm is a parallel version of the recently published FDCMSS (Forward Decay Count-Min Space Saving) sequential algorithm. We formally prove its correctness by showing that the underlying data structure, a sketch augmented with a Space Saving stream summary holding exactly two counters, is mergeable. Whilst mergeability of traditional sketches derives immediately from theory, we show that, instead, merging our augmented sketch is non trivial. Nonetheless, the resulting parallel algorithm is fast and simple to implement. The very large volumes of modern datasets in the context of Big Data present new challenges that current sequential algorithms can not cope with; on the contrary, parallel computing enables near real time processing of very large datasets, which are growing at an unprecedented scale. Our algorithm's implementation, taking advantage of the MPI (Message Passing Interface) library, is portable, reliable and provides cutting-edge performance. Extensive experimental results confirm that PFDCMSS retains the extreme accuracy and error bound provided by FDCMSS whilst providing excellent parallel scalability. Our contributions are three-fold: (i) we prove the non trivial mergeability of the augmented sketch used in the FDCMSS algorithm; (ii) we derive PFDCMSS, a novel message-passing parallel algorithm; (iii) we experimentally prove that PFDCMSS is extremely accurate and scalable, allowing near real time processing of large datasets. The result supports both casual users and seasoned, professional scientists working on expert and intelligent systems.
ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Expert Systems with Applications - Volume 96, 15 April 2018, Pages 115-128
نویسندگان
, , ,