کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
425984 685976 2015 9 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
Fingerprints for highly similar streams
ترجمه فارسی عنوان
اثر انگشت برای جریانهای بسیار مشابه
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر نظریه محاسباتی و ریاضیات
چکیده انگلیسی

We propose an approach for approximating the Jaccard similarity of two streams, J(A,B)=|A∩B||A∪B|, for domains where this similarity is known to be high. Our method is based on a reduction from Jaccard similarity to F2F2 norm estimation, for which there exists a sketch that is efficient in terms of both size and compute time, which we augment by a sampling technique. Our approach offers an improvement in the fingerprint size that is quadratic in the degree of similarity between the streams. More precisely, to approximate the Jaccard similarity up to a multiplicative factor of ϵ with confidence δ  , it suffices to take a fingerprint of size O(ln⁡(1δ)(1−t)2ϵ2log⁡11−t) where t   is the known minimal Jaccard similarity between the streams. Further, computing our fingerprint can be done in time O(1)O(1) per element in the stream.

ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Information and Computation - Volume 244, October 2015, Pages 113–121
نویسندگان
, ,