کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
4951262 1441199 2017 14 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
d-k-min-wise independent family of hash functions
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر نظریه محاسباتی و ریاضیات
پیش نمایش صفحه اول مقاله
d-k-min-wise independent family of hash functions
چکیده انگلیسی


- A framework to exponentially improve space and time of min-wise based algorithms.
- Gets around the Ω(log⁡1/ϵ) lower bound needed by min-wise hash functions.
- Only a constant degree of independence is required for the space and time improvements.
- Demonstrates how to apply the framework for similarity and rarity over data stream.

In this paper we introduce a general framework that exponentially improves the space, degree of independence, and time needed by min-wise-based algorithms. The authors, in SODA '11 [1], introduced an exponential time improvement for min-wise-based algorithms. Here we develop an alternative approach that achieves both exponential time and exponential space improvement. The new approach relaxes the need for approximately min-wise hash functions, hence getting around the Ω(log⁡1ϵ) independence lower bound, by defining and constructing a d-k-min-wise independent family of hash functions; surprisingly, for most cases, only 8-wise independence is needed for the additional improvement. Furthermore, we discuss how this construction can be used to improve many min-wise-based algorithms. To our knowledge such definitions, for hash functions, were never previously studied or constructed. Finally, we show how to apply it for similarity and rarity estimation over data streams; other min-wise-based algorithms can be adjusted in the same way.

ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Journal of Computer and System Sciences - Volume 84, March 2017, Pages 171-184
نویسندگان
, , ,