کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
396543 670377 2013 24 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
Practical perfect hashing in nearly optimal space
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر هوش مصنوعی
پیش نمایش صفحه اول مقاله
Practical perfect hashing in nearly optimal space
چکیده انگلیسی

A hash function is a mapping from a key universe U   to a range of integers, i.e., h:U↦{0,1,…,m−1}h:U↦{0,1,…,m−1}, where m is the range's size. A perfect hash function   for some set S⊆US⊆U is a hash function that is one-to-one on S  , where m≥|S|m≥|S|. A minimal perfect hash function   for some set S⊆US⊆U is a perfect hash function with a range of minimum size, i.e., m=|S|m=|S|. This paper presents a construction for (minimal) perfect hash functions that combines theoretical analysis, practical performance, expected linear construction time and nearly optimal space consumption for the data structure. For n keys and m=n   the space consumption ranges from 2.62n+o(n)2.62n+o(n) to 3.3n+o(n)3.3n+o(n) bits, and for m=1.23nm=1.23n it ranges from 1.95n+o(n)1.95n+o(n) to 2.7n+o(n)2.7n+o(n) bits. This is within a small constant factor from the theoretical lower bounds of 1.44n1.44n bits for m=n   and 0.89n0.89n bits for m=1.23nm=1.23n. We combine several theoretical results into a practical solution that has turned perfect hashing into a very compact data structure to solve the membership problem when the key set S is static and known in advance. By taking into account the memory hierarchy we can construct (minimal) perfect hash functions for over a billion keys in 46 min using a commodity PC. An open source implementation of the algorithms is available at http://cmph.sf.net under the GNU Lesser General Public License (LGPL).


► Minimal perfect hash functions are stored in nearly optimal space.
► For n   keys the space consumption ranges from 2.62n2.62n to 3.3n3.3n bits.
► Algorithm solves the membership problem for static key sets known in advance.
► An open source implementation is available at http://cmph.sf.net.

ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Information Systems - Volume 38, Issue 1, March 2013, Pages 108–131
نویسندگان
, , ,