کد مقاله | کد نشریه | سال انتشار | مقاله انگلیسی | نسخه تمام متن |
---|---|---|---|---|
396543 | 670377 | 2013 | 24 صفحه PDF | دانلود رایگان |

A hash function is a mapping from a key universe U to a range of integers, i.e., h:U↦{0,1,…,m−1}h:U↦{0,1,…,m−1}, where m is the range's size. A perfect hash function for some set S⊆US⊆U is a hash function that is one-to-one on S , where m≥|S|m≥|S|. A minimal perfect hash function for some set S⊆US⊆U is a perfect hash function with a range of minimum size, i.e., m=|S|m=|S|. This paper presents a construction for (minimal) perfect hash functions that combines theoretical analysis, practical performance, expected linear construction time and nearly optimal space consumption for the data structure. For n keys and m=n the space consumption ranges from 2.62n+o(n)2.62n+o(n) to 3.3n+o(n)3.3n+o(n) bits, and for m=1.23nm=1.23n it ranges from 1.95n+o(n)1.95n+o(n) to 2.7n+o(n)2.7n+o(n) bits. This is within a small constant factor from the theoretical lower bounds of 1.44n1.44n bits for m=n and 0.89n0.89n bits for m=1.23nm=1.23n. We combine several theoretical results into a practical solution that has turned perfect hashing into a very compact data structure to solve the membership problem when the key set S is static and known in advance. By taking into account the memory hierarchy we can construct (minimal) perfect hash functions for over a billion keys in 46 min using a commodity PC. An open source implementation of the algorithms is available at http://cmph.sf.net under the GNU Lesser General Public License (LGPL).
► Minimal perfect hash functions are stored in nearly optimal space.
► For n keys the space consumption ranges from 2.62n2.62n to 3.3n3.3n bits.
► Algorithm solves the membership problem for static key sets known in advance.
► An open source implementation is available at http://cmph.sf.net.
Journal: Information Systems - Volume 38, Issue 1, March 2013, Pages 108–131