Article ID: 427909
Journal: Information Processing Letters
Published Year: 2010
Pages: 6
File Type: PDF
Abstract

A Bloom filter is a space-efficient data structure used for probabilistic set membership testing. When testing an object for set membership, a Bloom filter may give a false positive. Analysis of the false positive rate is key to understanding the Bloom filter and the applications that use it. We show experimentally that the classic analysis of the false positive rate is wrong. We formally derive a correct formula using a balls-and-bins model and show how to compute it numerically in a stable manner. We also prove that the new formula always predicts a greater false positive rate than the classic formula. We compare the two formulas numerically in terms of relative error: for a small Bloom filter, the false positive rate predicted by the classic formula will be in error.
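
The gap between the two predictions can be illustrated with a small numerical sketch. The abstract gives neither formula, so everything below is an assumption made for illustration: the symbols `m` (bits in the filter), `n` (inserted elements) and `k` (hash functions), the commonly quoted classic expression (1 - (1 - 1/m)^(kn))^k, and a balls-and-bins computation that tracks the probability distribution of the number of set bits after `k*n` uniform throws. This dynamic-programming recurrence is a stand-in chosen for simplicity; it is not claimed to be the closed form or the stable evaluation procedure derived in the paper.

```python
def classic_fpr(m, n, k):
    """Commonly quoted classic estimate: (1 - (1 - 1/m)^(k*n))^k."""
    return (1.0 - (1.0 - 1.0 / m) ** (k * n)) ** k


def occupancy_fpr(m, n, k):
    """Balls-and-bins estimate (illustrative sketch, not the paper's formula).

    Throws k*n balls uniformly into m bins, tracking the probability that
    exactly i bins are occupied, then averages (i/m)^k over that distribution.
    Assumes independent, uniform hash functions.
    """
    # dist[i] = probability that exactly i bits are set so far
    dist = [0.0] * (m + 1)
    dist[0] = 1.0
    for _ in range(k * n):
        nxt = [0.0] * (m + 1)
        for i, p in enumerate(dist):
            if p == 0.0:
                continue
            nxt[i] += p * i / m                # ball lands on an already set bit
            if i < m:
                nxt[i + 1] += p * (m - i) / m  # ball sets a new bit
        dist = nxt
    # A query is a false positive when all k probes hit set bits.
    return sum(p * (i / m) ** k for i, p in enumerate(dist))


if __name__ == "__main__":
    # Hypothetical parameter choices: about 8 bits per element, k = 3.
    for m, n in [(16, 2), (64, 8), (256, 32), (1024, 128)]:
        c = classic_fpr(m, n, 3)
        e = occupancy_fpr(m, n, 3)
        print(f"m={m:5d} n={n:4d} classic={c:.6f} occupancy={e:.6f} "
              f"rel_err={(e - c) / e:.4%}")
```

Under these assumptions the occupancy-based value can never fall below the classic one: the classic expression raises the expected fraction of set bits to the power `k`, whereas the occupancy version takes the expectation of that power, which is at least as large by Jensen's inequality. For the smallest case, `m=2, n=1, k=2`, the two values are 0.5625 and 0.625.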

Related Topics
Physical Sciences and Engineering > Computer Science > Computational Theory and Mathematics