Article code | Journal code | Publication year | English article | Full-text version
430895 | 688224 | 2013 | 12 pages, PDF | Free download
English title of the ISI article
A novel approach for leveraging co-occurrence to improve the false positive error in signature files
Related topics
Engineering and Basic Sciences, Computer Engineering, Computational Theory and Mathematics
English abstract

The signature file is a well-studied method in information retrieval for indexing large text databases. Because its index is small, it is a good candidate for environments where memory is scarce. This small index size, however, comes at the cost of a high false positive error rate. In this paper we address that problem by introducing COCA filters, a new variation of Bloom filters that exploits the co-occurrence probability of words in documents to reduce the false positive error. We show experimentally that, on real document collections, this technique reduces the false positive error by a factor of up to 21 for the same index size. We also show that in some extreme cases it eliminates the false positive error completely. COCA filters can be considered a good replacement for Bloom filters wherever the co-occurrence of any two members of the universe is identifiable.
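
To make the trade-off in the abstract concrete, the following is a minimal sketch of a signature file built from ordinary Bloom filters, one fixed-width signature per document. It is not the COCA filter, whose construction the abstract does not describe; the names `BloomSignature`, `build_signature_file`, and `false_positive_rate`, the salted SHA-256 hashing, and the default sizes are all assumptions made for illustration.

```python
import hashlib


class BloomSignature:
    """Fixed-width bit signature for the set of words in one document."""

    def __init__(self, num_bits=64, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # the signature, stored as a Python int bitmask

    def _positions(self, word):
        # Derive num_hashes bit positions from salted SHA-256 digests
        # (an illustrative choice of hash family, not taken from the paper).
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{word}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, word):
        for pos in self._positions(word):
            self.bits |= 1 << pos

    def might_contain(self, word):
        # True means "possibly present"; False means "definitely absent".
        return all((self.bits >> pos) & 1 for pos in self._positions(word))


def build_signature_file(documents, num_bits=64, num_hashes=3):
    """Return one signature per document, in the same order as documents."""
    signatures = []
    for text in documents:
        sig = BloomSignature(num_bits, num_hashes)
        for word in text.lower().split():
            sig.add(word)
        signatures.append(sig)
    return signatures


def false_positive_rate(signature, present_words, probe_words):
    """Fraction of probe words absent from the document that the signature accepts."""
    absent = [w for w in probe_words if w not in present_words]
    if not absent:
        return 0.0
    hits = sum(signature.might_contain(w) for w in absent)
    return hits / len(absent)
```

Shrinking `num_bits` makes the index smaller but causes more absent words to pass `might_contain`, which is the false positive cost the abstract refers to; a signature never rejects a word that was actually added.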

Publisher
Database: Elsevier - ScienceDirect
Journal: Journal of Discrete Algorithms - Volume 18, January 2013, Pages 63–74