کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
431723 688618 2007 11 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
Succinct data structures for flexible text retrieval systems
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر نظریه محاسباتی و ریاضیات
پیش نمایش صفحه اول مقاله
Succinct data structures for flexible text retrieval systems
چکیده انگلیسی

We propose succinct data structures for text retrieval systems supporting document listing queries and ranking queries based on the tf*idftf*idf (term frequency times inverse document frequency) scores of documents. Traditional data structures for these problems support queries only for some predetermined keywords. Recently Muthukrishnan proposed a data structure for document listing queries for arbitrary patterns at the cost of data structure size. For computing the tf*idftf*idf scores there has been no efficient data structures for arbitrary patterns.Our new data structures support these queries using small space. The space is only 2/ϵ2/ϵ times the size of compressed documents plus 10n bits for a document collection of length n  , for any 0<ϵ⩽10<ϵ⩽1. This is much smaller than the previous O(nlogn)O(nlogn) bit data structures. Query time is O(m+qlogϵn)O(m+qlogϵn) for listing and computing tf*idftf*idf scores for all q documents containing a given pattern of length m. Our data structures are flexible in a sense that they support queries for arbitrary patterns.

ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Journal of Discrete Algorithms - Volume 5, Issue 1, March 2007, Pages 12–22
نویسندگان
,