کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
1548891 997762 2008 6 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
Canonical Huffman code based full-text index
کلمات کلیدی
موضوعات مرتبط
مهندسی و علوم پایه مهندسی مواد مواد الکترونیکی، نوری و مغناطیسی
پیش نمایش صفحه اول مقاله
Canonical Huffman code based full-text index
چکیده انگلیسی

Full-text indices are data structures that can be used to find any substring of a given string. Many full-text indices require space larger than the original string. In this paper, we introduce the canonical Huffman code to the wavelet tree of a string T[1…n]. Compared with Huffman code based wavelet tree, the memory space used to represent the shape of wavelet tree is not needed. In case of large alphabet, this part of memory is not negligible. The operations of wavelet tree are also simpler and more efficient due to the canonical Huffman code. Based on the resulting structure, the multi-key rank and select functions can be performed using at most nH0 + ∣Σ∣(lg lg n + lg n − lg ∣Σ∣)+O(nH0) bits and in O(H0) time for average cases, where H0 is the zeroth order empirical entropy of T. In the end, we present an efficient construction algorithm for this index, which is on-line and linear.

ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Progress in Natural Science - Volume 18, Issue 3, 10 March 2008, Pages 325–330
نویسندگان
, , , ,