کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
515080 866949 2014 19 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
Indexing and Self-indexing sequences of IEEE 754 double precision numbers
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر نرم افزارهای علوم کامپیوتر
پیش نمایش صفحه اول مقاله
Indexing and Self-indexing sequences of IEEE 754 double precision numbers
چکیده انگلیسی

Succinct data structures were designed to store and/or index data with a relatively small alphabet size, a rather skewed distribution and/or, a considerable amount of repetitiveness. Although many of them were developed to handle text, they have been used with other data types, like biological collections or source code. However, there are no applications of succinct data structures in the case of floating point data, the obvious reason is that this data type does not usually fulfill the aforementioned requirements.In this work, we present four solutions to store and index floating point data that take advantage of the latest developments in succinct data structures.The first one is based on the well-known inverted index. It consumes space around the size of the source data, providing appealing search times. The other three solutions are based on self-indexing structures. The first one uses a binary Huffman-shaped wavelet tree. It is never the winner in our experiments, but still yields a good balance between space and search performance. The second one is based on wavelet trees on bytecodes, and obtains the best space/time trade-off in most scenarios. The last one is based on Sadakane’s Compressed Suffix Array. It excels in space at the expense of less performance at searches.Including a representation of the original data, our indexes occupy from around 70% to 115% of the size of the original collection, and permit fast indexed searches within it.

ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Information Processing & Management - Volume 50, Issue 6, November 2014, Pages 857–875
نویسندگان
, , ,