Distributed search based on self-indexed compressed text

Article ID	Journal ID	Publication year	English article
515635	867057	2012	9-page PDF

Keywords

Wavelet trees; Web search engines; Query processing

Abstract

Query response times within a fraction of a second in Web search engines are feasible due to the use of indexing and caching techniques, which are devised for large text collections partitioned and replicated into a set of distributed-memory processors. This paper proposes an alternative query processing method for this setting, which is based on a combination of self-indexed compressed text and posting lists caching. We show that a text self-index (i.e., an index that compresses the text and is able to extract arbitrary parts of it) can be competitive with an inverted index if we consider the whole query process, which includes index decompression, ranking and snippet extraction time. The advantage is that within the space of the compressed document collection, one can carry out the posting lists generation, document ranking and snippet extraction. This significantly reduces the total number of processors involved in the solution of queries. Alternatively, for the same amount of hardware, the performance of the proposed strategy is better than that of the classical approach based on treating inverted indexes and corresponding documents as two separate entities in terms of processors and memory space.
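
The abstract's central claim is that a single compressed structure can serve both posting-list generation and snippet extraction. As an illustration only, the Python sketch below assumes the collection is stored as one sequence of word IDs with known document start offsets, and indexes it with a toy wavelet tree (one of the listed keywords). The bitmaps are plain prefix-count lists, so nothing is actually compressed, and the names `WaveletTree`, `posting_list` and `snippet` are hypothetical; this is not the authors' implementation. `select` locates each occurrence of a term to build its posting list, and `access` reads words back out of the index to form a snippet.

```python
from bisect import bisect_left, bisect_right


class WaveletTree:
    """Toy pointer-based wavelet tree over integer word IDs in [lo, hi].

    Bitmaps are kept as prefix-count lists, so this is NOT compressed;
    real self-indexes use succinct rank/select bitvectors instead.
    """

    def __init__(self, seq, lo, hi):
        self.lo, self.hi, self.n = lo, hi, len(seq)
        if lo == hi or not seq:
            return  # leaf (single symbol) or empty internal node
        mid = (lo + hi) // 2
        # Bit k is 1 when seq[k] > mid; ones[k] counts 1-bits in bits[0:k].
        self.ones = [0]
        for s in seq:
            self.ones.append(self.ones[-1] + (s > mid))
        self.zeros = [k - o for k, o in enumerate(self.ones)]
        self.left = WaveletTree([s for s in seq if s <= mid], lo, mid)
        self.right = WaveletTree([s for s in seq if s > mid], mid + 1, hi)

    def access(self, i):
        """Return seq[i] by walking from the root down to a leaf."""
        node = self
        while node.lo != node.hi:
            if node.ones[i + 1] > node.ones[i]:  # bit i is 1: go right
                i, node = node.ones[i], node.right
            else:  # bit i is 0: go left
                i, node = node.zeros[i], node.left
        return node.lo

    def select(self, c, j):
        """Position of the j-th (1-based) occurrence of c, or -1 if none."""
        if self.n == 0 or not self.lo <= c <= self.hi:
            return -1
        if self.lo == self.hi:
            return j - 1 if j <= self.n else -1
        mid = (self.lo + self.hi) // 2
        child, prefix = (self.right, self.ones) if c > mid else (self.left, self.zeros)
        p = child.select(c, j)
        # Child position p is the (p+1)-th bit of that kind in this bitmap.
        return -1 if p < 0 else bisect_left(prefix, p + 1) - 1


def posting_list(wt, doc_starts, term):
    """Hypothetical helper: sorted (doc_id, tf) pairs, one select per hit."""
    tf = {}
    j = 1
    while (pos := wt.select(term, j)) >= 0:
        doc = bisect_right(doc_starts, pos) - 1  # document containing pos
        tf[doc] = tf.get(doc, 0) + 1
        j += 1
    return sorted(tf.items())


def snippet(wt, pos, width, doc_start, doc_end, vocab):
    """Hypothetical helper: words around pos, clipped to [doc_start, doc_end)."""
    lo, hi = max(doc_start, pos - width), min(doc_end, pos + width + 1)
    return " ".join(vocab[wt.access(i)] for i in range(lo, hi))


# Tiny made-up collection: three documents concatenated into one sequence.
vocab = ["compressed", "index", "query", "search", "text", "web"]
word_id = {w: k for k, w in enumerate(vocab)}
docs = [["web", "search", "query"],
        ["compressed", "text", "index"],
        ["search", "compressed", "text"]]
seq, doc_starts = [], []
for d in docs:
    doc_starts.append(len(seq))
    seq.extend(word_id[w] for w in d)

wt = WaveletTree(seq, 0, len(vocab) - 1)
print(posting_list(wt, doc_starts, word_id["compressed"]))  # [(1, 1), (2, 1)]
print(snippet(wt, 7, 1, 6, 9, vocab))  # search compressed text
```

Each occurrence costs one `select`, that is, one root-to-leaf walk plus a binary search per level, so building a posting list does not scan the whole text; a production self-index would replace the Python lists with succinct rank/select bitvectors.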

Research highlights
► An indexing and query processing method for Web search engines is introduced.
► The method is built on self-indexed compressed text data structures and algorithms.
► The method enables both determination of top-K documents and surrounding query text.
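
The highlights mention determining the top-K documents. A minimal sketch of that step, assuming per-term `(doc_id, tf)` posting lists are already available (for instance from a self-index) and using a plain tf-idf sum as a stand-in scoring function; the paper's actual ranking may differ, and `top_k` is a hypothetical name.

```python
import heapq
import math


def top_k(postings_by_term, num_docs, k):
    """Hypothetical ranking step: score documents by a tf-idf sum, keep k best.

    postings_by_term maps each query term to (doc_id, tf) pairs, e.g. as
    built from a self-index. tf-idf is a stand-in, not the paper's ranker.
    """
    scores = {}
    for postings in postings_by_term.values():
        if not postings:
            continue  # term absent from the collection contributes nothing
        idf = math.log(num_docs / len(postings))
        for doc, tf in postings:
            scores[doc] = scores.get(doc, 0.0) + tf * idf
    # Tuples (score, -doc): equal scores are broken in favor of smaller doc IDs.
    best = heapq.nlargest(k, ((s, -d) for d, s in scores.items()))
    return [(-neg_doc, s) for s, neg_doc in best]


postings = {
    "compressed": [(1, 1), (2, 1)],
    "text": [(1, 2), (2, 1), (3, 1)],
}
for doc, score in top_k(postings, num_docs=4, k=2):
    print(doc, round(score, 3))
# prints "1 1.269" then "2 0.981"
```

`heapq.nlargest` keeps a heap of only k candidates rather than sorting every scored document; snippets for the returned documents could then be extracted from the same self-index.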

Publisher

Database: Elsevier - ScienceDirect
Journal: Information Processing & Management - Volume 48, Issue 5, September 2012, Pages 819–827

Authors

Diego Arroyuelo, Veronica Gil-Costa, Senén González, Mauricio Marin, Mauricio Oyarzún