Article code: 523904
Journal code: 868525
Publication year: 2014
English article: 25 pages, PDF
Full-text version: free download
English title of the ISI article
Distributed text search using suffix arrays
Keywords
Distributed text search; Suffix arrays; Distributed text search engines
Related topics
Engineering and Basic Sciences, Computer Engineering, Computer Science Applications
Abstract


• We introduce distributed-memory text-search algorithms based on suffix arrays.
• We analyze the proposed search algorithms to determine their theoretical speed-up.
• We do intensive experimentation on high-performance clusters of processors.
• We also assess the practicality of our proposals using Web search engines as a case study.

Text search is a classical problem in Computer Science, with many data-intensive applications. For this problem, suffix arrays are among the most widely known and used data structures, enabling fast searches for phrases, terms, substrings and regular expressions in large texts. Potential application domains for these operations include large-scale search services, such as Web search engines, where it is necessary to efficiently process intensive-traffic streams of on-line queries. This paper proposes strategies to enable such services by means of suffix arrays. We introduce techniques for deploying suffix arrays on clusters of distributed-memory processors and then study the processing of multiple queries on the distributed data structure. Even though the cost of individual search operations in sequential (non-distributed) suffix arrays is low in practice, the problem of processing multiple queries on distributed-memory systems, so that hardware resources are used efficiently, is relevant to services aimed at achieving high query throughput at low operational costs. Our theoretical and experimental performance studies show that our proposals are suitable solutions for building efficient and scalable on-line search services based on suffix arrays.
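
For readers unfamiliar with the data structure, the following is a minimal sequential sketch of substring search with a suffix array. It is not the distributed algorithm proposed in the paper, and the function names `build_suffix_array` and `find_occurrences` are hypothetical, chosen only for illustration. The construction is deliberately naive and suitable for small texts only.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of `text` in lexicographic order.

    Naive construction (sorts full suffixes); for illustration only.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Return the sorted start positions where `pattern` occurs in `text`.

    Uses two binary searches over the suffix array `sa` to locate the
    contiguous block of suffixes that begin with `pattern`.
    """
    m = len(pattern)

    # First suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # First suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[start:lo])


text = "banana"
sa = build_suffix_array(text)
matches = find_occurrences(text, sa, "ana")
```

Each query costs a logarithmic number of string comparisons in the text length, which is why individual searches are cheap in the sequential setting; how the array is partitioned across processors and how query batches are scheduled is the subject of the paper and is not shown here.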

Publisher
Database: Elsevier - ScienceDirect
Journal: Parallel Computing - Volume 40, Issue 9, October 2014, Pages 471–495