دانلود رایگان مقاله: فضای کمتر: فهرست بندی برای پرس و جو با کلمات علامت دار

کد مقاله	کد نشریه	سال انتشار	مقاله انگلیسی	نسخه تمام متن
436303	689986	2014	8 صفحه PDF	دانلود رایگان

عنوان انگلیسی مقاله ISI

Less space: Indexing for queries with wildcards

ترجمه فارسی عنوان

فضای کمتر: فهرست بندی برای پرس و جو با کلمات علامت دار

دانلود مقاله + سفارش ترجمه

دانلود مقاله ISI انگلیسی

رایگان برای ایرانیان

کلمات کلیدی

Wildcards Suffix trees - درخت های پیشنهادی Data structures - ساختارهای داده Range searching - محدوده جستجو

موضوعات مرتبط

مهندسی و علوم پایه مهندسی کامپیوتر نظریه محاسباتی و ریاضیات

پیش نمایش مقاله

فضای کمتر: فهرست بندی برای پرس و جو با کلمات علامت دار

چکیده انگلیسی

Text indexing is a fundamental problem in computer science, where the task is to index a given text (string) T[1..n]T[1..n], such that whenever a pattern P[1..p]P[1..p] comes as a query, we can efficiently report all those locations where P occurs as a substring of T. In this paper, we consider the case when P contains wildcard characters (which can match with any other character). The first non-trivial solution for the problem was given by Cole et al. [11], where the index space is O(nlogk⁡n)O(nlogk⁡n) words or O(nlogk+1⁡n)O(nlogk+1⁡n) bits and the query time is O(p+2hlog⁡log⁡n+occ)O(p+2hlog⁡log⁡n+occ), where k is the maximum number of wildcard characters allowed in P , h≤kh≤k is the number of wildcard characters in P and occ represents the number of occurrences of P in T . Even though many indexes offering different space-time trade-offs were later proposed, a clear improvement on this result is still not known. In this paper, we first propose an O(nlogk+ϵ⁡n)O(nlogk+ϵ⁡n) bits index achieving the same query time as the of Cole et al.'s index, where 0<ϵ<10<ϵ<1 is an arbitrary small constant. Then we propose another index of size O(nlogk⁡nlog⁡σ)O(nlogk⁡nlog⁡σ) bits, but with a slightly higher query time of O(p+2hlog⁡n+occ)O(p+2hlog⁡n+occ), where σ denotes the alphabet set size.We also study a related problem, where the task is to index a collection of documents (of n characters in total) so as to find the number of distinct documents containing a query pattern P. For the case where P contains at most a single wildcard character, we propose an O(nlog⁡n)O(nlog⁡n)-word index with optimal O(p)O(p) query time.

ناشر

Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Theoretical Computer Science - Volume 557, 6 November 2014, Pages 120–127

نویسندگان

Moshe Lewenstein, J. Ian Munro, Venkatesh Raman, Sharma V. Thankachan,

علوم انسانی و هنر

فنی، مهندسی و علوم پایه

پزشکی و سلامت

بیو تکنولوژی

پذیرش سفارش ترجمه

دانلود رایگان مقاله ISI : فضای کمتر: فهرست بندی برای پرس و جو با کلمات علامت دار

دسترسی سریع

ارتباط

English Website