کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
436477 690008 2014 19 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
Average-case linear-time similar substring searching by the q-gram distance
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر نظریه محاسباتی و ریاضیات
پیش نمایش صفحه اول مقاله
Average-case linear-time similar substring searching by the q-gram distance
چکیده انگلیسی

In this paper we consider the problem of similar substring searching in the q-gram distance. The q  -gram distance dq(x,y)dq(x,y) is a similarity measure between two strings x and y defined by the number of different q  -grams between them. The distance can be used instead of the edit distance due to its lower computation cost, O(|x|+|y|)O(|x|+|y|) vs. O(|x||y|)O(|x||y|), and its good approximation for the edit distance. However, if this distance is applied to the problem of finding all similar strings, in a long text t, to a given pattern p, the total computation cost is sometimes not acceptable. Ukkonen already proposed two fast algorithms: one with an array and the other with a tree. When “similar” means k   or less in dqdq, their time complexities are O(|t|k+|p|)O(|t|k+|p|) and O(|t|logk+|p|), respectively. In this paper, we propose two algorithms of average-case complexity O(|t|+|p|)O(|t|+|p|), although their worst-case complexities are still O(|t|k+|p|)O(|t|k+|p|) and O(|t|logk+|p|), respectively. The linearity of the average-case complexity is analyzed under the assumption of random sampling of t and the condition that q is larger than a threshold. The algorithms exploit the fact that similar substrings in t   are often found at very close positions if the beginning positions of the substrings are close. In the second proposed algorithm, we adopted a doubly-linked list supported by an array and a search tree to search for a list element in O(logk) time. Experimental results support their theoretical average-case complexities.

ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Theoretical Computer Science - Volume 530, 17 April 2014, Pages 23–41
نویسندگان
, , ,