کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
431107 688275 2009 12 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
Discovering subword associations in strings in time linear in the output size
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر نظریه محاسباتی و ریاضیات
پیش نمایش صفحه اول مقاله
Discovering subword associations in strings in time linear in the output size
چکیده انگلیسی

Given a text string x of n symbols and an integer constant d  , we consider the problem of finding, for any pair (y,z)(y,z) of subwords of x, the tandem index associated with the pair, which is defined as the number of times that y and z occur in tandem (i.e., with no intermediate occurrence of either one of them) within a distance of d symbols of x  . Although in principle there might be O(n4)O(n4) distinct subword pairs in x  , it is seen that it suffices to consider a family of only O(n2)O(n2) such pairs, with the property that for any neglected pair (y′,z′)(y′,z′) there exists a corresponding pair (y,z)(y,z) contained in our family such that: (i) y′y′ is a prefix of y   and z′z′ is a prefix of z  ; and (ii) the tandem index of (y′,z′)(y′,z′) equals that of (y,z)(y,z). The main contribution of the paper consists of an algorithm showing that the computation of all non-zero tandem indices for a string can be carried out optimally in time and space linear in the size of the output.

ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Journal of Discrete Algorithms - Volume 7, Issue 2, June 2009, Pages 227–238
نویسندگان
, ,