کد مقاله | کد نشریه | سال انتشار | مقاله انگلیسی | نسخه تمام متن |
---|---|---|---|---|
431107 | 688275 | 2009 | 12 صفحه PDF | دانلود رایگان |

Given a text string x of n symbols and an integer constant d , we consider the problem of finding, for any pair (y,z)(y,z) of subwords of x, the tandem index associated with the pair, which is defined as the number of times that y and z occur in tandem (i.e., with no intermediate occurrence of either one of them) within a distance of d symbols of x . Although in principle there might be O(n4)O(n4) distinct subword pairs in x , it is seen that it suffices to consider a family of only O(n2)O(n2) such pairs, with the property that for any neglected pair (y′,z′)(y′,z′) there exists a corresponding pair (y,z)(y,z) contained in our family such that: (i) y′y′ is a prefix of y and z′z′ is a prefix of z ; and (ii) the tandem index of (y′,z′)(y′,z′) equals that of (y,z)(y,z). The main contribution of the paper consists of an algorithm showing that the computation of all non-zero tandem indices for a string can be carried out optimally in time and space linear in the size of the output.
Journal: Journal of Discrete Algorithms - Volume 7, Issue 2, June 2009, Pages 227–238