Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
431107 | Journal of Discrete Algorithms | 2009 | 12 Pages |
Given a text string x of n symbols and an integer constant d , we consider the problem of finding, for any pair (y,z)(y,z) of subwords of x, the tandem index associated with the pair, which is defined as the number of times that y and z occur in tandem (i.e., with no intermediate occurrence of either one of them) within a distance of d symbols of x . Although in principle there might be O(n4)O(n4) distinct subword pairs in x , it is seen that it suffices to consider a family of only O(n2)O(n2) such pairs, with the property that for any neglected pair (y′,z′)(y′,z′) there exists a corresponding pair (y,z)(y,z) contained in our family such that: (i) y′y′ is a prefix of y and z′z′ is a prefix of z ; and (ii) the tandem index of (y′,z′)(y′,z′) equals that of (y,z)(y,z). The main contribution of the paper consists of an algorithm showing that the computation of all non-zero tandem indices for a string can be carried out optimally in time and space linear in the size of the output.