Article ID Journal Published Year Pages File Type
431107 Journal of Discrete Algorithms 2009 12 Pages PDF
Abstract

Given a text string x of n symbols and an integer constant d  , we consider the problem of finding, for any pair (y,z)(y,z) of subwords of x, the tandem index associated with the pair, which is defined as the number of times that y and z occur in tandem (i.e., with no intermediate occurrence of either one of them) within a distance of d symbols of x  . Although in principle there might be O(n4)O(n4) distinct subword pairs in x  , it is seen that it suffices to consider a family of only O(n2)O(n2) such pairs, with the property that for any neglected pair (y′,z′)(y′,z′) there exists a corresponding pair (y,z)(y,z) contained in our family such that: (i) y′y′ is a prefix of y   and z′z′ is a prefix of z  ; and (ii) the tandem index of (y′,z′)(y′,z′) equals that of (y,z)(y,z). The main contribution of the paper consists of an algorithm showing that the computation of all non-zero tandem indices for a string can be carried out optimally in time and space linear in the size of the output.

Related Topics
Physical Sciences and Engineering Computer Science Computational Theory and Mathematics
Authors
, ,