کد مقاله | کد نشریه | سال انتشار | مقاله انگلیسی | نسخه تمام متن |
---|---|---|---|---|
438013 | 690220 | 2008 | 10 صفحه PDF | دانلود رایگان |

Compact bases formed by motifs called “irredundant” and capable of generating all other motifs in a sequence have been proposed in recent years and successfully tested in tasks of biosequence analysis and classification. Given a sequence s of n characters drawn from an alphabet Σ, the problem of extracting such a base from s had been previously solved in time O(n2lognlog∣Σ∣) and O(∣Σ∣n2log2nloglogn), respectively, using the FFT-based string searching by Fischer and Paterson. More recently, a solution on binary strings taking time O(n2) without resorting to the FFT was also proposed. In the present paper, we considered the problem of incrementally extracting the bases of all suffixes of a string. This problem was solved in a previous work in time O(n3). A much faster incremental algorithm is described here, which takes time O(n2logn) for binary strings. Although this algorithm does not make use of the FFT, its performance is comparable to the one exhibited by the previous FFT-based algorithms involving the computation of only one base. The implicit representation of a single base requires O(n) space, whence for finite alphabets the proposed solution is within a logn factor from optimality.
Journal: Theoretical Computer Science - Volume 408, Issues 2–3, 28 November 2008, Pages 106-115