Article ID Journal Published Year Pages File Type
435391 Theoretical Computer Science 2009 14 Pages PDF
Abstract

We introduce a new notion of motifs, called masks, that succinctly represents the repeated patterns for an input sequence T of n symbols drawn from an alphabet Σ. We show how to build the set of all frequent maximal masks of length L in O(2Ln) time and space in the worst case, using the Karp–Miller–Rosenberg approach. We analytically show that our algorithm performs better than the method based on constant-time enumerating and checking all the potential (|Σ|+1)L candidate patterns in T, after a polynomial-time preprocessing of T. Our algorithm is also cache-friendly, attaining block transfers, where sort(n) is the cache complexity of sorting n items.

Related Topics
Physical Sciences and Engineering Computer Science Computational Theory and Mathematics