Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
430483 | Journal of Computer and System Sciences | 2009 | 8 Pages |
In pattern matching with character classes the goal is to find all occurrences of a pattern of length m in a text of length n, where each pattern position consists of an allowed set of characters from a finite alphabet Σ. We present an FFT-based algorithm that uses a novel prime-numbers encoding scheme, which is logn/logm times faster than the fastest extant approaches, which are based on boolean convolutions. In particular, if m|Σ|=nO(1), our algorithm runs in time O(nlogm), matching the complexity of the fastest techniques for wildcard matching, a special case of our problem. A major advantage of our algorithm is that it allows a tradeoff between the running time and the RAM word size. Our algorithm also speeds up solutions to approximate matching with character classes problems—namely, matching with k mismatches and Hamming distance, as well as to the subset matching problem.