Article ID Journal Published Year Pages File Type
430483 Journal of Computer and System Sciences 2009 8 Pages PDF
Abstract

In pattern matching with character classes the goal is to find all occurrences of a pattern of length m in a text of length n, where each pattern position consists of an allowed set of characters from a finite alphabet Σ. We present an FFT-based algorithm that uses a novel prime-numbers encoding scheme, which is logn/logm times faster than the fastest extant approaches, which are based on boolean convolutions. In particular, if m|Σ|=nO(1), our algorithm runs in time O(nlogm), matching the complexity of the fastest techniques for wildcard matching, a special case of our problem. A major advantage of our algorithm is that it allows a tradeoff between the running time and the RAM word size. Our algorithm also speeds up solutions to approximate matching with character classes problems—namely, matching with k mismatches and Hamming distance, as well as to the subset matching problem.

Related Topics
Physical Sciences and Engineering Computer Science Computational Theory and Mathematics