Article ID Journal Published Year Pages File Type
4950781 Information Processing Letters 2017 6 Pages PDF
Abstract
In the traditional version of the problem of approximate pattern matching, a pattern symbol is considered to match a text symbol if the two symbols are equal. Such a notion of exact equality is not suitable for situations where the text and pattern symbols are imprecise, e.g., obtained from an analog source, distorted by additive noise, etc. In such situations it is more appropriate to consider two alphabet symbols to match even if they are not equal, as long as they do not differ by more than a given threshold θ. The goal is then to compute the number of matches of the length-M pattern with all length-M substrings of the length-N text, i.e., to compute a vector of N−M+1 scores, where the ith score is the number of matches between the pattern and the substring that begins at text position i. The main result of this paper is to show that this threshold version of the problem can be solved by recursively solving 3+2log⁡θ instances of the traditional (i.e., zero-threshold) version of the problem, which is much-studied in the literature and for which there are many efficient (typically randomized) solutions of time complexity close to O(Nlog⁡M). This paper's result therefore implies the first randomized O(Nlog⁡M(log⁡θ+1)) solution for the threshold version of the problem. It also implies that any future improvement to the traditional (zero-threshold) version of the problem automatically translates into a similar improvement to the arbitrary-threshold case. Furthermore, we show that the factor Ω(log⁡θ) is tight if use our recursive framework.
Related Topics
Physical Sciences and Engineering Computer Science Computational Theory and Mathematics
Authors
, ,