Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
431033 | Journal of Discrete Algorithms | 2011 | 7 Pages |
This paper revisits the problem of indexing a text S[1..n]S[1..n] for pattern matching with up to k errors. A naive solution either has a worst-case matching time complexity of Ω(mk)Ω(mk) or requires Ω(nk)Ω(nk) space, where m is the length of the pattern. Devising a solution with better performance has been a challenge until Cole et al. (2004) [5] showed an O(nlogkn)-space index that can support k -error matching in O(m+occ+logknloglogn) time, where occ is the number of occurrences. Motivated by the indexing of long sequences like DNA, we have investigated the feasibility of devising a linear-size index that still has a time complexity linear in pattern length. This paper in particular presents an O(n)O(n)-space index that supports k -error matching in O(m+occ+(logn)k(k+1)loglogn) worst-case time. This index can be further compressed from O(n)O(n) words into O(n)O(n) bits with a slight increase in the time complexity.