Article ID Journal Published Year Pages File Type
431033 Journal of Discrete Algorithms 2011 7 Pages PDF
Abstract

This paper revisits the problem of indexing a text S[1..n]S[1..n] for pattern matching with up to k   errors. A naive solution either has a worst-case matching time complexity of Ω(mk)Ω(mk) or requires Ω(nk)Ω(nk) space, where m is the length of the pattern. Devising a solution with better performance has been a challenge until Cole et al. (2004) [5] showed an O(nlogkn)-space index that can support k  -error matching in O(m+occ+logknloglogn) time, where occ   is the number of occurrences. Motivated by the indexing of long sequences like DNA, we have investigated the feasibility of devising a linear-size index that still has a time complexity linear in pattern length. This paper in particular presents an O(n)O(n)-space index that supports k  -error matching in O(m+occ+(logn)k(k+1)loglogn) worst-case time. This index can be further compressed from O(n)O(n) words into O(n)O(n) bits with a slight increase in the time complexity.

Related Topics
Physical Sciences and Engineering Computer Science Computational Theory and Mathematics
Authors
, , , , ,