Article ID Journal Published Year Pages File Type
427155 Information Processing Letters 2013 5 Pages PDF
Abstract

•New algorithms for string matching with k  -mismatches in AC0AC0 and word-RAM models.•Method based on packed strings.•We achieve worst-case time sublinear in the text length for some parameters.•The proposed method is adapted for several other string matching problems.

Given strings P of length m and T of length n over an alphabet of size σ, the string matching with k-mismatches problem is to find the positions of all the substrings in T that are at Hamming distance at most k from P. If T   can be read only one character at the time the best known bounds are O(nklogk) and O(n+nk/wlogk) in the word-RAM model with word length w  . In the RAM models (including AC0AC0 and word-RAM) it is possible to read up to ⌊w/logσ⌋ characters in constant time if the characters of T   are encoded using ⌈logσ⌉ bits. The only solution for k  -mismatches in packed text works in O((nlogσ/logn)⌈mlog(k+logn/logσ)/w⌉+nε) time, for any ε>0ε>0. We present an algorithm that runs in time O(n⌊w/(mlogσ)⌋(1+logmin(k,σ)logm/logσ)) in the AC0AC0 model if m=O(w/logσ) and T   is given packed. We also describe a simpler variant that runs in time O(n⌊w/(mlogσ)⌋logmin(m,logw/logσ)) in the word-RAM model. The algorithms improve the existing bound for w=Ω(log1+ϵn), for any ϵ>0ϵ>0. Based on the introduced technique, we present algorithms for several other approximate matching problems.

Related Topics
Physical Sciences and Engineering Computer Science Computational Theory and Mathematics
Authors
, , ,