کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
431340 688509 2011 8 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
Fast searching in packed strings
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر نظریه محاسباتی و ریاضیات
پیش نمایش صفحه اول مقاله
Fast searching in packed strings
چکیده انگلیسی

Given strings P and Q the (exact) string matching problem is to find all positions of substrings in Q matching P  . The classical Knuth–Morris–Pratt algorithm [SIAM J. Comput. 6 (2) (1977) 323–350] solves the string matching problem in linear time which is optimal if we can only read one character at the time. However, most strings are stored in a computer in a packed representation with several characters in a single word, giving us the opportunity to read multiple characters simultaneously. In this paper we study the worst-case complexity of string matching on strings given in packed representation. Let m⩽nm⩽n be the lengths P and Q, respectively, and let σ denote the size of the alphabet. On a standard unit-cost word-RAM with logarithmic word size we present an algorithm using timeO(nlogσn+m+occ). Here occ is the number of occurrences of P in Q  . For m=o(n)m=o(n) this improves the O(n)O(n) bound of the Knuth–Morris–Pratt algorithm. Furthermore, if m=O(n/logσn)m=O(n/logσn) our algorithm is optimal since any algorithm must spend at least Ω((n+m)logσlogn+occ)=Ω(nlogσn+occ) time to read the input and report all occurrences. The result is obtained by a novel automaton construction based on the Knuth–Morris–Pratt algorithm combined with a new compact representation of subautomata allowing an optimal tabulation-based simulation.

ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Journal of Discrete Algorithms - Volume 9, Issue 1, March 2011, Pages 49–56
نویسندگان
,