Article ID Journal Published Year Pages File Type
437068 Theoretical Computer Science 2006 15 Pages PDF
Abstract

The string matching problem, i.e. the task of finding all occurrences of one string as a substring of another one, is a fundamental problem in computer science. Recently, this problem received a great deal of attention due to numerous applications in computational biology. In this paper we address a modified version of Horspool's string matching algorithm using the probabilities of the different symbols to speed up the search. We show that the modified algorithm has a linear average running time; a precise asymptotical representation of the running time will be proven. A comparison of the average running time of the modified algorithm with well-known results for the original method shows that a substantial speed up for most of the symbol distributions has been achieved. Finally, we show that the distribution of the symbols can be approximated to a high precision using a random sample of sublinear size.

Related Topics
Physical Sciences and Engineering Computer Science Computational Theory and Mathematics