A metric index for approximate string matching

Article ID	Journal	Published Year	Pages	File Type
437179	Theoretical Computer Science	2006	14 Pages	PDF

Abstract

We present a radically new indexing approach for approximate string matching. The scheme uses the metric properties of the edit distance and can be applied to any other metric between strings. We build a metric space where the sites are the nodes of the suffix tree of the text, and the approximate query is seen as a proximity query on that metric space. This permits us finding the occ occurrences of a pattern of length m, permitting up to r differences, in a text of length n over an alphabet of size σ, in average time O(m1+ε+occ) for any ε>0, if and . The index works well up to , where it achieves its maximum average search complexity . The construction time of the index is and its space is . This is the first index achieving average search time polynomial in m and independent of n, for . Previous methods achieve this complexity only for . We also present a simpler scheme needing O(n) space.