کد مقاله | کد نشریه | سال انتشار | مقاله انگلیسی | نسخه تمام متن |
---|---|---|---|---|
437179 | 690086 | 2006 | 14 صفحه PDF | دانلود رایگان |

We present a radically new indexing approach for approximate string matching. The scheme uses the metric properties of the edit distance and can be applied to any other metric between strings. We build a metric space where the sites are the nodes of the suffix tree of the text, and the approximate query is seen as a proximity query on that metric space. This permits us finding the occ occurrences of a pattern of length m, permitting up to r differences, in a text of length n over an alphabet of size σ, in average time O(m1+ε+occ) for any ε>0, if and . The index works well up to , where it achieves its maximum average search complexity . The construction time of the index is and its space is . This is the first index achieving average search time polynomial in m and independent of n, for . Previous methods achieve this complexity only for . We also present a simpler scheme needing O(n) space.
Journal: Theoretical Computer Science - Volume 352, Issues 1–3, 7 March 2006, Pages 266-279