Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
6874765 | Journal of Discrete Algorithms | 2016 | 25 Pages |
Abstract
Approximate pattern matching is an important computational problem that has a wide range of applications in computational biology and in information retrieval. However, searching a short pattern in a text with high error rates (10-20%) under the Levenshtein distance is a task for which few efficient solutions exist. Here we address this problem by introducing a new type of seeds: the 01â0 seeds. These seeds are made of two exact parts separated by parts with exactly one error. We show that those seeds are lossless, and we apply them to two filtration algorithms for two popular applications, one where a compressed index is built on the text and another one where the patterns are indexed. We also demonstrate experimentally the advantages of our approach compared to alternative methods implementing other types of seeds. This work opens the way to the design of more efficient and more sensitive text algorithms.
Related Topics
Physical Sciences and Engineering
Computer Science
Computational Theory and Mathematics
Authors
Christophe Vroland, Mikaël Salson, Sébastien Bini, Hélène Touzet,