Article ID Journal Published Year Pages File Type
430715 Journal of Computer and System Sciences 2013 12 Pages PDF
Abstract

•We build spaced seeds for approximate pattern matching using Quadratic Residues.•Quadratic Residue seeds are lossless and work for any number of errors.•Quadratic Residue seeds provably generate less false positives than single seeds.

Spaced seeds are used in approximate pattern matching algorithms to quickly discard regions where a match is not likely to occur. We propose a family of lossless spaced seeds based on Quadratic Residues modulo a prime number. Our seeds work with a threshold t>1 in the sense that two regions are considered similar only if the seed hits t times within the regions. We prove that, for any number of errors, our seeds have an exponentially smaller probability of producing false positive matches than any traditional seed using a threshold t=1. To establish our result we introduce a formal notion of selectivity that generalizes the concept of seed weight, and we relate it to the minimum coverage and to a new structural property defined in terms on seed rotations. This groundwork will be useful for further analysis on seeds with threshold and we use it to provide improved bounds for approximate matching with 2 or 3 errors. Our results show that the use of a single seed with a threshold t>1 should be considered as a possible alternative to single or multiple seeds with t=1.

Related Topics
Physical Sciences and Engineering Computer Science Computational Theory and Mathematics