String matching with inversions and translocations in linear average time (most of the time)

Article ID	Journal	Published Year	Pages	File Type
429054	Information Processing Letters	2011	5 Pages	PDF

Abstract

We present an efficient algorithm for finding all approximate occurrences of a given pattern p of length m in a text t of length n allowing for translocations of equal length adjacent factors and inversions of factors. The algorithm is based on an efficient filtering method and has an O(nmmax(α,β))O(nmmax(α,β))-time complexity in the worst case and O(max(α,β,σ))O(max(α,β,σ))-space complexity, where α and β are respectively the maximum length of the factors involved in any translocation and inversion, and σ is the alphabet size. Moreover we show that our algorithm has an O(n)O(n) average time complexity, whenever σ=Ω(logm/loglog1−εm), for ε>0ε>0. Experiments show that the proposed algorithm achieves very good results in practical cases.

► New algorithm for approximate matching with inversions and translocations. ► Application in DNA analysis, due to chromosomal rearrangements. ► Counting filter to detect pattern permutations, verified with another algorithm. ► Experimental results: about 4–25×4–25× faster than its predecessor. ► Linear time on average if only the alphabet is not very small.

Keywords

Algorithms Bioinformatics Approximate string matching