Article ID Journal Published Year Pages File Type
6875891 Theoretical Computer Science 2017 23 Pages PDF
Abstract
The all-pairs suffix-prefix problem occurs as a subproblem of DNA sequence assembly where it is the most time-consuming part of the whole assembly. Although there are algorithms for the all-pairs suffix-prefix problem which are optimal in the asymptotic time complexity, they are slower than SOF and Readjoiner which are state-of-the-art algorithms used in practice. In this paper we present an algorithm for the all-pairs suffix-prefix problem that uses a simple data structure for storing input strings and advanced algorithmic techniques for matching, which together lead to fast running time in practice. Our algorithm is 14 times faster than SOF and 18 times faster than Readjoiner on average in real datasets and random datasets.
Related Topics
Physical Sciences and Engineering Computer Science Computational Theory and Mathematics
Authors
, ,