کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
426623 686124 2012 10 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
Approximate all-pairs suffix/prefix overlaps
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر نظریه محاسباتی و ریاضیات
پیش نمایش صفحه اول مقاله
Approximate all-pairs suffix/prefix overlaps
چکیده انگلیسی

Finding approximate overlaps is the first phase of many sequence assembly methods. Given a set of strings of total length n and an error-rate ϵ, the goal is to find, for all-pairs of strings, their suffix/prefix matches (overlaps) that are within edit distance k=⌈ϵℓ⌉, where ℓ is the length of the overlap. We propose a new solution for this problem based on backward backtracking (Lam, et al., 2008) and suffix filters (Kärkkäinen and Na, 2008). Our technique uses bits of space, where Hk is the k-th order entropy and σ the alphabet size. In practice, it is more scalable in terms of space, and comparable in terms of time, than q-gram filters (Rasmussen, et al., 2006). Our method is also easy to parallelize and scales up to millions of DNA reads.

ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Information and Computation - Volume 213, April 2012, Pages 49-58