کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
392171 664685 2015 11 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
Constrained sequence analysis algorithms in computational biology
ترجمه فارسی عنوان
الگوریتم تجزیه و تحلیل توالی محدود در زیست شناسی محاسباتی
کلمات کلیدی
تجزیه و تحلیل توالی، الگوریتم، اتوماتای ​​محدود
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر هوش مصنوعی
چکیده انگلیسی

The knowledge of the similarity of two or more sequences is crucial in computational molecular biology. The longest common subsequence (LCS) is a well-known and widely used measure for sequence similarity. Constrained variants of the LCS problem have been studied in the literature where the knowledge of the functionalities or structures of the input sequences are provided in the form of inclusion/exclusion constraint patterns. In this paper we focus on different variants of the LCS problem involving multiple input sequences and constraint patterns. Given L input sequences and ℓ constraint patterns, the goal here is to construct an LCS of the given sequences such that each of the constraint patterns occurs/does not occur in the LCS as a subsequence/substring.We devise finite automata based efficient algorithms for all the variants of the problem that run in O(|Σ|(R+L)+nL+|Σ|Rnℓ)O(|Σ|(R+L)+nL+|Σ|Rnℓ) time, where RR is the size of the resulting subsequence automaton, n   is the length of each input sequence and ΣΣ is the underlying alphabet. We also conduct an extensive experimental study to evaluate the practical performance of our algorithms. The experimental results suggest the superiority of our finite automata based algorithms. Therefore, we believe that our automata based algorithms will be useful in practical sequence analysis in computational biology and will replace the existing algorithms that are mostly based on memory intensive dynamic programming based methods.

ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Information Sciences - Volume 295, 20 February 2015, Pages 247–257
نویسندگان
, ,