Article code: 6370532
Journal code: 1623852
Publication year: 2014
English article: 21 pages
Full-text version: PDF
English Title of the ISI Article
Extraction of high quality k-words for alignment-free sequence comparison
Related Subjects
Life Sciences and Biotechnology > Agricultural and Biological Sciences > Agricultural and Biological Sciences (General)
English Abstract
The weighted Euclidean distance (D2) is one of the earliest dissimilarity measures used for alignment-free comparison of biological sequences. Because it is fast to compute, this distance measure has been used in numerous applications, and many variants of it have since been introduced. The D2 distance measure is based on the counts of k-words (substrings of length k) in the two sequences being compared. Traditionally, all k-words are compared when computing the distance. In this paper we show that similar accuracy in sequence comparison can be achieved using a selected subset of k-words. We introduce a term-variance-based quality measure for identifying the important k-words. We demonstrate the application of the proposed technique to phylogeny reconstruction and show that, for certain datasets, up to 99% of the k-words can be filtered out, resulting in faster sequence comparison. The paper also presents an evaluation of optimal k-word values based on exploratory analysis and discusses the impact of using subsets of k-words at these optimal values.
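A minimal Python sketch of the pipeline the abstract outlines, under stated assumptions: count overlapping k-words in each sequence, score every k-word by its term variance across the dataset, keep only the top-scoring fraction, and compute a weighted squared Euclidean distance over the retained k-words. Assumptions (not taken from the paper): term variance is the population variance of a k-word's raw count across all sequences, the standard text-mining definition; per-word weights default to 1.0; no length normalization is applied. The paper's exact quality measure and weighting may differ, and the classic D2 statistic is sometimes defined instead as the inner product of the two count vectors. The helper names and the `fraction` parameter are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence


def kword_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-word (substring of length k) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def term_variance(profiles: Sequence[Counter], word: str) -> float:
    """Population variance of one k-word's count across all sequences.

    Assumption: raw counts are used, with no length normalization.
    """
    values = [p[word] for p in profiles]  # Counter returns 0 for absent k-words
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def select_kwords(profiles: Sequence[Counter], fraction: float) -> List[str]:
    """Hypothetical helper: keep the top `fraction` of k-words by term variance."""
    vocab = set().union(*profiles)
    # Highest variance first; ties broken alphabetically so results are deterministic.
    ranked = sorted(vocab, key=lambda w: (-term_variance(profiles, w), w))
    keep = max(1, round(len(ranked) * fraction))
    return ranked[:keep]


def weighted_sq_euclidean(x: Counter, y: Counter, words: Sequence[str],
                          weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted squared Euclidean distance over the selected k-words only.

    Assumption: any k-word without an explicit weight gets weight 1.0.
    """
    w = weights or {}
    return float(sum(w.get(word, 1.0) * (x[word] - y[word]) ** 2 for word in words))


def distance_matrix(seqs: Sequence[str], k: int,
                    fraction: float = 1.0) -> List[List[float]]:
    """Pairwise distances computed from the retained k-words only."""
    profiles = [kword_counts(s, k) for s in seqs]
    words = select_kwords(profiles, fraction)
    n = len(seqs)
    return [[weighted_sq_euclidean(profiles[i], profiles[j], words) for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    seqs = ["AAAA", "AC"]
    print(distance_matrix(seqs, k=2))                # [[0.0, 10.0], [10.0, 0.0]]
    print(distance_matrix(seqs, k=2, fraction=0.5))  # [[0.0, 9.0], [9.0, 0.0]]
```

In the toy example, "AA" has the higher term variance (2.25 versus 0.25 for "AC"), so keeping half of the k-words retains only "AA" and the distance drops from 10 to 9. Passing `fraction=0.01` would keep roughly 1% of the k-words; whether accuracy is preserved at that level depends on the dataset, as the abstract notes. The resulting matrix could then be handed to any distance-based tree-building method for phylogeny reconstruction.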
Publisher
Database: Elsevier - ScienceDirect
Journal: Journal of Theoretical Biology - Volume 358, 7 October 2014, Pages 31-51
Authors
, , ,