Article code | Journal code | Publication year | English article | Full-text version |
---|---|---|---|---|
2817388 | 1159985 | 2013 | 6-page PDF | Free download |
The K-mer-based approach has been widely used in similarity analysis to discover similarity and dissimilarity among biological sequences. In this study, we improve the traditional K-mer method and introduce a segmented K-mer approach (s-K-mer). After each primary sequence is divided into several segments, all of these segments are simultaneously transformed into corresponding K-mer-based vectors. In this approach, it is vital to determine the optimal combination of the value of K, the number of segments, and the distance metric, i.e., (K⁎, s⁎, d⁎). Based on the cascaded feature vectors obtained from the s⁎ segments of each sequence, we analyze 34 mammalian genome sequences using the proposed s-K-mer approach and compare the results with those of the traditional K-mer method. The comparison demonstrates that the s-K-mer approach outperforms the traditional K-mer method in similarity analysis among different species.
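The segment-then-concatenate idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the near-equal-length segmentation, the normalization of counts to frequencies, the skipping of windows containing ambiguous bases, and the use of Euclidean distance are all assumptions made for illustration only.

```python
from itertools import product
import math

ALPHABET = "ACGT"  # assumed DNA alphabet


def kmer_vector(seq, k):
    """Return k-mer frequencies of seq, ordered lexicographically by k-mer."""
    index = {"".join(p): i for i, p in enumerate(product(ALPHABET, repeat=k))}
    counts = [0] * len(index)
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # assumption: skip windows with ambiguous bases (e.g. N)
            counts[j] += 1
    total = sum(counts)
    # Assumption: normalize counts to frequencies; an empty segment gives zeros.
    return [c / total for c in counts] if total else [0.0] * len(counts)


def split_segments(seq, s):
    """Split seq into s contiguous segments of near-equal length (assumed scheme)."""
    n = len(seq)
    bounds = [i * n // s for i in range(s + 1)]
    return [seq[bounds[i]:bounds[i + 1]] for i in range(s)]


def s_kmer_vector(seq, k, s):
    """Concatenate ("cascade") the k-mer vectors of the s segments of seq."""
    vec = []
    for seg in split_segments(seq.upper(), s):
        vec.extend(kmer_vector(seg, k))
    return vec


def euclidean(u, v):
    """Euclidean distance, used here as one possible choice of metric d."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# With K = 1, "AACC" and "CCAA" have identical composition, so the
# unsegmented vectors (s = 1) coincide, while s = 2 separates them.
d_plain = euclidean(s_kmer_vector("AACC", 1, 1), s_kmer_vector("CCAA", 1, 1))
d_segmented = euclidean(s_kmer_vector("AACC", 1, 2), s_kmer_vector("CCAA", 1, 2))
```

Setting `s = 1` reduces the sketch to the traditional K-mer vector, which makes it easy to compare the two representations on the same sequences. In this toy case `d_plain` is 0.0 and `d_segmented` is 2.0, because segmentation preserves some positional information that the whole-sequence composition discards.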
Highlights
► Transform each genome sequence into a K-mer-based vector (F).
► Optimize over all the vectors F to obtain the optimal K⁎ for similarity analysis.
► Propose an optimizing model, i.e. segmented K-mer, to improve the performance.
► Results demonstrate the validity of our approach (s-K-mer).
Journal: Gene - Volume 518, Issue 2, 15 April 2013, Pages 419–424