Article code: 2833745 | Journal code: 1570801 | Year of publication: 2016 | English article, full-text version: 10-page PDF
English title of the ISI article
Whole genome/proteome based phylogeny reconstruction for prokaryotes using higher order Markov model and chaos game representation
Related topics
Life Sciences and Biotechnology > Agricultural and Biological Sciences > Ecology, Evolution, Behavior and Systematics
Abstract


• A new alignment-free proteome based method for phylogenetic tree construction is proposed.
• We convert the whole proteome sequences into transition matrices of a higher order Markov model.
• One-dimensional CGR and a linked list are used to reduce the memory needed to store the transition matrices.
• The angle between two feature vectors is used as the measure of phylogenetic distance.
• Our results on two data sets demonstrate that the new method is useful and efficient.
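
The angle-based distance mentioned in the highlights can be illustrated with a minimal sketch. It assumes that each genome or proteome has already been reduced to a sparse feature vector, stored here as a Python dict that maps a feature key to a value. The function name `angle_distance` and the dict representation are illustrative assumptions, not the authors' published code.

```python
import math


def angle_distance(u, v):
    """Return the angle in radians between two sparse vectors.

    Assumption: u and v are dicts mapping a feature key to a number;
    keys missing from a dict are treated as zero entries.
    """
    # Only keys present in both vectors contribute to the dot product.
    dot = sum(value * v[key] for key, value in u.items() if key in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("the angle is undefined for a zero vector")
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cosine)
```

Evaluating this function for every pair of species gives a symmetric distance matrix, which a tree-building program can then take as input.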

Traditional methods for sequence comparison and phylogeny reconstruction rely on pairwise and multiple sequence alignments. However, alignment cannot be directly applied to whole genome/proteome comparison and phylogenomic studies because of its high computational complexity. Hence alignment-free methods have become popular in recent years. Here we propose a fast alignment-free method for whole genome/proteome comparison and phylogeny reconstruction using a higher order Markov model and chaos game representation. In the present method, we use the transition matrices of higher order Markov models to characterize amino acid or DNA sequences for comparison. The order of the Markov model is uniquely identified by maximizing the average Shannon entropy of the conditional probability distributions. By using one-dimensional chaos game representation and a linked list, the method reduces the large memory and time consumption caused by the large-scale conditional probability distributions. To illustrate the effectiveness of our method, we employ it for fast phylogeny reconstruction based on the genome/proteome sequences of two species data sets used in previously published papers. Our results demonstrate that the present method is useful and efficient.

Availability and implementation: The source code for our algorithm, which computes the distance matrix, and the genome/proteome sequences can be downloaded from ftp://121.199.20.25/. The Phylip and EvolView software we used to construct the phylogenetic trees is available from their respective websites.
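
The order-k transition probabilities and the entropy criterion described in the abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the alphabet is DNA, the average entropy is taken as an unweighted mean over the contexts actually observed, a nested dict stands in for the linked list, and `cgr_index` is one possible reading of "one-dimensional chaos game representation" (a word read as a base-4 fraction in [0, 1)). All function names are hypothetical.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"  # assumption: DNA; a proteome would use the 20 amino acids


def cgr_index(word, alphabet=ALPHABET):
    """Map a word to a number in [0, 1) by reading it as a base-N fraction."""
    base = len(alphabet)
    value = 0.0
    for position, symbol in enumerate(word, start=1):
        # alphabet.index raises ValueError for symbols outside the alphabet.
        value += alphabet.index(symbol) / base ** position
    return value


def transition_probs(seq, k):
    """Estimate P(next symbol | previous k symbols) from observed counts.

    Only contexts that occur in seq are stored, so the table stays sparse.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(len(seq) - k):
        counts[seq[i:i + k]][seq[i + k]] += 1
    probs = {}
    for context, row in counts.items():
        total = sum(row.values())
        probs[context] = {symbol: n / total for symbol, n in row.items()}
    return probs


def average_entropy(probs):
    """Unweighted mean Shannon entropy (bits) over the observed contexts."""
    if not probs:
        raise ValueError("no contexts observed")
    entropies = [
        -sum(p * math.log2(p) for p in row.values()) for row in probs.values()
    ]
    return sum(entropies) / len(entropies)
```

Under these assumptions, a candidate order could be chosen by computing `average_entropy(transition_probs(seq, k))` for several values of `k` and keeping the one with the largest value. How the paper weights the contexts and which range of orders it searches is not stated in the abstract.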


Publisher
Database: Elsevier - ScienceDirect
Journal: Molecular Phylogenetics and Evolution - Volume 96, March 2016, Pages 102–111