Article code: 425340
Journal code: 685723
Publication year: 2007
English article: 18-page PDF
Full-text version: free download
English title of the ISI article
The configuration space of homologous proteins: A theoretical and practical framework to reduce the diversity of the protein sequence space after massive all-by-all sequence comparisons
Related topics
Engineering and Basic Sciences, Computer Engineering, Computational Theory and Mathematics
English abstract

Most of the millions of virtual protein sequences deduced from genomic DNA, and the millions to come, will never be experimentally confirmed, nor will their functions be directly analyzed. Exploring the majority of the protein space therefore relies on our ability to extrapolate knowledge of characterized sequences to unknown sequences. In this paper we analyzed the large-scale comparisons of hundreds of thousands of protein sequences that have previously been carried out using the power of supercomputers or grid frameworks. Following these comparisons, pragmatic rules were used to reduce protein diversity, but none was based on a rigorous and robust framework. We examined how projecting sequences into the configuration space of homologous proteins (CSHP) could provide a theoretically robust and long-term practical solution for organizing the protein space. The CSHP can be constructed from the output of any all-by-all pairwise comparison in which Z-values were computed after Monte Carlo simulations. Reduction of protein diversity can be carried out according to an evolutionary model that yields consistent phylogenetic clusters. The projection in the CSHP can be easily updated after sequence database updates, and the accuracy of the phylogenetic topology can be improved by refining sub-models. Clusters of homologous proteins can be represented as phylogenetic trees (TULIP trees). We showed that the CSHP projection can be used to process the outputs of previous massive comparison projects based on Z-value statistics, given minor corrections for uncollected low values, and we propose guidelines for future generations of massive protein sequence comparison projects.
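
The abstract refers to Z-values computed after Monte Carlo simulations without spelling out the calculation. A minimal sketch of the usual idea follows: the alignment score of a real pair is compared with the scores obtained when one sequence is randomly shuffled, and the difference is expressed in standard deviations. The scoring scheme (a simple Smith-Waterman with fixed match, mismatch and gap values), the function names `sw_score` and `z_value`, and the number of shuffles are all assumptions made for illustration; the paper's own scoring parameters are not given here.

```python
import random
import statistics


def sw_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score (Smith-Waterman, linear gap penalty).

    The match/mismatch/gap values are illustrative assumptions, not the
    substitution matrix a real comparison project would use.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


def z_value(a, b, n_shuffles=100, seed=0):
    """Monte Carlo Z-value: (real score - mean shuffled score) / std dev."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    real = sw_score(a, b)
    letters = list(b)
    shuffled_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)  # keeps composition, destroys residue order
        shuffled_scores.append(sw_score(a, "".join(letters)))
    mu = statistics.mean(shuffled_scores)
    sigma = statistics.pstdev(shuffled_scores)
    if sigma == 0:
        # Degenerate case: every shuffle scored the same.
        return float("inf") if real > mu else 0.0
    return (real - mu) / sigma
```

Calling `z_value(seq_a, seq_b)` for every pair in a database would give the kind of all-by-all Z-value table from which, according to the abstract, the CSHP can be constructed. Shuffling preserves amino-acid composition, so a high Z-value indicates similarity beyond what composition alone would explain.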

Publisher
Database: Elsevier - ScienceDirect
Journal: Future Generation Computer Systems - Volume 23, Issue 3, March 2007, Pages 410–427