Article code: 1172030
Journal code: 960741
Year of publication: 2006
English article: 10 pages, PDF
Full-text version: free download
English title of the ISI article
Creating hierarchical models of protein families based on Expressed Sequence Tags: The “Sprockets” analysis pipeline
Related topics
Engineering and Basic Sciences, Chemistry, Analytical Chemistry
English abstract
We have created an analysis pipeline called Sprockets, which can be used to classify proteins into various hierarchical “families”, and build searchable models of these families. The construction of these families is based on data from Expressed Sequence Tags (ESTs) and Coding DNA Sequences (CDSs), making Sprockets clusters especially suitable for studying gene families in organisms for which the completely sequenced genome does not (yet) exist. The pipeline consists of two main parts: pair-wise analysis and grouping of sequences with Z-score statistics, followed by hierarchical splitting of clusters into alignable protein families. Various computational and statistical techniques applied in Sprockets allow it to act like a massive and selective multiple sequence alignment engine for combining individual sequence collections and related public sequences. The end result is a database of gene Hidden Markov Models, each related to the other by three levels of similarity: secondary structure, function and evolutionary origin. For a sample 20,000 EST set from Lactuca spp., Sprockets provided a 9% improvement in mapping of function to unknown sequences over traditional pair-wise search methods and InterPro mapping.
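The first stage named in the abstract, pair-wise analysis and grouping of sequences with Z-score statistics, can be illustrated with a toy sketch. The abstract does not say how Sprockets scores pairs, computes Z-scores, or chooses cut-offs, so everything below is an assumption made for illustration: a simple Smith-Waterman score with hypothetical match, mismatch and gap values, a Z-score estimated against shuffled copies of one sequence, a hypothetical threshold of 8, and single-linkage grouping. It is not the Sprockets implementation.

```python
import itertools
import random
import statistics


def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score with linear gap penalties.

    The scoring values are illustrative assumptions, not Sprockets parameters.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            cur.append(max(0, diag, prev[j] + gap, cur[j - 1] + gap))
            best = max(best, cur[j])
        prev = cur
    return best


def z_score(a, b, shuffles=100, seed=0):
    """Z-score of the observed score against scores for shuffled copies of b."""
    rng = random.Random(seed)
    observed = local_alignment_score(a, b)
    letters = list(b)
    null = []
    for _ in range(shuffles):
        rng.shuffle(letters)  # keeps composition, destroys residue order
        null.append(local_alignment_score(a, "".join(letters)))
    sd = statistics.pstdev(null)
    if sd == 0:
        return 0.0  # degenerate null distribution: treat as no evidence
    return (observed - statistics.mean(null)) / sd


def group_by_z(seqs, threshold=8.0):
    """Single-linkage grouping: join two sequences when their Z-score
    reaches the (hypothetical) threshold. `seqs` maps name -> sequence."""
    parent = {name: name for name in seqs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for x, y in itertools.combinations(seqs, 2):
        if z_score(seqs[x], seqs[y]) >= threshold:
            parent[find(x)] = find(y)

    clusters = {}
    for name in seqs:
        clusters.setdefault(find(name), set()).add(name)
    return list(clusters.values())
```

Calling `group_by_z({"a": seq_a, "b": seq_b, "c": seq_c})` returns a list of sets of sequence names. Shuffling preserves residue composition, so a high Z-score reflects shared residue order rather than shared composition. The second stage in the abstract, hierarchical splitting of clusters into alignable families, is not covered by this sketch.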
Publisher
Database: Elsevier - ScienceDirect
Journal: Analytica Chimica Acta - Volume 564, Issue 1, 30 March 2006, Pages 123-132