Gauss-integral based representation of protein structure for predicting the fold class from the sequence

Article code	Journal code	Publication year	English article	Full-text version
1138038	1489221	2006	12-page PDF	Free download

Related subjects

Engineering and Basic Sciences, Other Engineering Disciplines, Control and Systems Engineering

Abstract

A representative subset of protein chains was selected from the CATH 2.4 database [C.A. Orengo, A.D. Michie, S. Jones, D.T. Jones, M.B. Swindells, J.M. Thornton, CATH – a hierarchic classification of protein domain structures, Structure 5 (8) (1997) 1093–1108] and used to train a feed-forward neural network to predict protein fold classes. The network takes the dipeptide frequency matrix as input and produces as output a novel representation of the protein chains in ℝ³⁰ space, based on knot invariant values [P. Røgen, B. Fain, Automatic classification of protein structure by using Gauss integrals, Proceedings of the National Academy of Sciences of the United States of America 100 (1) (2003) 119–124; P. Røgen, H.G. Bohr, A new family of global protein shape descriptors, Mathematical Biosciences 182 (2) (2003) 167–181]. In the general case, when singletons (proteins that are the unique member of a topology or sequence homology set) are excluded, the prediction success rates were 77% at the class level, 60% for architecture, and 48% for topology. The total number of fold classes included in the present data set (∼500) is ten times that reported in earlier attempts, so this result represents an improvement on previous work, which reported on a few handpicked folds. Furthermore, distance analysis of the network outputs resulting from singletons shows that it is possible to detect novel topologies with very high confidence (∼85%); in these cases the network can be used as a sorting mechanism that identifies sequences which might need special attention. A direct measure of prediction confidence may also be obtained from such distance analysis.
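As a rough illustration of the two ingredients named in the abstract, the sketch below builds a 20×20 dipeptide frequency matrix from a sequence and measures how far a predicted output vector lies from its nearest known reference vector. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the normalisation by total pair count, and the use of Euclidean distance are all assumptions, since the abstract specifies none of them.

```python
import math

# The 20 standard amino acids in one-letter code (ordering is an assumption).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def dipeptide_frequencies(sequence):
    """Return a 20x20 matrix of adjacent-residue pair frequencies.

    Entry [i][j] is the fraction of adjacent pairs in which residue i is
    followed by residue j. Pairs containing a non-standard letter are
    skipped. Normalising by the pair count is an assumption.
    """
    counts = [[0.0] * 20 for _ in range(20)]
    total = 0
    seq = sequence.upper()
    for first, second in zip(seq, seq[1:]):
        if first in INDEX and second in INDEX:
            counts[INDEX[first]][INDEX[second]] += 1
            total += 1
    if total == 0:
        return counts
    return [[c / total for c in row] for row in counts]


def nearest_distance(output, references):
    """Euclidean distance from an output vector to the closest reference.

    A large value would hint that the sequence matches no known topology;
    any cut-off would have to be tuned on data and is not given here.
    """
    return min(math.dist(output, ref) for ref in references)


if __name__ == "__main__":
    matrix = dipeptide_frequencies("ACAC")
    print(matrix[INDEX["A"]][INDEX["C"]])  # fraction of A->C pairs
    print(nearest_distance([0.0, 0.0], [[3.0, 4.0], [6.0, 8.0]]))
```

In the setting of the abstract, the flattened matrix (400 values) would be the network input and the reference vectors would be 30-dimensional; the two-dimensional vectors in the usage lines are only for brevity.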

Publisher

Database: Elsevier - ScienceDirect
Journal: Mathematical and Computer Modelling - Volume 43, Issues 3–4, February 2006, Pages 401–412

Authors

Bjørn G. Nielsen, Peter Røgen, Henrik G. Bohr
