Article code: 1138038
Journal code: 1489221
Publication year: 2006
English article: 12-page PDF
Full-text version: Free download
English title of the ISI article
Gauss-integral based representation of protein structure for predicting the fold class from the sequence
Related topics
Engineering and Basic Sciences, Other Engineering Disciplines, Control and Systems Engineering
English abstract

A representative subset of protein chains was selected from the CATH 2.4 database [C.A. Orengo, A.D. Michie, S. Jones, D.T. Jones, M.B. Swindells, J.M. Thornton, CATH – a hierarchic classification of protein domain structures, Structure 5 (8) (1997) 1093–1108] and used to train a feed-forward neural network to predict protein fold classes. The network takes the dipeptide frequency matrix as input and produces as output a novel representation of the protein chains in $\mathbb{R}^{30}$, based on knot invariant values [P. Røgen, B. Fain, Automatic classification of protein structure by using Gauss integrals, Proceedings of the National Academy of Sciences of the United States of America 100 (1) (2003) 119–124; P. Røgen, H.G. Bohr, A new family of global protein shape descriptors, Mathematical Biosciences 182 (2) (2003) 167–181]. In the general case, when singletons (proteins that are the only members of their topology or sequence-homology set) are excluded, the prediction success rates were 77% at the class level, 60% for architecture, and 48% for topology. The total number of fold classes included in the present data set (∼500) is ten times that reported in earlier attempts, so this result represents an improvement on previous work, which reported on a few handpicked folds. Furthermore, distance analysis of the network outputs for singletons shows that novel topologies can be detected with very high confidence (∼85%); in these cases the network can be used as a sorting mechanism that identifies sequences which might need special attention. Such distance analysis also yields a direct measure of prediction confidence.
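As a rough illustration of the two ingredients named in the abstract, the sketch below builds a dipeptide frequency matrix from a sequence and finds the closest reference point to an output vector. It is not the authors' implementation: the abstract does not give the normalization, the handling of non-standard residues, or the distance measure, so the choices here (frequencies that sum to 1, skipping pairs with non-standard residues, Euclidean distance) are assumptions, and the names `dipeptide_frequencies` and `nearest_reference` are hypothetical.

```python
from math import dist

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def dipeptide_frequencies(sequence):
    """Return a 20x20 matrix of adjacent-residue pair frequencies.

    Entry [i][j] is the fraction of adjacent pairs in which residue i is
    immediately followed by residue j; the entries sum to 1.
    """
    counts = [[0] * 20 for _ in range(20)]
    total = 0
    for first, second in zip(sequence, sequence[1:]):
        # Assumption: pairs containing a non-standard residue (e.g. 'X') are skipped.
        if first in INDEX and second in INDEX:
            counts[INDEX[first]][INDEX[second]] += 1
            total += 1
    if total == 0:
        # No usable pairs: return an all-zero matrix instead of dividing by zero.
        return [[0.0] * 20 for _ in range(20)]
    return [[c / total for c in row] for row in counts]


def nearest_reference(point, references):
    """Return (label, distance) for the reference vector closest to `point`.

    Assumption: Euclidean distance; `references` maps a label to a vector
    of the same length as `point`.
    """
    label = min(references, key=lambda name: dist(point, references[name]))
    return label, dist(point, references[label])
```

Flattened row by row, the matrix gives 400 values that could serve as a network input. A large distance from every known reference point would be one plausible signal that a sequence deserves special attention, in the spirit of the distance analysis the abstract mentions.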

Publisher
Database: Elsevier - ScienceDirect
Journal: Mathematical and Computer Modelling - Volume 43, Issues 3–4, February 2006, Pages 401–412