Enzymes/non-enzymes classification model complexity based on composition, sequence, 3D and topological indices

کد مقاله	کد نشریه	سال انتشار	مقاله انگلیسی	نسخه تمام متن
4498386	1318978	2008	7 صفحه PDF	دانلود رایگان

عنوان انگلیسی مقاله ISI

دانلود مقاله + سفارش ترجمه

دانلود مقاله ISI انگلیسی

رایگان برای ایرانیان

کلمات کلیدی

Protein secondary structures - ساختارهای ثانویه پروتئین Protein models - مدل های پروتئینی Star graph - گراف ستاره

موضوعات مرتبط

علوم زیستی و بیوفناوری علوم کشاورزی و بیولوژیک علوم کشاورزی و بیولوژیک (عمومی)

پیش نمایش صفحه اول مقاله

Enzymes/non-enzymes classification model complexity based on composition, sequence, 3D and topological indices

چکیده انگلیسی

The huge amount of new proteins that need a fast enzymatic activity characterization creates demands of protein QSAR theoretical models. The protein parameters that can be used for an enzyme/non-enzyme classification includes the simpler indices such as composition, sequence and connectivity, also called topological indices (TIs) and the computationally expensive 3D descriptors. A comparison of the 3D versus lower dimension indices has not been reported with respect to the power of discrimination of proteins according to enzyme action. A set of 966 proteins (enzymes and non-enzymes) whose structural characteristics are provided by PDB/DSSP files was analyzed with Python/Biopython scripts, STATISTICA and Weka. The list of indices includes, but it is not restricted to pure composition indices (residue fractions), DSSP secondary structure protein composition and 3D indices (surface and access). We also used mixed indices such as composition-sequence indices (Chou's pseudo-amino acid compositions or coupling numbers), 3D-composition (surface fractions) and DSSP secondary structure amino acid composition/propensities (obtained with our Prot-2S Web tool). In addition, we extend and test for the first time several classic TIs for the Randic's protein sequence Star graphs using our Sequence to Star Graph (S2SG) Python application. All the indices were processed with general discriminant analysis models (GDA), neural networks (NN) and machine learning (ML) methods and the results are presented versus complexity, average of Shannon's information entropy (Sh) and data/method type. This study compares for the first time all these classes of indices to assess the ratios between model accuracy and indices/model complexity in enzyme/non-enzyme discrimination. The use of different methods and complexity of data shows that one cannot establish a direct relation between the complexity and the accuracy of the model.

ناشر

Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Journal of Theoretical Biology - Volume 254, Issue 2, 21 September 2008, Pages 476–482

نویسندگان

Cristian Robert Munteanu, Humberto González-Díaz, Alexandre L. Magalhães,

علوم انسانی و هنر

فنی، مهندسی و علوم پایه

پزشکی و سلامت

بیو تکنولوژی

پذیرش سفارش ترجمه

Enzymes/non-enzymes classification model complexity based on composition, sequence, 3D and topological indices

دسترسی سریع

ارتباط

English Website