Article code: 382491
Journal code: 660765
Publication year: 2014
English article: 8 pages, PDF
Full-text version: free download
English title of the ISI article
Predicting the protein solubility by integrating chaos games representation and entropy in information theory
Persian translation of the title
پیش بینی حلالیت پروتئین از طریق ادغام نمایندگی های بازی های هرج و مرج و آنتروپی در تئوری اطلاعات
Keywords
Protein solubility, amino acid composition, entropy in information theory, chaos game representation, support vector machine
Related topics
Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence
English abstract


• The concept of the upgraded chaos game representation is proposed to predict protein solubility.
• We integrated the upgraded chaos game representation with entropies from information theory.
• The results demonstrate that the sequence-based method is effective.
• Among these chaos game representations, the triangle polygon CGR performs best.

Protein solubility is a prerequisite for many structural and functional studies. Predicting the propensity of a protein to be soluble or to form inclusion bodies is a challenging and crucial problem. To formulate protein samples in a way that reflects their intrinsic correlation with solubility, triangle, quadrangle and 12-vertex polygon chaos game representations (CGR), the concept of entropy in information theory, and amino acid and dipeptide compositions are applied, based on a different mode of pseudo amino acid composition (PseAAC). The mathematical expressions involving seven CGR methods and the amino acid and dipeptide compositions, with their corresponding entropies, are evaluated with 10-fold cross-validation and a re-substitution test. The numerical results confirm that introducing entropy can significantly improve the performance of the classifiers. The triangle CGR method surpasses the other two CGR methods in classifier construction. It provides sequence-order information that complements dipeptide composition. The optimal mathematical expression combines dipeptide composition, triangle CGR and their entropies. With the 2-level triangle polygon CGR and dipeptide composition, together with their corresponding entropies, as the mathematical feature, the classifier achieved its best accuracy of 88.45% and an MCC of 0.7588 in the 10-fold cross-validation test. In the re-substitution test, the 3-level triangle polygon CGR, dipeptide composition and their entropies performed best, with an accuracy of 92.38% and an MCC of 0.8387.
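
The abstract does not give the feature formulas, so the following is only a minimal sketch of how a composition feature and its entropy could be computed, not the authors' implementation. It assumes the standard 20-letter amino acid alphabet, relative k-mer frequencies as the composition, and Shannon entropy in bits as the entropy term. The function names `composition`, `shannon_entropy` and `feature_vector` are hypothetical, and the CGR features are not covered.

```python
import math
from collections import Counter
from itertools import product

# Assumption: the 20 standard residues; other symbols (e.g. "X") are ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence, k):
    """Relative frequency of every k-mer over the 20-letter alphabet.

    k=1 gives amino acid composition (20 values),
    k=2 gives dipeptide composition (400 values).
    """
    kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return {m: 0.0 for m in kmers}
    return {m: counts[m] / total for m in kmers}


def shannon_entropy(freqs):
    """Shannon entropy (bits) of a frequency table; zero entries are skipped."""
    return -sum(p * math.log2(p) for p in freqs.values() if p > 0)


def feature_vector(sequence):
    """Hypothetical feature: 400 dipeptide frequencies plus their entropy."""
    dipeptides = composition(sequence, 2)
    return list(dipeptides.values()) + [shannon_entropy(dipeptides)]
```

For example, `feature_vector("MKTAYIAKQR")` returns 401 numbers that could be passed to a classifier such as a support vector machine. How the paper actually combines these terms with the CGR features is not stated in the abstract.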

Publisher
Database: Elsevier - ScienceDirect
Journal: Expert Systems with Applications - Volume 41, Issue 4, Part 2, March 2014, Pages 1672–1679