Article code | Journal code | Publication year | English article | Full-text version |
---|---|---|---|---|
382491 | 660765 | 2014 | 8-page PDF | Free download |
• The concept of an upgraded chaos game representation (CGR) is proposed for predicting protein solubility.
• The upgraded chaos game representation is integrated with entropies from information theory.
• The results demonstrate that the sequence-based method is an effective way to predict protein solubility.
• Among the chaos game representations tested, the triangle polygon CGR performs best.
Protein solubility is a prerequisite for many structural and functional studies. Predicting the propensity of a protein to be soluble or to form inclusion bodies is a challenging and crucial problem. To formulate protein samples in a way that reflects their intrinsic correlation with solubility, triangle, quadrangle and 12-vertex polygon CGRs, the concept of entropy from information theory, and amino acid and dipeptide compositions are applied, based on a different mode of pseudo amino acid composition (PseAAC). The mathematical expressions involving seven CGR methods and the amino acid and dipeptide compositions, together with their corresponding entropies, are evaluated with 10-fold cross-validation and a re-substitution test. The numerical results confirm that introducing entropy can significantly improve the performance of the classifiers. The triangle CGR method surpasses the other two CGR methods in classifier construction and provides sequence-order information complementary to dipeptide composition. The optimal mathematical expression combines dipeptide composition, triangle CGR and their entropies. With the 2-level triangle polygon CGR and dipeptide composition, together with their corresponding entropies, as the mathematical feature, the classifier achieved its best accuracy of 88.45% and an MCC of 0.7588 in the 10-fold cross-validation test. In the re-substitution test, the 3-level triangle polygon CGR, dipeptide composition and their entropies performed best, with an accuracy of 92.38% and an MCC of 0.8387.
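To make two of the ingredients named in the abstract concrete, the following is a minimal Python sketch of a dipeptide composition, a Shannon entropy computed over it, and the Matthews correlation coefficient (MCC) used to report results. The abstract does not give the paper's exact formulas, so the entropy definition (standard Shannon entropy in bits), the function names, and the idea of appending the entropy to the composition vector are assumptions made for illustration only. The CGR features are not reproduced here because the vertex assignment is not described in the abstract.

```python
import math
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered dipeptides, in a fixed order.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(seq):
    """Return the 400 dipeptide frequencies of a protein sequence.

    Pairs containing non-standard residues are skipped (an assumption).
    """
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for a, b in zip(seq, seq[1:]):
        pair = a + b
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[p] / total for p in DIPEPTIDES]


def shannon_entropy(freqs):
    """Shannon entropy in bits; zero frequencies contribute nothing."""
    return -sum(f * math.log2(f) for f in freqs if f > 0)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column is empty
    return (tp * tn - fp * fn) / denom


def feature_vector(seq):
    """Hypothetical feature: composition with its entropy appended."""
    comp = dipeptide_composition(seq)
    return comp + [shannon_entropy(comp)]
```

Under these assumptions, `feature_vector(seq)` yields a 401-dimensional vector per sequence that could be passed to any standard classifier; the choice of classifier is not stated in the abstract.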
Journal: Expert Systems with Applications - Volume 41, Issue 4, Part 2, March 2014, Pages 1672–1679