Article code: 4496350
Journal code: 1623878
Publication year: 2013
English article: 7 pages, PDF
Full-text version: free download
English title of the ISI article
Using the concept of Chou's pseudo amino acid composition to predict protein solubility: An approach with entropies in information theory
Related topics
Life Sciences and Biotechnology > Agricultural and Biological Sciences > Agricultural and Biological Sciences (General)
English abstract


• We predict protein solubility using entropies from information theory.
• We conduct experiments on nine different feature vector combinations.
• The results demonstrate that the sequence-based method is effective.
• Introducing entropy improves the performance of the method.
• Entropy can reveal essential information hidden in sequences.

Protein solubility plays a major role in proteomics and has strong implications for it. Predicting whether a protein will be soluble or will form inclusion bodies is a fundamental problem that is not yet fully resolved. To predict protein solubility, almost 10,000 protein sequences were downloaded from NCBI. Highly homologous sequences were then removed with CD-HIT, leaving 5692 sequences. Amino acid and dipeptide compositions are the features commonly extracted from protein sequences to predict solubility. In this study, entropy from information theory was introduced as an additional predictive factor in the model. Experiments on nine different feature vector combinations, built from these three kinds of factors, were conducted with support vector machines (SVMs) as the prediction engine. Each combination was evaluated by a re-substitution test and a 10-fold cross-validation test. According to the evaluation results, both the accuracy and the Matthews correlation coefficient (MCC) were improved by the introduction of entropy. The best combination used the amino acid and dipeptide compositions together with their entropies. Its accuracy reached 90.34% with an MCC of 0.7494 in the re-substitution test, and 88.12% and 0.7945, respectively, in 10-fold cross-validation. In conclusion, the introduction of entropy significantly improved the performance of the predictive method.
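
The feature types named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define how the entropy is computed, so the sketch assumes Shannon entropy (in bits) taken over each composition vector. The function names (`amino_acid_composition`, `dipeptide_composition`, `shannon_entropy`, `feature_vector`, `mcc`) are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def amino_acid_composition(seq):
    """Fraction of each standard residue in seq (20 values)."""
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    counts = Counter(residues)
    total = len(residues)
    return [counts[a] / total if total else 0.0 for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Fraction of each adjacent residue pair in seq (400 values)."""
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    # Skip pairs that contain a non-standard symbol such as X.
    pairs = [p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS]
    counts = Counter(pairs)
    total = len(pairs)
    return [counts[d] / total if total else 0.0 for d in DIPEPTIDES]


def shannon_entropy(freqs):
    """Shannon entropy in bits; zero frequencies contribute nothing."""
    # Assumption: the source does not specify its entropy definition.
    return -sum(p * math.log2(p) for p in freqs if p > 0)


def feature_vector(seq):
    """Concatenate both compositions and their entropies (20 + 400 + 2 values)."""
    aac = amino_acid_composition(seq)
    dpc = dipeptide_composition(seq)
    return aac + dpc + [shannon_entropy(aac), shannon_entropy(dpc)]


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

Vectors produced by `feature_vector` could then be passed to any SVM implementation (for example scikit-learn's `sklearn.svm.SVC`); the abstract does not say which SVM package or kernel was used, so that choice is left open here.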

Publisher
Database: Elsevier - ScienceDirect
Journal: Journal of Theoretical Biology - Volume 332, 7 September 2013, Pages 211–217