Article ID: 36201
Journal: Process Biochemistry
Published Year: 2006
Pages: 7
File Type: PDF
Abstract

In this work, we systematically analyzed the distribution of pairs of neighboring amino acids (dipeptides) in the sequences of thermophilic and mesophilic proteins. We observed that the occurrences of EE, KK, RR, PP, KI, VV, VE, KE and VK were significantly higher in thermophilic proteins, while the occurrences of QQ, AA, EQ, LL, QA, QL, NN, KQ, QG, RQ, QT and AQ were significantly lower. We examined the mechanism of thermostability and suggest that dipeptide composition contains more information than amino acid composition. Based on dipeptide composition, we developed a statistical method for discriminating between thermophilic and mesophilic proteins. The accuracy of our method on the training dataset was 86.3%. On two further independent test datasets, the accuracy was 85.5% and 89.7%, respectively. The influence of some specific dipeptides on prediction accuracy is also discussed.
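As an illustration of the feature the abstract refers to, the following is a minimal sketch of how the dipeptide composition of a single sequence might be computed. It is not the authors' implementation: the function name `dipeptide_composition`, the restriction to the 20 standard amino acids, and the choice to skip pairs containing any other character are all assumptions made for this example. The abstract does not describe the statistical discrimination step, so that step is not sketched here.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids in one-letter code (assumption: non-standard
# residues such as X, B, Z or U are ignored in this sketch).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# All 400 possible ordered pairs of neighboring residues.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return the relative frequency of each of the 400 dipeptides.

    Overlapping neighboring pairs are counted, so a sequence of length n
    contributes at most n - 1 pairs. Pairs containing a character outside
    AMINO_ACIDS are skipped (hypothetical handling, chosen for this sketch).
    """
    seq = sequence.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    valid = [p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS]
    total = len(valid)
    if total == 0:
        # No countable pairs: return an all-zero composition.
        return {dp: 0.0 for dp in DIPEPTIDES}
    counts = Counter(valid)
    return {dp: counts[dp] / total for dp in DIPEPTIDES}


if __name__ == "__main__":
    # Toy sequence, not taken from the paper's datasets.
    composition = dipeptide_composition("MKKEEVVKE")
    for dp in ("EE", "KK", "VV", "QQ"):
        print(dp, round(composition[dp], 3))
```

Averaging such 400-dimensional vectors over a set of thermophilic proteins and over a set of mesophilic proteins would give the kind of per-dipeptide comparison summarized above (for example EE and KK versus QQ and AA).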

Related Topics
Physical Sciences and Engineering Chemical Engineering Bioengineering