Application of amino acid distribution along the sequence for discriminating mesophilic and thermophilic proteins

Article ID	Journal	Published Year	Pages	File Type
36201	Process Biochemistry	2006	7 Pages	PDF

Abstract

In this work, we systematically analyzed the distribution of pairs of neighboring amino acids (dipeptides) in the sequences of thermophilic and mesophilic proteins. We observed that the occurrences of EE, KK, RR, PP, KI, VV, VE, KE and VK were significantly higher in thermophilic proteins, while the occurrences of QQ, AA, EQ, LL, QA, QL, NN, KQ, QG, RQ, QT and AQ were significantly lower. We examined the mechanism of thermostability and suggest that dipeptide composition contains more information than amino acid composition. Based on dipeptide composition, we developed a statistical method for discriminating thermophilic and mesophilic proteins. The accuracy of our method on the training dataset was 86.3%. On two additional independent test datasets, the accuracy was 85.5% and 89.7%, respectively. The influence of some specific dipeptides on prediction accuracy is also discussed.
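To make the central quantity concrete, the following is a minimal Python sketch of how a dipeptide composition could be computed for a single sequence. It is an illustration only, not the authors' implementation: the function name `dipeptide_composition`, the choice to express frequencies as percentages of all valid neighboring pairs, and the decision to skip pairs containing non-standard residues are all assumptions. The abstract does not specify the statistical discrimination method, so none is shown here.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# All 400 possible ordered pairs of neighboring residues.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]
VALID = set(DIPEPTIDES)


def dipeptide_composition(sequence):
    """Return the percentage of each of the 400 dipeptides in a sequence.

    Assumption: overlapping pairs are counted (positions i and i+1), and
    pairs containing a non-standard residue (e.g. X) are ignored.
    """
    seq = sequence.upper()
    pairs = (seq[i:i + 2] for i in range(len(seq) - 1))
    counts = Counter(p for p in pairs if p in VALID)
    total = sum(counts.values())
    if total == 0:
        return {dp: 0.0 for dp in DIPEPTIDES}
    return {dp: 100.0 * counts[dp] / total for dp in DIPEPTIDES}


if __name__ == "__main__":
    # Hypothetical toy sequence, used only to exercise the function.
    comp = dipeptide_composition("MKKEELLKRR")
    for dp in ("EE", "KK", "RR", "QQ"):
        print(dp, round(comp[dp], 2))
```

A 400-dimensional vector of this kind, computed for each protein, is the sort of feature on which a discrimination method between thermophilic and mesophilic proteins could be built.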

Keywords

Dipeptide composition; Discrimination; Amino acid composition; Protein thermostability