Article code | Journal code | Publication year | English article | Full-text version |
---|---|---|---|---|
1181343 | 1491547 | 2014 | 6-page PDF | Free download |
• Novel microenvironment and network features are used for the prediction of hot spots.
• Higher accuracy and Matthews correlation coefficient are obtained with the Random Forest model.
• The differences in the novel features between hot spots and non-hot spots are statistically significant.
• Hydrophilic amino acid residues are found to easily cluster in the vicinity of hot spots.
Hot spot residues in protein–protein interfaces play crucial roles in protein binding. In the present study, a complex network method was applied to uncover the influence of neighboring residues on hot spots, and several network and microenvironment features were then designed to describe the diversity of hot spot environments. After feature analysis by permutation importance in Random Forest (RF), an optimal 58-dimensional feature set, including ten network and microenvironment features, was selected and used to construct a Support Vector Machine (SVM) prediction model for hot spots. On the independent test set, the model achieved a satisfactory accuracy (ACC) of 79.0% and a Matthews correlation coefficient (MCC) of 0.470. The novel network and microenvironment features proved promising for discovering hot spots in interfaces. A further microenvironment analysis was also performed. Amino acid residues in direct contact with hot spots in the residue–residue interaction network are significantly important for the microenvironment of hot spots. Alanine (A), aspartic acid (D), glycine (G), histidine (H), isoleucine (I), asparagine (N), serine (S) and tyrosine (Y) are more likely to occur in the vicinity of hot spots than in the vicinity of non-hot spots. These residues probably cluster together to form a suitable microenvironment for hot spots.
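The two evaluation measures quoted in the abstract, ACC and MCC, can be computed from the four confusion-matrix counts of a binary classifier. The following is a minimal sketch using the standard definitions, not the authors' code; the function names and the toy labels (1 = hot spot, 0 = non-hot spot) are illustrative assumptions and do not come from the paper's data set.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels, where 1 marks a hot spot."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """ACC = (TP + TN) / (TP + TN + FP + FN); 0.0 if there are no samples."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy labels, invented purely for illustration (not results from the study).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]

counts = confusion_counts(y_true, y_pred)
print("ACC =", accuracy(*counts))
print("MCC =", round(mcc(*counts), 3))
```

MCC is often reported alongside ACC for hot spot prediction because hot spots are usually the minority class; unlike accuracy, MCC stays near zero for a classifier that simply predicts the majority class.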
Journal: Chemometrics and Intelligent Laboratory Systems - Volume 131, 15 February 2014, Pages 16–21