Article ID: 491263
Journal: Procedia Technology
Published Year: 2013
Pages: 5
File Type: PDF
Abstract

Protein-protein interactions (PPIs) play important roles in most, if not all, biological processes. Examples include hormone–receptor binding, signal transduction, chaperone activity and antigen–antibody interactions. Disruption of PPIs may therefore lead to the development of inherited human diseases. Several analytical techniques can identify the amino acid residues in protein interfaces, but they are time-consuming, labour-intensive and, above all, very expensive. As an alternative to these analytical methods, we have sought to develop machine learning tools that differentiate between interface and non-interface amino acid residues. We used sequence- and structure-based features derived from a set of protein hetero-complex structure files from the Protein Data Bank (PDB). We built supervised predictors based on Random Forests (RF) and Support Vector Machines (SVMs) and evaluated them with 10-fold cross-validation. Both our sequence-based and structure-based RF predictors performed better than the SVM-based ones. The most predictive sequence- and structure-based features are those that measure sequence conservation at a given amino acid residue, together with various measurements of the charge distribution in the residue's neighbourhood. Our sequence- and structure-based RF classifiers were validated against protein complexes with experimentally proven interaction sites and were found to detect protein interface residues in practice.

Availability: http://www.sblest.org/ppi
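
The 10-fold cross-validation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names (`k_fold_indices`, `cross_validate`), the toy labels and the majority-class placeholder model are all assumptions made for illustration, and the placeholder merely stands in where an RF or SVM predictor would be trained. The sketch shows how residues are shuffled into ten folds and how each fold is held out once while the remaining nine are used for training.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Split range(n_samples) into k shuffled folds of near-equal size."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at i,
    # so fold sizes differ by at most one.
    return [indices[i::k] for i in range(k)]


def cross_validate(features, labels, train_fn, predict_fn, k=10, seed=0):
    """Return the accuracy on each held-out fold for a classifier
    supplied as two callables (hypothetical interface)."""
    scores = []
    for held_out in k_fold_indices(len(labels), k, seed):
        held = set(held_out)
        train_idx = [i for i in range(len(labels)) if i not in held]
        model = train_fn([features[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        correct = sum(predict_fn(model, features[i]) == labels[i]
                      for i in held_out)
        scores.append(correct / len(held_out))
    return scores


# Placeholder model (assumption): always predicts the majority training class.
def train_majority(train_features, train_labels):
    return max(set(train_labels), key=train_labels.count)


def predict_majority(model, feature_vector):
    return model


# Toy data (assumption): 30 interface residues (1), 70 non-interface (0).
labels = [1] * 30 + [0] * 70
features = [[0.0] for _ in labels]  # dummy one-dimensional feature vectors

scores = cross_validate(features, labels, train_majority, predict_majority)
print("mean accuracy over", len(scores), "folds:", sum(scores) / len(scores))
```

The majority-class baseline is worth keeping next to a real predictor: interface residues are usually the minority class, so a classifier has to beat this baseline before its accuracy means anything.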
