Article ID Journal Published Year Pages File Type
4497725 Journal of Theoretical Biology 2010 9 Pages PDF
Abstract
In principle, structural information of protein sequences with no detectable homology to a protein of known structure could be obtained by predicting the arrangement of their secondary structural elements. Although some ab initio methods for protein structure prediction have been reported, the long-range interactions required to accurately predict tertiary structures of β-sheet containing proteins are still difficult to simulate. To remedy this problem and facilitate de novo prediction of β-sheet containing protein structures, we developed a support vector machine (SVM) approach that classified parallel and antiparallel orientation of β-strands by using the information of interstrand amino acid pairing preferences. Based on a second-order statistics on the relative frequencies of each possible interstrand amino acid pair, we defined an average amino acid pairing encoding matrix (APEM) for encoding β-strands as input in the prediction model. As a result, a prediction accuracy of 86.89% and a Matthew's correlation coefficient value of 0.71 have been achieved through 7-fold cross-validation on a non-redundant protein dataset from PISCES. Although several issues still remain to be studied, the method presented here to some extent could indicate the important contribution of the amino acid pairs to the β-strand orientation, and provide a possible way to further be combined with other algorithms making a full 'identification' of β-strands.
Related Topics
Life Sciences Agricultural and Biological Sciences Agricultural and Biological Sciences (General)
Authors
, , , , ,