Article ID: 505935
Journal: Computers in Biology and Medicine
Published Year: 2008
Pages: 6 Pages
File Type: PDF
Abstract

A methodology for testing the correlation between the sequence and structure distances of proteins is proposed. Structure distances were derived by applying a forward-growing classification tree algorithm to defined physico-chemical and geometrical properties of the structures. The structure distance for every pair of proteins was defined as the number of intermediate nodes between them in the tree. Sequence distances were derived using pairwise sequence alignment. The correlation between the sequence distance matrix and the structure distance matrix was then tested using a Monte Carlo permutation test. The results were compared with those obtained when the double dynamic structure alignment method (SSAP) was applied. The methodology was applied to a data set of 74 proteins belonging to 14 families. The classification tree was able to identify the protein families (the misclassification rate was R = 1.4%), and a 74 × 74 structure distance matrix was produced. For every pair of protein sequences a dissimilarity score was recorded, and a sequence distance matrix was produced. The Monte Carlo permutation test yielded a correlation coefficient r = 0.403 (P < 0.001). The SSAP method produced similar results. The proposed methodology may assist in assessing whether protein sequence distances can be predictors of protein structure distances.
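
The permutation step described above can be illustrated with a minimal sketch of a Mantel-style test between two distance matrices. The abstract does not state which correlation statistic, how many permutations, or which software the authors used, so the Pearson correlation, the default of 999 permutations, the one-sided P-value, and the names `mantel_test`, `d_seq`, and `d_struct` are all assumptions made for illustration only.

```python
import numpy as np


def mantel_test(d_seq, d_struct, n_perm=999, seed=0):
    """Mantel-style Monte Carlo permutation test (illustrative sketch).

    d_seq, d_struct: symmetric (n x n) distance matrices over the same
    n items, in the same order. Returns (r_obs, p_value), where r_obs is
    the Pearson correlation of the upper-triangle entries and p_value is
    the one-sided permutation P-value for a positive association.
    """
    d_seq = np.asarray(d_seq, dtype=float)
    d_struct = np.asarray(d_struct, dtype=float)
    n = d_seq.shape[0]
    if d_seq.shape != (n, n) or d_struct.shape != (n, n):
        raise ValueError("both inputs must be square matrices of equal size")

    # Each unordered pair of items appears once in the upper triangle.
    iu = np.triu_indices(n, k=1)
    x = d_seq[iu]
    r_obs = np.corrcoef(x, d_struct[iu])[0, 1]

    rng = np.random.default_rng(seed)
    n_extreme = 0
    for _ in range(n_perm):
        # Relabel the items of one matrix: the same permutation is applied
        # to rows and columns, which keeps the matrix a valid distance matrix.
        p = rng.permutation(n)
        permuted = d_struct[np.ix_(p, p)]
        r_perm = np.corrcoef(x, permuted[iu])[0, 1]
        if r_perm >= r_obs:
            n_extreme += 1

    # The observed arrangement is counted as one of the permutations,
    # so the smallest attainable P-value is 1 / (n_perm + 1).
    p_value = (n_extreme + 1) / (n_perm + 1)
    return r_obs, p_value


# Toy usage with synthetic data (not the 74-protein data set).
_rng = np.random.default_rng(1)
_points = _rng.normal(size=(20, 2))
_d_a = np.linalg.norm(_points[:, None, :] - _points[None, :, :], axis=-1)
_noise = _rng.normal(scale=0.5, size=_d_a.shape)
_d_b = np.abs(_d_a + (_noise + _noise.T) / 2.0)  # symmetric, noisy copy
np.fill_diagonal(_d_b, 0.0)
r_demo, p_demo = mantel_test(_d_a, _d_b)
```

Permuting rows and columns together, rather than shuffling individual matrix entries, matters because the entries of a distance matrix are not independent: each item contributes to n - 1 distances. With 999 permutations the smallest reportable P-value is 0.001.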
