Comparison of non-sequential sets of protein residues

Article ID	Journal	Published Year	Pages	File Type
14928	Computational Biology and Chemistry	2016	16 Pages	PDF

Abstract

• The vast majority of binding sites for small ligands are shown to be sets of sparsely distributed amino acids.
• A method for the comparison of non-sequential protein segments based on a genetic algorithm is introduced.
• The sequence-independent method presented is shown to be better suited than sequence-based approaches for comparing ligand-binding sites.

A methodology for performing sequence-free comparison of functional sites in protein structures is introduced. The method is based on a new notion of similarity among superimposed groups of amino acid residues that evaluates both geometry and physico-chemical properties. It is specifically designed to handle disconnected and sparsely distributed sets of residues. A genetic algorithm is employed to find the superimposition of protein segments that maximizes their similarity. The method was evaluated by performing an all-to-all comparison on two separate sets of ligand-binding sites, comprising 47 protein-FAD (flavin adenine dinucleotide) and 64 protein-NAD (nicotinamide adenine dinucleotide) complexes, and comparing the results with those of an existing sequence-based structural alignment tool (TM-Align). The quality of the two methodologies is judged by, among other things, their capacity to correctly predict the similarities in the protein-ligand contact patterns of each pair of binding sites. The results show that the sequence-free method significantly outperforms the sequence-based one: it detects 23 significant binding-site homologies that the sequence-based method misses.
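The general idea of searching for a superimposition with a genetic algorithm can be illustrated with a minimal sketch. This is not the authors' implementation: the similarity function, the coarse residue classes, the pose encoding (three rotation angles plus a translation), and all parameter values (`d0`, `pop_size`, mutation width) are assumptions chosen only for illustration. Each residue is reduced to a single point with a physico-chemical label, and residues are matched purely by proximity and class, never by their order in the sequence.

```python
import math
import random

# A residue is (x, y, z, kind); "kind" is a hypothetical coarse
# physico-chemical class such as "polar" or "hydrophobic".

def transform(point, pose):
    """Rotate about the x, y, then z axis (radians), then translate."""
    ax, ay, az, tx, ty, tz = pose
    x, y, z = point
    y, z = y * math.cos(ax) - z * math.sin(ax), y * math.sin(ax) + z * math.cos(ax)
    x, z = x * math.cos(ay) + z * math.sin(ay), -x * math.sin(ay) + z * math.cos(ay)
    x, y = x * math.cos(az) - y * math.sin(az), x * math.sin(az) + y * math.cos(az)
    return (x + tx, y + ty, z + tz)

def similarity(site_a, site_b, pose, d0=2.0):
    """Assumed score: each residue of site_a contributes its best partner in
    site_b, combining a distance term and a class-agreement term."""
    total = 0.0
    for xa, ya, za, kind_a in site_a:
        pa = transform((xa, ya, za), pose)
        best = 0.0
        for xb, yb, zb, kind_b in site_b:
            d2 = (pa[0] - xb) ** 2 + (pa[1] - yb) ** 2 + (pa[2] - zb) ** 2
            geometry = 1.0 / (1.0 + d2 / d0 ** 2)      # 1.0 at zero distance
            chemistry = 1.0 if kind_a == kind_b else 0.5
            best = max(best, geometry * chemistry)
        total += best
    return total

def genetic_search(site_a, site_b, pop_size=40, generations=60, seed=0):
    """Evolve poses of site_a that maximize similarity to site_b."""
    rng = random.Random(seed)

    def random_pose():
        angles = [rng.uniform(-math.pi, math.pi) for _ in range(3)]
        shift = [rng.uniform(-5.0, 5.0) for _ in range(3)]
        return angles + shift

    def fitness(pose):
        return similarity(site_a, site_b, pose)

    # Seed the population with the identity pose plus random poses.
    population = [[0.0] * 6] + [random_pose() for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        elite = population[: pop_size // 4]            # survivors, kept unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            mum, dad = rng.sample(elite, 2)
            child = [rng.choice(genes) for genes in zip(mum, dad)]  # uniform crossover
            child = [g + rng.gauss(0.0, 0.2) for g in child]        # Gaussian mutation
            children.append(child)
        population = elite + children
    best = max(population, key=fitness)
    return best, fitness(best)
```

Calling `genetic_search(site_a, site_b)` returns the best pose found and its score. Because the elite poses are carried over unchanged, the best score never decreases from one generation to the next. A realistic implementation would also need a one-to-one residue matching and a normalization that makes scores comparable across sites of different sizes, both of which this sketch omits.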

Keywords

FAD; Structural similarity; NAD; Structural alignment