Normalized global alignment for protein sequences

Article ID	Journal	Published Year	Pages	File Type
4496849	Journal of Theoretical Biology	2011	7 Pages	PDF

Abstract

Global alignment is used to compare proteins in different fields, for example in phylogenetic research. To reduce the length and composition dependence of global alignment scores, a Z-score is computed with a Monte Carlo algorithm. This technique requires a large number of alignments of shuffled sequences, leading to a high computational cost. In this work, a normalized global alignment score is introduced to correct the length dependence of global alignments. This score is defined as the best ratio between the score of an alignment and its length, and an algorithm based on fractional programming is implemented to compute it. The properties and effectiveness of normalized global alignment applied to protein comparison are analyzed.

Experiments with proteins selected from the SCOP ASTRAL database were run to study the relationship of normalized global alignment with the Z-score and its performance in homology detection. Results show that normalized global alignment has a computational cost equivalent to 2.5 Needleman-Wunsch runs and a linear relationship with the Z-score. This linearity allows normalized global alignment to be used as an inexpensive substitute for the computationally expensive Z-score. Experiments show that normalized global alignment improves the ability to identify homologous proteins.

Software used to compute normalized global alignments is available from http://www3.uji.es/~peris/nga.
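The definition above (the best ratio between an alignment's score and its length) can be illustrated with a minimal sketch. It assumes a Dinkelbach-style iteration, a standard fractional-programming scheme; the abstract does not state which variant the authors implement. It also assumes a toy match/mismatch/linear-gap scoring rather than a protein substitution matrix. The function names `parametric_nw` and `normalized_global_alignment` are hypothetical and are not taken from the authors' software.

```python
from fractions import Fraction


def parametric_nw(a, b, lam, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch on the parametric objective score - lam * length.

    Returns (score, length) of one optimal alignment, where length is the
    number of alignment columns. The scoring values are toy assumptions.
    """
    m = len(b)
    # Each cell holds (parametric value, raw score, number of columns).
    prev = [(j * (gap - lam), j * gap, j) for j in range(m + 1)]
    for i in range(1, len(a) + 1):
        cur = [(i * (gap - lam), i * gap, i)]
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            candidates = []
            # Diagonal (substitution), up (gap in b), left (gap in a).
            for (v, sc, ln), w in ((prev[j - 1], s), (prev[j], gap), (cur[j - 1], gap)):
                candidates.append((v + w - lam, sc + w, ln + 1))
            cur.append(max(candidates))
        prev = cur
    _, score, length = prev[m]
    return score, length


def normalized_global_alignment(a, b, **scoring):
    """Return (best score/length ratio, number of alignment runs)."""
    if not a and not b:
        raise ValueError("at least one sequence must be non-empty")
    lam = Fraction(0)  # with lam = 0 the first run is a plain Needleman-Wunsch
    runs = 0
    while True:
        score, length = parametric_nw(a, b, lam, **scoring)
        runs += 1
        new_lam = Fraction(score, length)
        if new_lam == lam:  # parametric optimum is zero, so lam is the best ratio
            return lam, runs
        lam = new_lam


print(normalized_global_alignment("ACGT", "ACGT"))  # (Fraction(1, 1), 2)
```

Each iteration is one alignment run with every column's score reduced by the current ratio estimate, so the total cost is a small multiple of a single Needleman-Wunsch run. Exact rational arithmetic is used here only to make the stopping test unambiguous in this sketch.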

► A normalized global alignment (NGA) algorithm is introduced and implemented.
► A linear relationship between NGA scores and Z-scores is found.
► NGA outperforms the Z-score in homology detection at a lower computational cost.

Keywords

Fractional programming; Database search; Normalization; Global alignment; Homologous proteins