Power analysis of database search using multiple scoring matrices

Article ID	Journal	Published Year	Pages	File Type
416666	Computational Statistics & Data Analysis	2006	8 Pages	PDF

Abstract

Protein sequence alignment may be viewed either as a classification problem or as a multiple hypothesis testing problem. Whereas the type I error of a method is often studied using randomly generated sequences, power is best investigated using real protein sequences. The SCOP database and its protein classification are used to investigate both the power and the type I error of sequence alignment as provided by BLAST. The focus is on the multiple testing case, in which more than one scoring matrix is used. It is demonstrated that a multiple testing correction must be applied in this setting to control the number of false positives. It is also shown that a proper search procedure based on multiple scoring matrices detects slightly fewer of the homologous sequences present in the SCOP database than BLOSUM62 alone, while offering the opportunity to detect a wider variety of homologous types.
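The central point above, that searching with several scoring matrices and keeping the best result inflates false positives unless a multiple testing correction is applied, can be illustrated with a small sketch. It assumes each database hit already has one p-value per scoring matrix (for example derived from BLAST E-values) and a known homology label, such as a SCOP classification would provide. The function names `naive_min_p`, `bonferroni_min_p` and `count_positives` and the toy data are hypothetical. Bonferroni is used only as a simple, conservative stand-in; it is not the article's own procedure, which, judging by the keywords, may model the dependence between matrices with copula functions.

```python
from typing import Callable, Dict, List, Tuple


def naive_min_p(pvalues: List[float]) -> float:
    # Uncorrected: take the best p-value over all scoring matrices.
    # With k matrices this inflates the type I error.
    return min(pvalues)


def bonferroni_min_p(pvalues: List[float]) -> float:
    # Bonferroni stand-in (hypothetical choice, not the article's method):
    # scale the best p-value by the number of matrices k and cap it at 1.
    return min(1.0, len(pvalues) * min(pvalues))


def count_positives(
    hits: Dict[str, List[float]],
    homologous: Dict[str, bool],
    alpha: float,
    combine: Callable[[List[float]], float],
) -> Tuple[int, int]:
    """Return (true positives, false positives) at significance level alpha.

    hits: sequence id -> one p-value per scoring matrix (assumed input)
    homologous: sequence id -> True if homologous to the query (assumed labels)
    """
    tp = fp = 0
    for seq_id, pvals in hits.items():
        if combine(pvals) <= alpha:
            if homologous[seq_id]:
                tp += 1
            else:
                fp += 1
    return tp, fp


if __name__ == "__main__":
    # Toy data: three hypothetical scoring matrices per hit.
    hits = {
        "a": [0.001, 0.02, 0.03],
        "b": [0.04, 0.2, 0.5],
        "c": [0.3, 0.012, 0.6],
        "d": [0.5, 0.6, 0.03],
    }
    homologous = {"a": True, "b": False, "c": True, "d": False}
    print(count_positives(hits, homologous, 0.05, naive_min_p))       # (2, 2)
    print(count_positives(hits, homologous, 0.05, bonferroni_min_p))  # (2, 0)
```

On this toy data the uncorrected minimum accepts both non-homologous hits, while the Bonferroni version rejects them and still keeps both homologous ones. Because scores from different matrices tend to be correlated, Bonferroni is conservative, so on real data such a correction can also cost some power; that trade-off is in the spirit of the result summarized above, though the numbers here are illustrative only.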

Keywords

Power analysis; Multiple testing; Copula functions