Article ID Journal Published Year Pages File Type
416666 Computational Statistics & Data Analysis 2006 8 Pages PDF
Abstract

Protein sequence alignment may be viewed as either a classification or a multiple hypothesis testing problem. Whereas the type one error of a method is often studied for randomly generated sequences, the power is best investigated based on real protein sequences. The SCOP data base and its protein classification is used to investigate both the power and the type one error of sequence alignment as provided by BLAST. The focus is on the multiple testing case when more than one scoring matrix is used. It is demonstrated that a multiple testing correction needs to be applied in order to control the number of false positives while using more than one scoring matrix. It is also shown that a proper search procedure based on multiple scoring matrices detects slightly fewer homologous sequences present in the SCOP data base than the matrix BLOSUM62 itself, while giving the opportunity of detecting a wider variety of homologous types.

Related Topics
Physical Sciences and Engineering Computer Science Computational Theory and Mathematics
Authors
, , ,