Article code | Journal code | Publication year | English article | Full-text version |
---|---|---|---|---|
416666 | 681393 | 2006 | 8-page PDF | Free download |

Protein sequence alignment may be viewed as either a classification problem or a multiple hypothesis testing problem. Whereas the type I error of a method is often studied using randomly generated sequences, its power is best investigated using real protein sequences. The SCOP database and its protein classification are used to investigate both the power and the type I error of sequence alignment as provided by BLAST. The focus is on the multiple testing case, in which more than one scoring matrix is used. It is demonstrated that a multiple testing correction must be applied to control the number of false positives when more than one scoring matrix is used. It is also shown that a proper search procedure based on multiple scoring matrices detects slightly fewer of the homologous sequences present in the SCOP database than the BLOSUM62 matrix alone, while offering the opportunity to detect a wider variety of homologous types.
Journal: Computational Statistics & Data Analysis - Volume 51, Issue 3, 1 December 2006, Pages 1656–1663
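
The abstract does not say which multiple testing correction the authors apply. As a minimal sketch of the general idea, the Python below assumes a simple Bonferroni-style adjustment: for one query/subject pair, take the smallest E-value obtained across the scoring matrices and multiply it by the number of matrices tried before comparing it with the significance threshold. The function names, the threshold, the E-values, and the matrices other than BLOSUM62 are all hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def bonferroni_min_evalue(evalues_by_matrix):
    """Return (best_matrix, corrected_evalue) for one query/subject pair.

    evalues_by_matrix maps a scoring-matrix name to the E-value reported
    for the same pair under that matrix. The correction is an assumed
    Bonferroni-style adjustment, not necessarily the paper's procedure.
    """
    if not evalues_by_matrix:
        raise ValueError("need at least one E-value")
    n_matrices = len(evalues_by_matrix)
    # Matrix that gave the most significant (smallest) E-value.
    best_matrix = min(evalues_by_matrix, key=evalues_by_matrix.get)
    # Penalise the best E-value for having searched with several matrices.
    corrected = n_matrices * evalues_by_matrix[best_matrix]
    return best_matrix, corrected


def is_hit(evalues_by_matrix, threshold=0.01):
    """Declare a hit only if the corrected E-value passes the threshold."""
    _, corrected = bonferroni_min_evalue(evalues_by_matrix)
    return corrected <= threshold


# Hypothetical E-values for a single pair under three matrices.
example = {"BLOSUM62": 0.004, "BLOSUM45": 0.02, "PAM250": 0.5}
best, corrected = bonferroni_min_evalue(example)
print(best, corrected, is_hit(example))
```

In this made-up example the BLOSUM62 E-value alone would pass a 0.01 threshold, but after accounting for the three matrices tried the pair is no longer reported. This is the kind of trade-off the abstract describes: controlling false positives across several matrices costs some detections relative to a single-matrix search.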