Article ID: 416700
Journal: Computational Statistics & Data Analysis
Published Year: 2006
Pages: 9
File Type: PDF
Abstract

Genome-wide association studies are likely to be conducted on a large scale in the near future. In such studies, searching hundreds of thousands of markers for the few that are associated with disease raises the multiple-hypothesis testing problem in its most severe form. We explore, in a two-stage design, how the use of the false discovery rate (FDR) can alleviate the burden of a prohibitively strict significance level for single-marker tests and still control the number of false positive findings when there is more than one causal variant. FDR is the expected proportion of false positives among all significant findings. It can be approximated by $(1-p_0)\alpha / [(1-p_0)\alpha + p_0(1-\beta)]$, where $p_0$ is the proportion of true causal markers, $\alpha$ is the type I error rate and $1-\beta$ is the power of a two-stage study. When 500,000 SNPs are genotyped in the first stage with a fixed SNP array and the most significant SNPs are genotyped in the second stage with standard but 20 times more expensive high-throughput techniques, savings of up to 20% in the minimum genotyping cost are achieved for $p_0$ in the range of $10^{-5}$ to $5\times10^{-4}$ and FDR in the range of 0.05 to 0.7, compared with using a Bonferroni-corrected significance level. In terms of sample size, the saving is up to 60%. However, these savings come at the cost of more false positive findings.
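
The FDR approximation stated in the abstract can be evaluated directly. It can also be rearranged to give the per-marker significance level implied by a target FDR. The sketch below is only an illustration, not the authors' procedure: the function names are hypothetical, and the power of 0.8 and the family-wise level of 0.05 used for the Bonferroni comparison are assumed values that do not come from the abstract. The marker count of 500,000 and the values of the causal-marker proportion and FDR are taken from the ranges quoted above.

```python
def fdr_approx(p0, alpha, power):
    """Approximate FDR as (1 - p0) * alpha / ((1 - p0) * alpha + p0 * power).

    p0    : proportion of true causal markers
    alpha : type I error rate per marker
    power : 1 - beta, the power of the study
    """
    false_pos = (1.0 - p0) * alpha   # expected share of markers that are false positives
    true_pos = p0 * power            # expected share of markers that are true positives
    return false_pos / (false_pos + true_pos)


def alpha_for_fdr(p0, fdr, power):
    """Solve the approximation above for alpha, given a target FDR."""
    return fdr * p0 * power / ((1.0 - fdr) * (1.0 - p0))


# Illustrative inputs. Power and the family-wise level are assumptions.
n_markers = 500_000            # first-stage SNP count from the abstract
p0 = 1e-5                      # lower end of the range quoted in the abstract
target_fdr = 0.05              # lower end of the FDR range quoted in the abstract
assumed_power = 0.8            # assumed, not stated in the abstract
assumed_familywise = 0.05      # assumed family-wise error rate for Bonferroni

alpha_fdr = alpha_for_fdr(p0, target_fdr, assumed_power)
alpha_bonferroni = assumed_familywise / n_markers

print(f"alpha implied by the FDR target: {alpha_fdr:.3e}")
print(f"Bonferroni-corrected alpha:      {alpha_bonferroni:.3e}")
print(f"round-trip FDR check:            {fdr_approx(p0, alpha_fdr, assumed_power):.4f}")
```

Under these assumed inputs the FDR-based threshold is less strict than the Bonferroni threshold, which is the mechanism behind the cost and sample-size savings described above.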

Related Topics
Physical Sciences and Engineering > Computer Science > Computational Theory and Mathematics