Article ID | Journal ID | Publication year | English article | Full text |
---|---|---|---|---|
417113 | 681454 | 2009 | 8-page PDF | Free download |
A new procedure is proposed to balance type I and type II errors in significance testing for differential expression of individual genes. Suppose that a collection $F_k$ of $k$ lists of selected genes is available, each list approximating the true set of differentially expressed genes. For example, such lists can be generated by a subsampling counterpart of the delete-$d$ jackknife method, with the per-comparison error rate controlled within each subsample. A final list of candidate genes, denoted $S^*$, is then composed so that its contents are closest, in the sense defined next, to all the lists thus generated. To measure the "closeness" of gene lists, we introduce an asymmetric distance between sets; the asymmetry arises because the relative costs of type I and type II errors committed during gene selection are, in general, unequal. The optimal set $S^*$ is defined as a minimizer of the average asymmetric distance from an arbitrary set $S$ to all sets in the collection $F_k$. This minimization problem can be solved explicitly, yielding a frequency criterion for including each gene in the final set. The proposed method is tested by resampling from real microarray gene expression data in which the expression levels of pre-defined genes were artificially shifted, thereby mimicking their differential expression.
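A minimal Python sketch of the aggregation step follows. It assumes, since the abstract does not give the exact formula, that the asymmetric distance is a weighted count of set differences, $d(S, F) = c_1\,|S \setminus F| + c_2\,|F \setminus S|$, where $c_1$ is the cost of a type I error (a gene kept in $S$ but absent from $F$) and $c_2$ the cost of a type II error (a gene in $F$ that $S$ misses). Under that assumption the average distance splits gene by gene: if $n_g$ of the $k$ lists contain gene $g$, keeping $g$ costs $c_1 (k - n_g)$ and dropping it costs $c_2 n_g$, so $g$ enters $S^*$ exactly when $n_g / k > c_1 / (c_1 + c_2)$. The function names, the parameters `cost_fp` and `cost_fn`, and the toy gene lists are hypothetical. The collection is passed in directly rather than generated by delete-$d$ subsampling, and the brute-force search serves only as a check on small examples.

```python
from collections import Counter
from itertools import chain, combinations


def asymmetric_distance(s, f, cost_fp=1.0, cost_fn=1.0):
    """Assumed distance: cost_fp per gene in s but not in f (type I, relative to f)
    plus cost_fn per gene in f but not in s (type II, relative to f)."""
    s, f = set(s), set(f)
    return cost_fp * len(s - f) + cost_fn * len(f - s)


def average_distance(s, collection, cost_fp=1.0, cost_fn=1.0):
    """Mean asymmetric distance from a candidate list s to every list in the collection."""
    return sum(asymmetric_distance(s, f, cost_fp, cost_fn) for f in collection) / len(collection)


def frequency_selection(collection, cost_fp=1.0, cost_fn=1.0):
    """Closed-form minimiser under the assumed distance.

    A gene found in n of the k lists adds cost_fp * (k - n) to the total if kept
    and cost_fn * n if dropped; keep it only when keeping is strictly cheaper,
    i.e. when n / k > cost_fp / (cost_fp + cost_fn). Tied genes are dropped,
    so the minimiser returned here is one of possibly several.
    """
    k = len(collection)
    counts = Counter(chain.from_iterable(set(f) for f in collection))
    return {g for g, n in counts.items() if cost_fp * (k - n) < cost_fn * n}


def brute_force_selection(collection, cost_fp=1.0, cost_fn=1.0):
    """Exhaustive check for small examples: try every subset of the genes seen."""
    genes = sorted(set().union(*collection))
    candidates = (set(c) for r in range(len(genes) + 1) for c in combinations(genes, r))
    return min(candidates, key=lambda s: average_distance(s, collection, cost_fp, cost_fn))


if __name__ == "__main__":
    # Hypothetical gene lists, e.g. one per subsample.
    lists = [{"g1", "g2", "g3"}, {"g1", "g2"}, {"g1", "g4"}, {"g1", "g2", "g5"}]
    print(sorted(frequency_selection(lists)))                            # ['g1', 'g2']
    print(sorted(frequency_selection(lists, cost_fp=3.0, cost_fn=1.0)))  # ['g1']
    print(sorted(frequency_selection(lists, cost_fp=1.0, cost_fn=4.0)))  # ['g1', 'g2', 'g3', 'g4', 'g5']
```

In this toy run, raising `cost_fp` relative to `cost_fn` raises the inclusion threshold and shortens the final list (fewer type I errors at the price of more type II errors), while the reverse weighting lengthens it.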
Journal: Computational Statistics & Data Analysis - Volume 53, Issue 5, 15 March 2009, Pages 1622–1629