Article code | Journal code | Year of publication | English article | Full-text version |
---|---|---|---|---|
1146547 | 957517 | 2010 | 15-page PDF | Free download |

For qualitative data models, the Gini–Simpson index and Shannon entropy are commonly used for statistical analysis. In high-dimensional low-sample size (HDLSS) categorical models, which abound in genomics and bioinformatics, the Gini–Simpson index, as extended to Hamming distance in a pseudo-marginal setup, facilitates drawing suitable statistical conclusions. Under Lorenz ordering, it is shown that Shannon entropy and its multivariate analogues proposed here appear to be more informative than the Gini–Simpson index. The nested subset monotonicity and the subgroup decomposability of some of the proposed measures are exploited. The usual jackknifing (or bootstrapping) methods may not work well for HDLSS constrained models. Hence, we consider a permutation method incorporating the union–intersection (UI) principle and the Chen–Stein theorem to formulate suitable hypothesis testing procedures for gene classification. Some applications are included as illustrations.
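
To make the two basic measures in the abstract concrete, the following minimal sketch computes the plug-in (sample proportion) versions of the Gini–Simpson index, 1 − Σ p_c², and Shannon entropy, −Σ p_c log p_c, for a single categorical variable, together with an average normalized Hamming distance across several aligned sequences. The function names and the toy nucleotide data are assumptions made for illustration only; this is not the pseudo-marginal formulation or the testing procedure developed in the paper.

```python
import math
from collections import Counter
from itertools import combinations


def gini_simpson(labels):
    """Plug-in Gini-Simpson index: 1 - sum of squared category proportions."""
    n = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def shannon_entropy(labels):
    """Plug-in Shannon entropy (natural log) of the category proportions."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def mean_hamming(sequences):
    """Average, over all pairs of equal-length sequences, of the
    proportion of positions at which the two sequences differ."""
    pairs = list(combinations(sequences, 2))
    k = len(sequences[0])
    total = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return total / (len(pairs) * k)


# Hypothetical toy data: one site observed on four samples,
# and three aligned sequences of four sites each.
site = list("AACG")
seqs = ["ACGT", "ACGA", "TCGA"]

gs = gini_simpson(site)
h = shannon_entropy(site)
d = mean_hamming(seqs)
```

The Gini–Simpson index can be read as the probability that two draws (with replacement) from the sample fall in different categories, which is why counting mismatches position by position, as the Hamming distance does, is a natural way to carry it over to many categorical positions at once.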
Journal: Journal of Multivariate Analysis - Volume 101, Issue 7, August 2010, Pages 1559–1573