Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
397035 | International Journal of Approximate Reasoning | 2014 | 11 Pages |
•A probabilistic inference is developed for large-scale multinomial distributions.•The method improves the Dempster–Shafer theory of belief.•The method has a desirable long-run frequency property.•The inference method is applied in a genome-wide association study.
Statistical analysis of multinomial counts with a large number K of categories and a small number n of sample size is challenging to both frequentist and Bayesian methods and requires thinking about statistical inference at a very fundamental level. Following the framework of Dempster–Shafer theory of belief functions, a probabilistic inferential model is proposed for this “large K and small n” problem. The inferential model produces a probability triplet (p,q,r) for an assertion conditional on observed data. The probabilities p and q are for and against the truth of the assertion, whereas r=1−p−q is the remaining probability called the probability of “donʼt know”. The new inference method is applied in a genome-wide association study with very high dimensional count data, to identify association between genetic variants to the disease Rheumatoid Arthritis.