Estimate-based goodness-of-fit test for large sparse multinomial distributions

Computational Statistics & Data Analysis, 2009, 10 pages

Abstract

Pearson's chi-squared statistic (X²) does not, in general, follow a chi-squared distribution when it is used to test goodness of fit for a multinomial distribution based on sparse contingency table data. We explore properties of the D² statistic of [Zelterman, D., 1987. Goodness-of-fit tests for large sparse multinomial distributions. J. Amer. Statist. Assoc. 82 (398), 624–629] and compare them with those of X². For very sparse contingency tables, we also compare the power of goodness-of-fit tests based on D², X², and the statistic L_r proposed by [Maydeu-Olivares, A., Joe, H., 2005. Limited- and full-information estimation and goodness-of-fit testing in 2ⁿ contingency tables: A unified framework. J. Amer. Statist. Assoc. 100 (471), 1009–1020]. We show that the variance of D² is not larger than that of X² under null hypotheses in which all cell probabilities are positive; that the distribution of D² becomes more skewed as the multinomial distribution becomes more asymmetric and sparse; and that, for the L_r statistic, the power of the goodness-of-fit test depends on the models selected for testing. The results of a simulation experiment strongly support using both D² and L_r for goodness-of-fit testing with large, sparse contingency table data.
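To make the comparison between X² and D² concrete, here is a minimal sketch that computes both statistics for multinomial tables and runs a small Monte Carlo under a sparse, asymmetric null. It assumes the commonly cited form of Zelterman's statistic, D² = Σ_i [(n_i − m_i)² − n_i] / m_i with expected counts m_i = N p_i, which is the same as D² = X² − Σ_i n_i / m_i. The function names, the cell count k = 200, the sample size N = 100, and the linearly increasing cell probabilities are hypothetical choices for illustration, not the paper's simulation design; the L_r statistic is not reproduced here.

```python
import numpy as np


def _expected_counts(counts, probs):
    # m_i = N * p_i, where N is the total count of each table (last axis).
    return counts.sum(axis=-1, keepdims=True) * probs


def pearson_x2(counts, probs):
    """Pearson's X^2 = sum_i (n_i - m_i)^2 / m_i."""
    counts = np.asarray(counts, dtype=float)
    probs = np.asarray(probs, dtype=float)
    m = _expected_counts(counts, probs)
    return ((counts - m) ** 2 / m).sum(axis=-1)


def zelterman_d2(counts, probs):
    """Assumed form of Zelterman's D^2 = sum_i [(n_i - m_i)^2 - n_i] / m_i,
    i.e. X^2 minus sum_i n_i / m_i."""
    counts = np.asarray(counts, dtype=float)
    probs = np.asarray(probs, dtype=float)
    m = _expected_counts(counts, probs)
    return (((counts - m) ** 2 - counts) / m).sum(axis=-1)


def simulate_null(probs, n, reps=20000, seed=0):
    """Draw `reps` multinomial(n, probs) tables under H0; return X^2 and D^2."""
    rng = np.random.default_rng(seed)
    counts = rng.multinomial(n, probs, size=reps)  # shape (reps, k)
    return pearson_x2(counts, probs), zelterman_d2(counts, probs)


if __name__ == "__main__":
    k, n = 200, 100  # hypothetical: on average 0.5 expected counts per cell
    probs = np.arange(1, k + 1, dtype=float)  # hypothetical asymmetric null
    probs /= probs.sum()  # all cell probabilities positive, summing to 1
    x2, d2 = simulate_null(probs, n)
    print(f"X^2: mean={x2.mean():.2f}, var={x2.var(ddof=1):.2f}")
    print(f"D^2: mean={d2.mean():.2f}, var={d2.var(ddof=1):.2f}")
```

Under the null, E[(n_i − m_i)²] = m_i(1 − p_i) and E[n_i] = m_i, so the sample mean of X² should sit near k − 1 and, under the assumed form, that of D² near −1. Comparing the printed variances gives an informal look at the claim that the variance of D² does not exceed that of X², although a single simulated configuration is an illustration, not a proof.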