A family of block-wise one-factor distributions for modeling high-dimensional binary data

Article ID	Journal	Published Year	Pages	File Type
4949262	Computational Statistics & Data Analysis	2017	27 Pages	PDF

Abstract

A new family of one-factor distributions for modeling high-dimensional binary data is introduced. The model provides an explicit probability for each event, thus avoiding the numerical approximations often required by existing methods. The model is easy to interpret, because each variable is described by two continuous parameters (corresponding to its marginal probability and to the strength of its dependency on the other variables) and by one binary parameter (indicating whether the dependencies are positive or negative). The model is extended by splitting the variables into independent blocks, where each block follows the new one-factor distribution. Finally, a parsimonious version of the model, which imposes equality constraints between the dependency parameters, is proposed. Parameter estimation is carried out by a two-step inference-for-margins procedure, whose second step is performed by an expectation-maximization algorithm. Model selection is performed by a deterministic approach, which strongly reduces the number of competing models. This consistent approach uses an agglomerative (ascendant) hierarchical classification of the variables, based on the empirical version of Cramér's V, to select a narrow subset of models. The new model is evaluated on numerical experiments and on a real data set. The procedure is implemented in the R package MvBinary.
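
To make the model-selection idea concrete, the following Python sketch computes the empirical Cramér's V between pairs of binary variables and builds a nested sequence of candidate block partitions by agglomerative merging. It is an illustration only, not the MvBinary implementation: the function names, the single-linkage merging rule, and the convention of returning 0 for a constant variable are all assumptions made here, and the article may use a different linkage and a different criterion to pick the final partition.

```python
from math import sqrt


def cramers_v_binary(x, y):
    """Empirical Cramér's V for two equal-length 0/1 sequences.

    For a 2x2 contingency table, V equals the absolute phi coefficient.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    n11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    n10 = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    n01 = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    n00 = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)
    row1, row0 = n11 + n10, n01 + n00
    col1, col0 = n11 + n01, n10 + n00
    denom = row1 * row0 * col1 * col0
    if denom == 0:
        # Assumption: a constant variable is treated as unassociated.
        return 0.0
    return abs(n11 * n00 - n10 * n01) / sqrt(denom)


def nested_partitions(columns):
    """Return the d nested partitions produced by agglomerative merging.

    columns: list of d binary variables (each a list of 0/1 values).
    Similarity between two blocks is the largest pairwise V between
    their members (single linkage, an assumption of this sketch).
    """
    d = len(columns)
    v = [[cramers_v_binary(columns[i], columns[j]) for j in range(d)]
         for i in range(d)]
    blocks = [[j] for j in range(d)]
    path = [[list(b) for b in blocks]]
    while len(blocks) > 1:
        best = None
        for a in range(len(blocks)):
            for b in range(a + 1, len(blocks)):
                link = max(v[i][j] for i in blocks[a] for j in blocks[b])
                if best is None or link > best[0]:
                    best = (link, a, b)
        _, a, b = best
        merged = sorted(blocks[a] + blocks[b])
        blocks = [blk for k, blk in enumerate(blocks) if k not in (a, b)]
        blocks.append(merged)
        path.append([list(blk) for blk in blocks])
    return path
```

With d variables, `nested_partitions` returns only d candidate partitions (from all singletons to a single block), which illustrates how a hierarchy can shrink the search from all possible partitions to a short list of competing models.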

Keywords

EM algorithm; Model selection; High-dimensional data; Binary data