Article ID: 4949262
Journal: Computational Statistics & Data Analysis
Published Year: 2017
Pages: 27
File Type: PDF
Abstract
A new family of one-factor distributions for modeling high-dimensional binary data is introduced. The model provides an explicit probability for each event, thus avoiding the numerical approximations often made by existing methods. The model is easy to interpret, because each variable is described by two continuous parameters (corresponding to its marginal probability and to the strength of its dependency on the other variables) and by one binary parameter (indicating whether the dependencies are positive or negative). The model is extended by splitting the variables into independent blocks, where each block follows the new one-factor distribution. Finally, a parsimonious version of the model, which imposes equality constraints between the dependency parameters, is proposed. Parameter estimation is carried out by an inference margin procedure, where the second step is achieved by an expectation-maximization algorithm. Model selection is performed by a deterministic approach, which strongly reduces the number of competing models. This consistent approach uses a hierarchical ascendant classification (agglomerative clustering) of the variables, which selects a narrow subset of models. The selection is based on the empirical version of Cramér's V. The new model is evaluated in numerical experiments and on a real data set. The procedure is implemented in the R package MvBinary.
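The model-selection idea in the abstract (cluster the variables hierarchically using the empirical Cramér's V, then consider only the partitions along that hierarchy) can be illustrated with a minimal sketch. This is not the MvBinary implementation: the function names `cramers_v` and `nested_partitions`, the use of single linkage, and the convention of returning 0 for a constant variable are all assumptions made for illustration. For two binary variables, Cramér's V reduces to the absolute value of the phi coefficient of their 2x2 contingency table.

```python
from math import sqrt


def cramers_v(x, y):
    """Empirical Cramér's V for two binary (0/1) sequences of equal length.

    For a 2x2 contingency table, V equals the absolute phi coefficient.
    Assumption: returns 0.0 when either variable is constant, where V is undefined.
    """
    n11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    n10 = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    n01 = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    n00 = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)
    # Product of the four marginal counts of the 2x2 table.
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    if denom == 0:
        return 0.0
    return abs(n11 * n00 - n10 * n01) / sqrt(denom)


def nested_partitions(data):
    """Agglomerative clustering of the variables (columns) of a binary data set.

    `data` is a list of rows, each a list of 0/1 values.
    Assumption: single linkage, i.e. two blocks are merged when they contain
    the pair of variables with the largest Cramér's V.
    Returns the sequence of partitions, from all singletons to a single block.
    """
    d = len(data[0])
    cols = [[row[j] for row in data] for j in range(d)]
    v = [[cramers_v(cols[i], cols[j]) for j in range(d)] for i in range(d)]

    blocks = [[j] for j in range(d)]
    path = [[list(b) for b in blocks]]
    while len(blocks) > 1:
        best = None
        for a in range(len(blocks)):
            for b in range(a + 1, len(blocks)):
                link = max(v[i][j] for i in blocks[a] for j in blocks[b])
                if best is None or link > best[0]:
                    best = (link, a, b)
        _, a, b = best
        merged = sorted(blocks[a] + blocks[b])
        blocks = [blk for k, blk in enumerate(blocks) if k not in (a, b)] + [merged]
        path.append([list(blk) for blk in blocks])
    return path
```

For d variables, `nested_partitions` returns d candidate partitions into blocks, rather than every possible partition of the variables, which is one way to read the abstract's claim that the deterministic approach strongly reduces the number of competing models. Each candidate would then be fitted and compared by whatever criterion the full procedure uses, which this sketch does not cover.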
Related Topics
Physical Sciences and Engineering > Computer Science > Computational Theory and Mathematics