Article ID: 417338
Journal: Computational Statistics & Data Analysis
Published Year: 2008
Pages: 11
File Type: PDF
Abstract

Multiple imputation under the multivariate normality assumption has often been considered a workable model-based approach for dealing with incomplete continuous data. A situation in which measurements are taken on a continuous scale, with an eventual interest in ordinalized versions obtained via a threshold concept, is commonly encountered in applied research, especially in the medical and social sciences. In practice, researchers ordinarily impute missing values for continuous outcomes under a Gaussian imputation model and then ordinalize them via pre-specified cutoff points. An alternative strategy is to create multiply imputed data sets after ordinalization, under a log-linear imputation model that uses a saturated multinomial structure. In this work, the performance of the two imputation methods was examined on a fairly broad range of simulated incomplete data sets that exhibit varying distributional characteristics such as skewness and multimodality. The behavior of efficiency and accuracy measures was investigated to determine how well each procedure works. The conclusion drawn is that ordinalization before carrying out a log-linear imputation should be the preferred procedure except in a few special cases. It is recommended that researchers use the less common second strategy whenever the interest centers on ordinal quantities that are obtained through underlying continuous measurements. This finding is probably due to the transformation of non-Gaussian features into better-behaved categorical trends in this particular missing-data environment. This effect outweighs the argument that continuous variables intrinsically convey more information, leading to a counter-intuitive but potentially beneficial result for practitioners.
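
The two strategies compared in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the authors' simulation design: the function names, the single covariate `x`, the lognormal outcome, the 30% MCAR missingness, the tercile cutoffs, and the Dirichlet prior count of 0.5 are all hypothetical. The Gaussian step is also simplified (it uses fixed regression estimates rather than drawing parameters from their posterior), so this is a rough stand-in for proper multiple imputation.

```python
import numpy as np

def ordinalize(values, cutoffs):
    """Map continuous values to ordinal categories 0..len(cutoffs) via thresholds."""
    return np.digitize(values, cutoffs)

def impute_gaussian_then_ordinalize(x, y, missing, cutoffs, rng):
    """Strategy 1 (sketch): impute y on the continuous scale, then apply cutoffs."""
    obs = ~missing
    # Normal linear regression of y on x, fitted to the observed cases only.
    slope, intercept = np.polyfit(x[obs], y[obs], 1)
    resid_sd = np.std(y[obs] - (intercept + slope * x[obs]), ddof=2)
    y_imp = y.copy()
    noise = rng.normal(0.0, resid_sd, missing.sum())
    y_imp[missing] = intercept + slope * x[missing] + noise
    return ordinalize(y_imp, cutoffs)

def ordinalize_then_impute_multinomial(x, y, missing, cutoffs_x, cutoffs_y, rng):
    """Strategy 2 (sketch): ordinalize first, then impute from a saturated multinomial."""
    obs = ~missing
    k_x, k_y = len(cutoffs_x) + 1, len(cutoffs_y) + 1
    x_cat = ordinalize(x, cutoffs_x)
    y_cat = np.full(len(y), -1)
    y_cat[obs] = ordinalize(y[obs], cutoffs_y)
    # Cell counts of the observed (x_cat, y_cat) contingency table.
    counts = np.zeros((k_x, k_y))
    np.add.at(counts, (x_cat[obs], y_cat[obs]), 1)
    # Draw cell probabilities from a Dirichlet posterior (assumed prior count 0.5).
    probs = rng.dirichlet((counts + 0.5).ravel()).reshape(k_x, k_y)
    cond = probs / probs.sum(axis=1, keepdims=True)  # P(y_cat | x_cat)
    for i in np.flatnonzero(missing):
        y_cat[i] = rng.choice(k_y, p=cond[x_cat[i]])
    return y_cat

# Hypothetical data: a right-skewed outcome with values missing completely at random.
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
y = np.exp(0.5 * x + rng.normal(scale=0.5, size=n))
missing = rng.random(n) < 0.3
cut_x = np.quantile(x, [1 / 3, 2 / 3])
cut_y = np.quantile(y, [1 / 3, 2 / 3])
true_props = np.bincount(ordinalize(y, cut_y), minlength=3) / n

m = 5  # number of imputed data sets
props_a = np.mean(
    [np.bincount(impute_gaussian_then_ordinalize(x, y, missing, cut_y, rng),
                 minlength=3) / n for _ in range(m)], axis=0)
props_b = np.mean(
    [np.bincount(ordinalize_then_impute_multinomial(x, y, missing, cut_x, cut_y, rng),
                 minlength=3) / n for _ in range(m)], axis=0)

print("true category proportions:   ", true_props)
print("Gaussian, then ordinalize:   ", props_a)
print("ordinalize, then multinomial:", props_b)
```

Comparing the two sets of averaged category proportions with the complete-data proportions gives a rough feel for the kind of accuracy measure the abstract refers to. A single run on one simulated data set is not evidence for either strategy.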
