On imputing continuous data when the eventual interest pertains to ordinalized outcomes via threshold concept

Article ID	Journal	Published Year	Pages	File Type
417338	Computational Statistics & Data Analysis	2008	11 Pages	PDF

Abstract

Multiple imputation under the multivariate normality assumption has often been considered a workable model-based approach for dealing with incomplete continuous data. Applied research, especially in the medical and social sciences, commonly encounters situations in which measurements are taken on a continuous scale but the eventual interest lies in ordinalized versions obtained via the threshold concept. In practice, researchers ordinarily impute missing values for continuous outcomes under a Gaussian imputation model and then ordinalize them via pre-specified cutoff points. An alternative strategy is to create multiply imputed data sets after ordinalization under a log-linear imputation model that uses a saturated multinomial structure. In this work, the performance of the two imputation methods was examined on a fairly broad range of simulated incomplete data sets that exhibit varying distributional characteristics such as skewness and multimodality. The behavior of efficiency and accuracy measures was investigated to determine how well the procedures work. The conclusion drawn is that ordinalization before carrying out a log-linear imputation should be the preferred procedure except in a few special cases. It is recommended that researchers use the less common second strategy whenever the interest centers on ordinal quantities that are obtained through underlying continuous measurements. This finding is probably due to the transformation of non-Gaussian features into better-behaved categorical trends in this particular missing-data environment. This effect outweighs the fact that continuous variables intrinsically convey more information, leading to a counter-intuitive but potentially beneficial result for practitioners.
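
The two strategies compared in the abstract can be illustrated with a minimal sketch. It is not the authors' simulation code: it assumes a single variable with values missing completely at random, so the saturated multinomial model reduces to a set of category probabilities, and it uses noninformative priors for the parameter draws. The function names, the cutoff values, and the use of NumPy are all assumptions made for illustration.

```python
import numpy as np

def ordinalize(values, cutoffs):
    """Map continuous values to ordinal categories 0..len(cutoffs) via fixed thresholds."""
    return np.digitize(values, cutoffs)

def impute_then_ordinalize(y, cutoffs, rng):
    """Strategy 1: impute on the continuous scale under a Gaussian model, then ordinalize."""
    miss = np.isnan(y)
    obs = y[~miss]
    n = obs.size
    # Draw the variance and mean from their posterior under a noninformative
    # prior (an assumption of this sketch) so imputations reflect parameter uncertainty.
    sigma2 = (n - 1) * obs.var(ddof=1) / rng.chisquare(n - 1)
    mu = rng.normal(obs.mean(), np.sqrt(sigma2 / n))
    filled = y.copy()
    filled[miss] = rng.normal(mu, np.sqrt(sigma2), miss.sum())
    return ordinalize(filled, cutoffs)

def ordinalize_then_impute(y, cutoffs, rng):
    """Strategy 2: ordinalize the observed values, then impute categories
    from a saturated multinomial model."""
    miss = np.isnan(y)
    k = len(cutoffs) + 1
    cats = np.full(y.shape, -1, dtype=int)
    cats[~miss] = ordinalize(y[~miss], cutoffs)
    counts = np.bincount(cats[~miss], minlength=k)
    # Dirichlet posterior draw with a uniform prior (an assumption of this sketch).
    probs = rng.dirichlet(counts + 1.0)
    cats[miss] = rng.choice(k, size=miss.sum(), p=probs)
    return cats

def category_proportions(cats, k):
    """Proportion of cases falling in each of the k ordinal categories."""
    return np.bincount(cats, minlength=k) / cats.size

def compare(seed=0, n=2000, m=5, cutoffs=(0.5, 1.0, 2.0)):
    """Average category proportions over m imputations for both strategies
    on skewed (lognormal) data with roughly 30% of values missing."""
    rng = np.random.default_rng(seed)
    complete = rng.lognormal(size=n)
    y = complete.copy()
    y[rng.random(n) < 0.3] = np.nan
    k = len(cutoffs) + 1
    s1 = np.mean([category_proportions(impute_then_ordinalize(y, cutoffs, rng), k)
                  for _ in range(m)], axis=0)
    s2 = np.mean([category_proportions(ordinalize_then_impute(y, cutoffs, rng), k)
                  for _ in range(m)], axis=0)
    truth = category_proportions(ordinalize(complete, cutoffs), k)
    return {"truth": truth, "impute_then_ordinalize": s1, "ordinalize_then_impute": s2}
```

Calling `compare()` returns the complete-data category proportions alongside the averages from each strategy, which gives a rough feel for how a Gaussian imputation model can distort ordinal proportions when the underlying data are skewed. A faithful replication would need the multivariate setting, the log-linear model, and the efficiency and accuracy measures used in the paper.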

Keywords

Multivariate normality; Multiple imputation; Log-linear models; Multimodality; Skewness