Article ID: 1145703
Journal: Journal of Multivariate Analysis
Published year: 2014
Pages: 18
File type: PDF
Abstract

Normal-distribution-based maximum likelihood (NML) is the most widely used method for missing data analysis, although real data seldom follow a normal distribution. When values are missing at random (MAR), recent results indicate that NML estimates (NMLEs) remain consistent for nonnormally distributed populations as long as the variables are linearly related. However, NMLEs are generally not consistent when the variables are nonlinearly related in the population. Similarly, NMLEs are generally not consistent when data are missing not at random (MNAR). It is well known that including proper auxiliary variables mitigates the bias in MLEs caused by an MNAR mechanism. For manifest variables with nonlinear relationships under an MAR mechanism, the article gives a theoretical result showing that NMLEs are still consistent when proper nonlinear functions of the observed variables are included as auxiliary variables. Empirical results indicate that including auxiliary variables reduces bias in the estimates but may substantially increase their standard errors when the sample size is small and the proportion of missing data is nontrivial. Empirical results also imply that the bias in NMLEs due to a nonnormally distributed population under an MAR mechanism can be considerably greater than the bias caused by an MNAR mechanism with a normally distributed population. How to select auxiliary variables in practice is also discussed.
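The consistency result for nonlinear relationships under MAR can be illustrated with a small simulation. The sketch below is not the article's code; every setting in it is a hypothetical assumption: a fully observed x ~ N(0, 1), an outcome y = x^2 + e with e ~ N(0, 1) (so the true mean of y is 1), and y missing whenever x > 0.5, which is MAR because missingness depends only on the observed x. For this monotone pattern (covariates complete, only y missing) the normal-theory MLE of E[y] has a closed form through the factored likelihood: the complete-case mean of y plus the complete-case regression slopes times the shift between all-case and complete-case covariate means. The helper names `solve` and `nml_mean_of_y` are illustrative only.

```python
import random


def solve(a, b):
    """Solve the square linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def mean(v):
    return sum(v) / len(v)


def nml_mean_of_y(y, covariates, observed):
    """Normal-theory ML estimate of E[y] when every covariate column is fully
    observed and y is missing wherever observed[i] is False (factored likelihood)."""
    idx = [i for i, ok in enumerate(observed) if ok]
    y_cc = [y[i] for i in idx]
    x_cc = [[col[i] for i in idx] for col in covariates]
    ybar = mean(y_cc)
    xbar_cc = [mean(c) for c in x_cc]
    xbar_all = [mean(c) for c in covariates]
    k = len(covariates)
    t_range = range(len(idx))
    # Complete-case cross-products; the common 1/n factor cancels in the slopes.
    s_xx = [[sum((x_cc[j][t] - xbar_cc[j]) * (x_cc[l][t] - xbar_cc[l]) for t in t_range)
             for l in range(k)] for j in range(k)]
    s_xy = [sum((x_cc[j][t] - xbar_cc[j]) * (y_cc[t] - ybar) for t in t_range)
            for j in range(k)]
    slopes = solve(s_xx, s_xy)
    # ML mean of y: complete-case mean adjusted by the shift in covariate means.
    return ybar + sum(b * (ma - mc) for b, ma, mc in zip(slopes, xbar_all, xbar_cc))


random.seed(1)
n = 20000
x = [random.gauss(0.0, 1.0) for _ in range(n)]
y = [xi * xi + random.gauss(0.0, 1.0) for xi in x]  # nonlinear in x; true E[y] = 1
observed = [xi <= 0.5 for xi in x]  # MAR: missingness depends only on observed x
z = [xi * xi for xi in x]  # auxiliary variable: a nonlinear function of x

without_aux = nml_mean_of_y(y, [x], observed)
with_aux = nml_mean_of_y(y, [x, z], observed)
print(f"without auxiliary: {without_aux:.3f}")  # expected to be far from 1
print(f"with auxiliary z = x^2: {with_aux:.3f}")  # expected to be close to 1
```

Without the auxiliary variable, the straight-line regression fitted to the complete cases is misspecified and extrapolates poorly into the missing region, so the estimate lands near 0 rather than 1. Adding z = x^2 makes the conditional mean of y linear in the included variables, so the normal-theory estimate recovers the true mean, in line with the consistency result described above.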
