Article ID: 4525677
Journal: Advances in Water Resources
Published Year: 2013
Pages: 10
File Type: PDF
Abstract

• An EM method for multiply censored multivariate data is proposed.
• A GMM is used to approximate the underlying distribution of environmental data.
• The proposed method is capable of dealing with both censored and uncensored data.
• The method is successfully applied to the analysis of real water quality data.

Environmental data are commonly constrained by a detection limit (DL) imposed by the experimental apparatus. In particular, because experimental units or assay methods change, the observed data are often censored by more than one DL. For convenience of analysis, measurements below the DLs are typically replaced by an arbitrary value such as zero, half the DL, or the DL itself. However, this substitution approach is widely considered unreliable and prone to bias. In contrast, maximum likelihood estimation (MLE) for censored data has been developed for better performance and statistical justification. Existing MLE methods, however, seldom address the multivariate context of censored environmental data, especially for water quality. This paper proposes a mixture model to flexibly approximate the underlying distribution of the observed data, chosen for its good approximation capability and generative mechanism. The study focuses on the Gaussian mixture model (GMM). To cope with censored data with multiple DLs, an expectation–maximization (EM) algorithm is developed in a multivariate setting. The proposed approach is verified on both simulated data and real water quality data.
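To make the idea of EM under multiple detection limits concrete, the following is a minimal sketch for a deliberately simplified case: a single univariate Gaussian with left-censored observations, where each observation may have its own DL. It is not the paper's multivariate GMM algorithm. The function names (`censored_moments`, `em_censored_normal`, `simulate_censored`), the simulated parameters, and the two DL values are all assumptions made for illustration. The E-step uses the standard moments of a normal distribution truncated above at the DL; the M-step updates the mean and standard deviation from the completed sufficient statistics.

```python
import math
import random
from statistics import NormalDist

_STD = NormalDist()  # standard normal, used for its pdf and cdf


def censored_moments(mu, sigma, dl):
    """E[x] and E[x^2] for x ~ N(mu, sigma^2) conditioned on x < dl."""
    alpha = (dl - mu) / sigma
    cdf = max(_STD.cdf(alpha), 1e-300)  # guard against division by zero
    lam = _STD.pdf(alpha) / cdf
    mean = mu - sigma * lam
    var = sigma ** 2 * (1.0 - alpha * lam - lam ** 2)
    return mean, var + mean ** 2


def em_censored_normal(values, limits, n_iter=200):
    """Fit N(mu, sigma^2) by EM.

    values[i] is the measurement, or None if it fell below limits[i].
    """
    observed = [v for v in values if v is not None]
    n = len(values)
    # Initialise from the uncensored observations only.
    mu = sum(observed) / len(observed)
    var0 = sum((v - mu) ** 2 for v in observed) / len(observed)
    sigma = max(math.sqrt(var0), 1e-6)
    for _ in range(n_iter):
        s1 = s2 = 0.0
        for v, dl in zip(values, limits):
            if v is None:
                # E-step: expected sufficient statistics below this DL.
                m1, m2 = censored_moments(mu, sigma, dl)
            else:
                m1, m2 = v, v * v
            s1 += m1
            s2 += m2
        # M-step: update the parameters from the completed statistics.
        mu = s1 / n
        sigma = math.sqrt(max(s2 / n - mu ** 2, 1e-12))
    return mu, sigma


def simulate_censored(n, seed, mu=1.0, sigma=1.0):
    """Hypothetical data: the DL changes from 0.5 to 0.0 halfway through."""
    rng = random.Random(seed)
    values, limits = [], []
    for i in range(n):
        dl = 0.5 if i < n // 2 else 0.0
        x = rng.gauss(mu, sigma)
        values.append(x if x >= dl else None)
        limits.append(dl)
    return values, limits


values, limits = simulate_censored(4000, seed=0)
mu_hat, sigma_hat = em_censored_normal(values, limits)

# Half-DL substitution, shown only for comparison.
substituted = [dl / 2 if v is None else v for v, dl in zip(values, limits)]
mu_sub = sum(substituted) / len(substituted)

print(f"EM estimate:          mu={mu_hat:.3f}, sigma={sigma_hat:.3f}")
print(f"Half-DL substitution: mu={mu_sub:.3f}")
```

In this simulated setting, half-DL substitution replaces each censored value with a number larger than its conditional expectation, so the substituted mean tends to overestimate the true mean, whereas the EM estimate uses the censoring information directly. Extending the sketch to the multivariate GMM case described in the abstract would additionally require component responsibilities and conditional moments of truncated multivariate normals, which are not shown here.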

Related Topics
Physical Sciences and Engineering > Earth and Planetary Sciences > Earth-Surface Processes