Empirical likelihood confidence intervals for differences between two datasets with missing data

Article ID	Journal	Published Year	Pages	File Type
535142	Pattern Recognition Letters	2008	10 Pages	PDF

Abstract

Detecting differences between populations (or datasets) is an important research topic in machine learning and a common means of evaluation in applications, such as assessing a new medical product by comparing it with an old one. Previous research has focused on change detection. In this paper, we measure the uncertainty of structural differences between populations, such as differences in means and distribution functions, using a confidence interval (CI) obtained via an empirical likelihood approach. We present a statistically sound method for estimating CIs for differences between non-parametric populations with missing values, which are imputed using simple random hot deck imputation. We illustrate the power of CI estimation as a new machine learning technique for tasks such as distinguishing spam from non-spam emails in the Spambase dataset downloaded from the UCI repository.
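
To make the imputation step concrete, the following is a minimal sketch, not the paper's method. It assumes missing values are encoded as `None` and that simple random hot deck imputation means drawing donors with replacement from the observed values of the same sample. The interval shown is a plain normal-approximation interval for the mean difference, used here only as a stand-in for the empirical likelihood interval the abstract describes. The function names and the toy data are hypothetical.

```python
import math
import random
import statistics


def hot_deck_impute(values, rng):
    """Replace each None with a random draw (with replacement) from the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to draw donors from")
    return [v if v is not None else rng.choice(observed) for v in values]


def mean_difference_ci(x, y, z=1.96):
    """Normal-approximation CI for mean(x) - mean(y).

    Assumption: this is a stand-in, NOT the empirical likelihood interval.
    """
    diff = statistics.fmean(x) - statistics.fmean(y)
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    return diff - z * se, diff + z * se


# Hypothetical toy data: None marks a missing response.
rng = random.Random(0)
old = [5.1, None, 4.8, 5.5, None, 5.0, 4.9]
new = [6.2, 6.0, None, 6.4, 5.9, None, 6.1]

old_c = hot_deck_impute(old, rng)
new_c = hot_deck_impute(new, rng)
lo, hi = mean_difference_ci(new_c, old_c)
print(f"approximate 95% CI for the mean difference: ({lo:.3f}, {hi:.3f})")
```

This naive interval treats imputed values as if they had been observed, so it likely understates the uncertainty; accounting for the imputation is the kind of issue a dedicated method has to address.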

Keywords

Missing data; Empirical likelihood confidence interval; Imputation