Comparison of imputation methods for discriminant analysis with strategically hidden data

Article ID	Journal	Published Year	Pages	File Type
479204	European Journal of Operational Research	2016	9 Pages	PDF

Abstract

• We study the methods dealing with strategically missing data.
• We show the properties of methods as the sample size gets arbitrarily large.
• We analyze how sample size affects the performance of these methods.
• We conduct an empirical study to verify our theoretical results.

In many situations, data providers may present data selectively to obtain desirable but undeserved decision outcomes from decision makers. Decisions taken without considering strategic information revelation might be biased. We revisit and study the properties of two methods for handling strategically missing data in a classification context. The asymptotic analysis suggests that, when the training sets are sufficiently large, these methods outperform conventional missing-data methods that do not consider the strategic motivations of agents (e.g., the Average method and the Similarity method). Scale-up experiments support the theoretical findings and show that the misclassification rates of the two methods decrease as the training size increases. We show that sampling can be used to efficiently identify sufficient information for the imputation methods to treat strategically missing data.
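To illustrate why ignoring strategic hiding can bias a decision, the following is a minimal sketch that assumes the Average method amounts to column-mean imputation. The abstract does not define the method, so this reading is an assumption, as are the function name `average_impute`, the toy data, and the hiding rule (agents conceal the second attribute whenever it is below 5).

```python
from statistics import mean


def average_impute(rows):
    """Fill None entries with the mean of the observed values in the same column.

    Assumes every column has at least one observed value; otherwise
    statistics.mean raises StatisticsError.
    """
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        col_means.append(mean(observed))
    return [
        [col_means[j] if v is None else v for j, v in enumerate(r)]
        for r in rows
    ]


# Hypothetical true attribute values for four agents.
true_rows = [[1.0, 2.0], [2.0, 7.0], [3.0, 3.0], [4.0, 8.0]]

# Hypothetical hiding rule: an agent withholds the second attribute
# when revealing it would be unfavorable (here, when it is below 5).
reported = [[a, b if b >= 5 else None] for a, b in true_rows]

imputed = average_impute(reported)

for true, filled in zip(true_rows, imputed):
    print(true, "->", filled)
```

In this toy case the hidden values are 2.0 and 3.0, yet both are filled with 7.5, the mean of the values that agents chose to reveal. The imputed values therefore favor exactly the agents who withheld information, which is the kind of bias the abstract attributes to methods that ignore strategic motivations.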

Keywords

Information disclosure; Analytics; Missing data; Information theory; Sampling