Article ID: 554759
Journal: Decision Support Systems
Published Year: 2012
Pages: 11 Pages
File Type: PDF
Abstract

Developing a data mining approach for a deception application can incur prohibitive data collection costs, because both deceptive and truthful data must be collected. Artificially generated deception data can reduce these costs, but the impact of using such data is not well understood. To study the relationship between artificial and real deception, this paper presents an experimental comparison using a novel deception generation model. The deception and truth data were collected from financial aid applications, a document-centric area with limited resources for verification. The data collection yielded a unique data set containing truth, natural deception, and boosted deception. To simulate deception, the Application Deception Model was developed to generate artificial deception in different deception scenarios. To study differences between artificial and real deception, an experiment was performed with deception level and data generation method as factors and directed distance and outlier score as outcome variables. The results provide evidence of a reasonable similarity between artificial and real deception, suggesting that artificially generated deception could be used to reduce the cost of obtaining training data.

► This paper develops a new data generation model for document deception.
► This paper presents an experiment comparing the similarity of real deception and artificially generated noise and deception.
► This paper provides evidence that artificially generated deception is similar to real deception.

Related Topics
Physical Sciences and Engineering > Computer Science > Information Systems