Article ID | Journal | Published Year | Pages |
---|---|---|---|
6869753 | Computational Statistics & Data Analysis | 2014 | 13 Pages |
Abstract
Dissemination of data with sensitive information has an implicit risk of unauthorized disclosure. Several masking methods have been developed in order to protect the data without the loss of too much information. One such method is the Post Randomization Method (PRAM) based on perturbations of a categorical variable according to a Markov probability transition matrix. The method has the drawback that it is difficult to find an optimal transition matrix to perform perturbations and maximize data utility. An evolutionary algorithm which generates an optimal probability transition matrix is proposed. Optimality is with respect to a pre-defined fitness function dependent on the aspects of the data that need to be preserved following perturbation. The algorithm embeds two properties: the invariance of the transition matrix to preserve marginal totals in expectation, and the control of diagonal probabilities which determine the amount of perturbation. Experimental results using a real data set are presented in order to illustrate and empirically evaluate the application of this algorithm.
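To make the two properties named in the abstract concrete, the following is a minimal sketch in Python of applying PRAM with an invariant transition matrix. It is not the evolutionary algorithm proposed in the article. Instead, it assumes a simple two-stage construction (a starting matrix `P` multiplied by its Bayes-reversed matrix `Q`), which yields a matrix `R` that preserves the marginal counts in expectation. All names (`make_invariant`, `apply_pram`, `expected_counts`), the starting matrix, and the toy data are hypothetical and chosen only for illustration.

```python
import random

def make_invariant(P, counts):
    """Build R = P * Q such that counts * R == counts (invariance).

    Assumption: every category has a positive expected count after
    applying P, so the division below is well defined.
    """
    K = len(counts)
    # s[l]: expected count in category l after applying P once
    s = [sum(counts[k] * P[k][l] for k in range(K)) for l in range(K)]
    # Q[l][j]: probability that the original category was j,
    # given that the perturbed category is l (Bayes' rule)
    Q = [[P[j][l] * counts[j] / s[l] for j in range(K)] for l in range(K)]
    # R is the product of the two transition matrices
    return [[sum(P[k][l] * Q[l][j] for l in range(K)) for j in range(K)]
            for k in range(K)]

def expected_counts(counts, R):
    """Expected marginal counts after perturbing with R."""
    K = len(counts)
    return [sum(counts[k] * R[k][j] for k in range(K)) for j in range(K)]

def apply_pram(values, categories, R, rng):
    """Replace each value by a draw from the row of R for its category."""
    index = {c: i for i, c in enumerate(categories)}
    return [rng.choices(categories, weights=R[index[v]])[0] for v in values]

# Hypothetical toy data: one categorical variable with three categories.
categories = ["A", "B", "C"]
counts = [50, 30, 20]
values = [c for c, n in zip(categories, counts) for _ in range(n)]

# Hypothetical starting matrix; the diagonal entries control how often
# a value is left unchanged, i.e. the amount of perturbation.
P = [[0.8, 0.1, 0.1],
     [0.1, 0.8, 0.1],
     [0.1, 0.1, 0.8]]

R = make_invariant(P, counts)
perturbed = apply_pram(values, categories, R, random.Random(0))

print("diagonal of R:", [round(R[k][k], 3) for k in range(3)])
print("expected counts:", [round(x, 6) for x in expected_counts(counts, R)])
print("observed counts:", [perturbed.count(c) for c in categories])
```

The expected counts match the original counts, which is the invariance property. The observed counts of a single perturbed file still vary around them by chance. In this sketch the diagonal of `R` follows from the choice of `P`, whereas the article describes an algorithm that controls the diagonal probabilities while searching for a matrix that is optimal under a chosen fitness function.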
Related Topics
Physical Sciences and Engineering
Computer Science
Computational Theory and Mathematics
Authors
Jordi Marés, Natalie Shlomo