Article ID: 405088
Journal: Knowledge-Based Systems
Published Year: 2014
Pages: 11
File Type: PDF
Abstract

Microaggregation is a successful mechanism for resolving the tension between respondent privacy and data quality in Statistical Disclosure Control. For numerical datasets, microaggregation is defined as a clustering problem in which every group must contain at least k records and the within-group sum of squared errors (SSE) is minimized. Unfortunately, the data publisher must run the algorithm repeatedly for different values of k to find a good trade-off between privacy and utility. Running an algorithm many times on large numerical datasets wastes resources, since most of the computations are repeated. In this paper, we propose a Fast Data-oriented Microaggregation algorithm (FDM) that efficiently anonymizes large multivariate numerical datasets for multiple successive values of k. Experimental results on real-world datasets demonstrate the superiority of the method in terms of both data quality and time complexity. Moreover, the method usually achieves a better trade-off between disclosure risk and information loss of the protected dataset than previous techniques.
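To make the problem definition concrete, the sketch below partitions records into groups of at least k, replaces each record by its group centroid, and reports SSE = sum over groups g of sum over records x in g of ||x - c_g||^2, where c_g is the centroid of g. It is not the paper's FDM algorithm, which the abstract does not describe. Instead it is a pure-Python rendering of MDAV (Maximum Distance to Average Vector), a well-known fixed-size microaggregation heuristic. The function names `mdav_groups` and `microaggregate` are hypothetical. The final loop reruns the whole procedure from scratch for each candidate k, which is the repeated work the abstract says FDM is designed to avoid.

```python
from typing import List, Sequence, Tuple

Point = List[float]


def _centroid(points: Sequence[Point]) -> Point:
    """Component-wise mean of a non-empty list of records."""
    dim = len(points[0])
    return [sum(p[j] for p in points) / len(points) for j in range(dim)]


def _sqdist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two records."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mdav_groups(data: Sequence[Point], k: int) -> List[List[int]]:
    """Partition record indices into groups of at least k records (MDAV heuristic)."""
    if k < 1 or len(data) < k:
        raise ValueError("need 1 <= k <= number of records")
    remaining = list(range(len(data)))
    groups: List[List[int]] = []

    def farthest_from(point: Point) -> int:
        # Unassigned record with the largest distance to `point`.
        return max(remaining, key=lambda i: _sqdist(data[i], point))

    def take_group(seed: int) -> List[int]:
        # The seed plus its k-1 nearest unassigned records, removed from the pool.
        others = sorted((i for i in remaining if i != seed),
                        key=lambda i: _sqdist(data[i], data[seed]))
        group = [seed] + others[: k - 1]
        for i in group:
            remaining.remove(i)
        return group

    while len(remaining) >= 3 * k:
        r = farthest_from(_centroid([data[i] for i in remaining]))
        groups.append(take_group(r))
        # s: record farthest from r among those still unassigned.
        s = farthest_from(data[r])
        groups.append(take_group(s))
    if len(remaining) >= 2 * k:
        r = farthest_from(_centroid([data[i] for i in remaining]))
        groups.append(take_group(r))
    # Between k and 2k-1 records are left; they form the last group.
    groups.append(list(remaining))
    return groups


def microaggregate(data: Sequence[Point], k: int) -> Tuple[List[Point], float]:
    """Replace each record by its group centroid; return (protected data, SSE)."""
    protected: List[Point] = [[] for _ in data]
    sse = 0.0
    for group in mdav_groups(data, k):
        c = _centroid([data[i] for i in group])
        for i in group:
            protected[i] = list(c)
            sse += _sqdist(data[i], c)
    return protected, sse


if __name__ == "__main__":
    records = [[1.0, 2.0], [1.2, 1.9], [0.9, 2.1], [4.0, 5.0],
               [4.1, 5.2], [8.0, 8.5], [8.2, 8.1], [7.9, 8.3]]
    # Naive approach: rerun microaggregation from scratch for each candidate k.
    for k in (2, 3, 4):
        _, sse = microaggregate(records, k)
        print(f"k={k}  SSE={sse:.4f}")
```

Larger k gives stronger k-anonymity-style protection but usually a higher SSE (more information loss), which is why a publisher would scan several values of k before choosing one.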
