Preference-based anonymization of numerical datasets by multi-objective microaggregation

Journal: Information Fusion (2015), 20 pages

Highlights

• Our method enables the data publisher to effectively control how disclosure risk (DR) and information loss (IL) are aggregated for a fixed value of k.
• The privacy and the utility of conventional microaggregation methods can both be significantly improved at the same time.
• Our partition encoding is sound and complete, and efficient for large datasets.
• Our method is general and can adapt to different requirements and assessment measures.

Abstract

Microaggregation is a statistical disclosure control (SDC) mechanism for achieving k-anonymity, a basic privacy model. The method first partitions the dataset into groups of at least k records and then replaces the members of each group with an aggregate value, typically the group centroid. Generally, larger values of k lower the disclosure risk (DR) at the expense of higher information loss (IL). The data publisher therefore has to set the microaggregation parameters carefully to produce an anonymized dataset that is both protected and useful. Unfortunately, in most conventional microaggregation methods the only available parameter, k, does not let the data publisher effectively control the trade-off between DR and IL. This paper proposes a novel microaggregation method that optimizes IL and DR simultaneously. The trade-off is expressed and solved within a multi-objective optimization framework. The data publisher can choose the preferred protected dataset from a set of non-dominated candidate solutions, or even steer the method toward a desired point in the DR/IL objective space. Experimental results show that, for a fixed value of k, the proposed method usually produces datasets that are both better protected and more useful than those of conventional methods.
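The abstract describes two mechanics: classical microaggregation (partition into groups of at least k records, then aggregate each group), and choosing a protected dataset from non-dominated (DR, IL) candidates. The Python sketch below illustrates both, under loudly stated assumptions. It is not the authors' method. Partitioning uses a simplified MDAV-style heuristic rather than the paper's multi-objective search. IL is measured as 100 · SSE/SST, one common microaggregation utility measure. The candidate (DR, IL) scores are made-up illustrative values. The helper names (`partition`, `microaggregate`, `information_loss`, `non_dominated`, `closest_to_target`) are hypothetical.

```python
from typing import List, Sequence, Tuple

Record = Tuple[float, ...]


def centroid(rows: Sequence[Record]) -> Record:
    """Component-wise mean of a non-empty group of records."""
    n = len(rows)
    return tuple(sum(col) / n for col in zip(*rows))


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def partition(data: Sequence[Record], k: int) -> List[List[int]]:
    """Hypothetical simplified MDAV-style heuristic: every group gets >= k records.

    Repeatedly take the unassigned record farthest from the centroid of the
    unassigned records and group it with its k-1 nearest unassigned records.
    The final group absorbs the remaining k..2k-1 records.
    """
    if k < 1 or len(data) < k:
        raise ValueError("need k >= 1 and at least k records")
    remaining = list(range(len(data)))
    groups: List[List[int]] = []
    while len(remaining) >= 2 * k:
        c = centroid([data[i] for i in remaining])
        far = max(remaining, key=lambda i: sq_dist(data[i], c))
        group = sorted(remaining, key=lambda i: sq_dist(data[i], data[far]))[:k]
        taken = set(group)
        groups.append(group)
        remaining = [i for i in remaining if i not in taken]
    groups.append(remaining)
    return groups


def microaggregate(data: Sequence[Record], k: int) -> Tuple[List[Record], List[List[int]]]:
    """Replace every record by the centroid of its group (k-anonymous output)."""
    groups = partition(data, k)
    masked: List[Record] = list(data)
    for g in groups:
        c = centroid([data[i] for i in g])
        for i in g:
            masked[i] = c
    return masked, groups


def information_loss(data: Sequence[Record], masked: Sequence[Record]) -> float:
    """Assumed IL measure: 100 * SSE / SST (within-group vs. total sum of squares)."""
    mean = centroid(data)
    sse = sum(sq_dist(x, m) for x, m in zip(data, masked))
    sst = sum(sq_dist(x, mean) for x in data)
    return 100.0 * sse / sst if sst else 0.0


def non_dominated(points: Sequence[Tuple[float, float]]) -> List[int]:
    """Indices of (DR, IL) pairs that no other pair dominates (both minimized)."""
    def dominates(p: Tuple[float, float], q: Tuple[float, float]) -> bool:
        # p is at least as good on both objectives and differs from q,
        # so it is strictly better on at least one.
        return p[0] <= q[0] and p[1] <= q[1] and p != q
    return [i for i, q in enumerate(points) if not any(dominates(p, q) for p in points)]


def closest_to_target(points: Sequence[Tuple[float, float]], front: List[int],
                      target: Tuple[float, float]) -> int:
    """Hypothetical preference step: front member nearest a desired (DR, IL) point."""
    return min(front, key=lambda i: sq_dist(points[i], target))


if __name__ == "__main__":
    toy = [(1.0, 2.0), (1.2, 1.9), (0.9, 2.2), (8.0, 9.0), (8.1, 8.7), (7.9, 9.2), (4.0, 5.0)]
    masked, groups = microaggregate(toy, k=3)
    print("groups:", groups)
    print("IL (%):", round(information_loss(toy, masked), 2))
    # Made-up (DR, IL) scores of candidate protected datasets, for illustration only.
    candidates = [(0.30, 5.0), (0.25, 6.0), (0.35, 4.0), (0.30, 6.5), (0.40, 4.5)]
    front = non_dominated(candidates)
    print("non-dominated:", front)
    print("preferred:", closest_to_target(candidates, front, (0.28, 5.0)))
```

Both objectives are minimized, so a candidate survives the filter only if no other candidate is at least as good on DR and IL and strictly better on one of them. Because DR and IL are on different scales, a practical preference step would likely normalize them before measuring the distance to the publisher's target point.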

Keywords

Microaggregation; k-anonymity; Multi-objective optimization; Privacy