Article ID: 402506
Journal: Knowledge-Based Systems
Published Year: 2016
Pages: 19 Pages
File Type: PDF
Abstract

• We define a privacy model based on k-anonymity and one of its strong refinements to prevent the background knowledge attack.
• We propose two hierarchical anonymization algorithms to satisfy our privacy model.
• Our algorithms outperform the state-of-the-art anonymization algorithm in terms of utility and privacy.
• We extend an information loss measure to capture data inaccuracies caused by records that do not fit in any equivalence class.

Preserving privacy in the presence of an adversary’s background knowledge is very important in data publishing. The k-anonymity model protects identity but does not protect against attribute disclosure. β-likeness, one of the strong refinements of k-anonymity, does not protect against identity disclosure. Neither model protects against attacks that exploit background knowledge. This research proposes two approaches for generating k-anonymous β-likeness datasets that protect against both identity and attribute disclosure and prevent attacks that exploit, as the adversary’s background knowledge, any correlations between quasi-identifiers (QIs) and sensitive attribute values. In particular, two hierarchical anonymization algorithms are proposed. In their first stage, both algorithms apply agglomerative clustering to generate clusters of records whose probability distributions, as extracted from background knowledge, are similar. In the next phase, k-anonymity and β-likeness are enforced to prevent identity and attribute disclosure. Our extensive experiments demonstrate that the proposed algorithms outperform other state-of-the-art anonymization algorithms in terms of privacy and data utility, with fewer unpublished records than the others. Because well-known information loss metrics fail to measure precisely the data inaccuracies that stem from removing records that cannot be published in any equivalence class, this research also introduces an extension of the Global Certainty Penalty metric that accounts for unpublished records.
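
To make the two privacy conditions concrete, the following is a minimal sketch of how a published partition could be checked against them. It is not the paper's algorithm: it assumes the basic form of β-likeness, in which the relative gain (q - p) / p of a sensitive value's frequency q inside an equivalence class over its overall frequency p may not exceed β. The function names and the representation of each equivalence class as a list of sensitive values are illustrative assumptions.

```python
from collections import Counter


def is_k_anonymous(classes, k):
    """True if every equivalence class contains at least k records."""
    return all(len(ec) >= k for ec in classes)


def satisfies_beta_likeness(classes, beta):
    """True if no sensitive value gains more than beta in relative frequency.

    Assumed definition (basic beta-likeness): for each class and each
    sensitive value with in-class frequency q and overall frequency p,
    require (q - p) / p <= beta whenever q > p.
    """
    all_values = [s for ec in classes for s in ec]
    total = len(all_values)
    overall = {s: c / total for s, c in Counter(all_values).items()}
    for ec in classes:
        for s, count in Counter(ec).items():
            q = count / len(ec)
            p = overall[s]
            if q > p and (q - p) / p > beta:
                return False
    return True


# Each inner list holds the sensitive values of one equivalence class.
classes = [["flu", "flu", "cancer"], ["flu", "cancer", "cancer"]]
print(is_k_anonymous(classes, k=3))
print(satisfies_beta_likeness(classes, beta=0.5))
```

In this toy partition each sensitive value has an overall frequency of 0.5 and an in-class frequency of at most 2/3, so the largest relative gain is about 0.33. A check of this kind only validates a finished partition; the clustering stage that builds the classes is a separate step.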

Related Topics
Physical Sciences and Engineering > Computer Science > Artificial Intelligence