Identity disclosure protection: A data reconstruction approach for privacy-preserving data mining

Article ID	Journal	Published Year	Pages	File Type
553862	Decision Support Systems	2009	8 Pages	PDF

Abstract

Identity disclosure is one of the most serious privacy concerns in today's information age. A well-known method for protecting identity disclosure is k-anonymity. A dataset provides k-anonymity protection if the information for each individual in the dataset cannot be distinguished from at least k − 1 individuals whose information also appears in the dataset. There is a flaw in k-anonymity that would still allow an intruder to discern the confidential information of individuals in the anonymized data. To overcome this problem, we propose a data reconstruction approach to achieve k-anonymity protection in predictive data mining. In this approach, the potentially identifying attributes are first masked using aggregation (for numeric data) and swapping (for nominal data). A genetic algorithm technique is then applied to the masked data to find a good subset of it. This subset is then replicated to form the released dataset that satisfies the k-anonymity constraint.

Keywords

k-anonymity Genetic algorithm Privacy Data mining