Article ID: 4950389
Journal: Future Generation Computer Systems
Published Year: 2017
Pages: 16
File Type: PDF
Abstract
Data privacy is a stringent need when sharing and processing data in a distributed environment or in the Internet of Things. Collaborative privacy-preserving data mining based on secure multiparty computation incurs high communication and computational costs. Data anonymization is a promising technique in the field of privacy-preserving data mining, used to protect data against identity disclosure. Information loss and the common attacks possible on anonymized data are serious challenges for anonymization. Recently, data anonymization using data mining techniques has shown significant improvement in data utility. Still, the existing techniques lack effective handling of attacks. Hence, in this paper, an anonymization algorithm based on clustering that is resilient to the similarity attack and the probabilistic inference attack is proposed. The anonymized data is distributed on the Hadoop Distributed File System. The method achieves a better trade-off between privacy and utility. In our work, data utility is measured in terms of accuracy and F-measure with respect to different classifiers. Experiments show that the accuracy, F-measure and execution time of the classification algorithms on the privacy-preserved data sets formed by the proposed clustering algorithms are better than those obtained with the existing algorithms.
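To make the idea of clustering-based anonymization concrete, the following is a minimal sketch, not the algorithm proposed in the paper. It assumes a hypothetical record schema of (age, zip code, sensitive value), groups similar records into clusters of at least k members using a simple sort as a crude stand-in for real clustering, and replaces the quasi-identifiers in each cluster with a shared range. It guards only against identity disclosure and does nothing about the similarity or probabilistic inference attacks the abstract mentions.

```python
from typing import List, Tuple

# Hypothetical schema for illustration: (age, zip_code, sensitive_value)
Record = Tuple[int, int, str]


def anonymize(records: List[Record], k: int) -> List[Tuple[str, str, str]]:
    """Group records into clusters of at least k and generalize
    the quasi-identifiers (age, zip_code) to per-cluster ranges."""
    if k < 1 or len(records) < k:
        raise ValueError("need k >= 1 and at least k records")

    # Sort by quasi-identifiers so that neighbouring records are similar.
    # This is an assumption standing in for a proper clustering step.
    ordered = sorted(records, key=lambda r: (r[0], r[1]))

    # Cut into consecutive groups of k records.
    clusters = [ordered[i:i + k] for i in range(0, len(ordered), k)]

    # Fold a short trailing group into the previous one so every
    # cluster keeps at least k members.
    if len(clusters) > 1 and len(clusters[-1]) < k:
        tail = clusters.pop()
        clusters[-1].extend(tail)

    anonymized = []
    for cluster in clusters:
        ages = [r[0] for r in cluster]
        zips = [r[1] for r in cluster]
        age_range = f"{min(ages)}-{max(ages)}"
        zip_range = f"{min(zips)}-{max(zips)}"
        # Every record in the cluster shares the same generalized values.
        for _, _, sensitive in cluster:
            anonymized.append((age_range, zip_range, sensitive))
    return anonymized
```

Wider ranges mean more information loss, which is the privacy and utility trade-off the abstract refers to: a larger k hides each individual among more records but leaves a classifier with coarser attributes to learn from.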
Related Topics
Physical Sciences and Engineering > Computer Science > Computational Theory and Mathematics