Article ID Journal Published Year Pages File Type
455829 Computers & Security 2015 18 Pages PDF
Abstract

In this paper, we study the problem of designing a differentially private algorithm for mining maximal frequent sequences, which can not only achieve high data utility and a high degree of privacy, but also provide high time efficiency. To solve this problem, we present a new differentially private algorithm, which is referred to as DP-MFSM. DP-MFSM consists of three phases: pre-processing phase, expected frequent sequence mining (ESM) phase, and candidate extraction and verification (CEV) phase. Specifically, in the pre-processing phase, we first extract some statistical information from the input database, and use the extracted information to determine the values of some variables which will be used in the ESM phase. Then, in the ESM phase, we randomly partition the input database into several sub-databases, and use a partition-based ESM technique to find expected frequent sequences, which are a subset of candidate frequent sequences and more likely to be frequent. At last, in the CEV phase, we extract candidate maximal frequent sequences from the discovered expected frequent sequences, and use a splitting-based technique to verify which candidates are actually frequent in the input database. Through privacy analysis, we show that our DP-MFSM algorithm is ε-differentially private. Extensive experiments on real-world datasets illustrate that our DP-MFSM algorithm can substantially outperform alternative approaches.

Related Topics
Physical Sciences and Engineering Computer Science Computer Networks and Communications
Authors
, , , , ,