Article code | Journal code | Publication year | English article | Full-text version |
---|---|---|---|---|
515469 | 867023 | 2015 | 15-page PDF | Free download |

• A new affinity-based k-anonymity model that leverages query refinement patterns.
• A novel notion of generalized graph k-cores.
• A link between k-anonymity and k-cores.
• A novel evaluation test based on sensitive and non-sensitive queries.
• A comparison between k-anonymity under affinity and under WordNet generalization.
Search log k-anonymization is based on the elimination of infrequent queries under exact matching conditions, usually at the cost of high data loss. We present a semantic approach to k-anonymity, termed kΘ-affinity, in which a query can be protected by affine rather than identical queries. Based on the observation that many infrequent queries can be seen as refinements of a more general frequent query, we develop a three-step privacy model. We first represent query concepts as probabilistically weighted n-grams and extract them from the search log data. We then expand the original log queries with such concepts, defining the affinity between two queries as the similarity of their expanded representations. Finally, after building the graph of Θ-affine queries (for a given threshold Θ), we find the generalized k-cores of this graph, which coincide with the sets of queries satisfying kΘ-affinity privacy. Experimenting with the AOL dataset, we compare k-anonymity under affinity to k-anonymity under equality and under WordNet generalization. We show that kΘ-affinity achieves similar levels of privacy while at the same time reducing the data losses to a great extent. We also discuss its sensitivity to attacks.
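The last two steps of the model (building the graph of Θ-affine queries and extracting its cores) can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the paper: affinity is approximated by Jaccard similarity over plain query terms rather than over concept-expanded representations, and the standard graph k-core (found by iterative peeling) stands in for the generalized k-cores, whose definition the abstract does not give. The names `jaccard`, `affinity_graph`, and `k_core` are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two term sets (an assumed affinity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def affinity_graph(queries, theta):
    """Link every pair of distinct queries whose affinity is at least theta."""
    queries = list(dict.fromkeys(queries))  # drop repeated queries, keep order
    # Assumption: a query is represented by its bare terms, with no
    # concept expansion of the kind described in the abstract.
    terms = {q: set(q.lower().split()) for q in queries}
    graph = {q: set() for q in queries}
    for q1, q2 in combinations(queries, 2):
        if jaccard(terms[q1], terms[q2]) >= theta:
            graph[q1].add(q2)
            graph[q2].add(q1)
    return graph


def k_core(graph, k):
    """Standard k-core: repeatedly remove nodes with fewer than k neighbours."""
    core = {node: set(neigh) for node, neigh in graph.items()}
    changed = True
    while changed:
        changed = False
        for node in [n for n, neigh in core.items() if len(neigh) < k]:
            # Detach the removed node from each neighbour still in the graph.
            for neigh in core.pop(node):
                core[neigh].discard(node)
            changed = True
    return set(core)


queries = [
    "cheap flights rome",
    "cheap flights paris",
    "cheap flights",
    "flights rome",
    "knee surgery recovery",
]
protected = k_core(affinity_graph(queries, theta=0.5), k=2)
print(sorted(protected))
```

With these five toy queries, Θ = 0.5 and k = 2, the three "cheap flights" queries remain in the core, while "flights rome" and "knee surgery recovery" are peeled off because each has fewer than two affine neighbours. Under exact matching all five queries would be discarded, since each occurs only once.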
Journal: Information Processing & Management - Volume 51, Issue 2, March 2015, Pages 74–88