Kd-trees and the real disclosure risks of large statistical databases

Article code	Journal code	Publication year	English article	Full text
528302	869553	2012	14-page PDF	Free download

Keywords

Kd-trees; Record linkage; Statistical disclosure control

Related topics

Engineering and Basic Sciences; Computer Engineering; Computer Vision and Pattern Recognition

Abstract

Estimating the disclosure risk of a Statistical Disclosure Control (SDC) protection method by means of (distance-based) record linkage techniques is a very popular approach to analyzing the privacy level offered by such a method. When databases are very large, record linkage techniques such as blocking or partitioning are usually applied to make this process reasonably efficient. However, in this case the record linkage process is not exact, which means that the disclosure risk of an SDC protection method may be underestimated.

In this paper we propose the use of kd-tree techniques to apply exact yet very efficient record linkage when (protected) datasets are very large. We describe some experiments showing that this approach achieves better results, in terms of both accuracy and running time, than more classical approaches such as record linkage based on a sliding window.

We also discuss and experiment on the use of these techniques not to link a whole protected record with its original one, but just to guess the value of some confidential attribute(s) of the record(s). This leads to concepts such as k-neighbor l-diversity and k-neighbor p-sensitivity, generalizations (to any SDC protection method) of l-diversity and p-sensitivity, which have been defined for SDC protection methods ensuring k-anonymity, such as microaggregation.
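To make the idea of exact distance-based record linkage with a kd-tree concrete, here is a minimal sketch in pure Python. It is not the authors' implementation: the function names (`build_kdtree`, `nearest`, `linkage_rate`), the use of squared Euclidean distance, and the toy protection by additive noise are all assumptions made for illustration. The disclosure risk is approximated as the fraction of protected records whose nearest original record is their true source.

```python
import random

def build_kdtree(points, depth=0):
    """Build a kd-tree over (index, vector) pairs, splitting on the median."""
    if not points:
        return None
    axis = depth % len(points[0][1])
    points = sorted(points, key=lambda p: p[1][axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(node, target, best=None):
    """Return (squared distance, index) of the exact nearest neighbor of target."""
    if node is None:
        return best
    idx, vec = node["point"]
    d = sq_dist(vec, target)
    if best is None or d < best[0]:
        best = (d, idx)
    diff = target[node["axis"]] - vec[node["axis"]]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, target, best)
    # Visit the far side only if the splitting plane is closer than the best match.
    if diff ** 2 < best[0]:
        best = nearest(far, target, best)
    return best

def linkage_rate(original, protected):
    """Fraction of protected records whose nearest original record is their true source."""
    tree = build_kdtree(list(enumerate(original)))
    hits = sum(1 for i, rec in enumerate(protected) if nearest(tree, rec)[1] == i)
    return hits / len(protected)

# Toy data: protection by small additive noise (an assumption for illustration).
random.seed(0)
original = [(random.random(), random.random()) for _ in range(200)]
protected = [(x + random.gauss(0, 0.001), y + random.gauss(0, 0.001)) for x, y in original]
rate = linkage_rate(original, protected)
```

Because a subtree is pruned only when the splitting plane is farther away than the best match found so far, the search returns the same minimum distance as an exhaustive scan, which is what distinguishes it from approximate schemes such as blocking or a sliding window.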

► Use of kd-tree techniques to apply exact yet very efficient record linkage.
► Experiments showing the benefits of the new approach over previous approaches.
► We define and study the attribute disclosure risk of a confidential attribute.
► We introduce the concepts of k-neighbor p-sensitivity and k-neighbor l-diversity.
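The abstract does not spell out the definition of k-neighbor l-diversity, so the following sketch rests on one plausible reading, stated here as an assumption: for each protected record, take its k nearest original records and count the distinct values of the confidential attribute among them; the dataset passes if every count is at least l. The function names and the toy data are hypothetical, and the neighbor search is a plain brute-force scan rather than a kd-tree, to keep the example short.

```python
import heapq

def k_nearest(original, query, k):
    """Indices of the k original records closest to query (brute force, squared Euclidean)."""
    def dist(i):
        return sum((a - b) ** 2 for a, b in zip(original[i], query))
    return heapq.nsmallest(k, range(len(original)), key=dist)

def neighbor_diversity(original, confidential, query, k):
    """Number of distinct confidential values among the k nearest original records."""
    return len({confidential[i] for i in k_nearest(original, query, k)})

def satisfies_k_neighbor_l_diversity(original, confidential, protected, k, l):
    """Assumed reading: every protected record sees at least l distinct values."""
    return all(neighbor_diversity(original, confidential, q, k) >= l for q in protected)

# Hypothetical toy data: two clusters of quasi-identifiers and one confidential attribute.
original = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
confidential = ["flu", "flu", "flu", "flu", "asthma", "diabetes"]
protected = [(0.05, 0.05), (5.05, 5.05)]

low = neighbor_diversity(original, confidential, protected[0], k=3)
high = neighbor_diversity(original, confidential, protected[1], k=3)
ok = satisfies_k_neighbor_l_diversity(original, confidential, protected, k=3, l=2)
```

In this toy case the first protected record sits in a cluster whose neighbors all share one confidential value, so an intruder could guess that value without linking the record exactly; this is the kind of attribute disclosure the highlights refer to.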

Publisher

Database: Elsevier - ScienceDirect
Journal: Information Fusion - Volume 13, Issue 4, October 2012, Pages 260–273

Authors

Javier Herranz, Jordi Nin, Marc Solé
