Privacy protection of textual attributes through a semantic-based masking method

Article code	Journal code	Publication year	English article	Full text
528306	869553	2012	11-page PDF	Free download


Keywords

Privacy protection; Semantic similarity; Anonymity; Ontologies

Related topics

Engineering and Basic Sciences > Computer Engineering > Computer Vision and Pattern Recognition


Abstract

Using microdata provided by statistical agencies has many benefits from the data mining point of view. However, such data often involve sensitive information that can be directly or indirectly related to individuals. An appropriate anonymisation process is needed to minimise the risk of disclosure. Several masking methods have been developed to deal with continuous-scale numerical data or bounded textual values but approaches to tackling the anonymisation of textual values are scarce and shallow. Because of the importance of textual data in the Information Society, in this paper we present a new masking method for anonymising unbounded textual values based on the fusion of records with similar values to form groups of indistinguishable individuals. Since, from the data exploitation point of view, the utility of textual information is closely related to the preservation of its meaning, our method relies on the structured knowledge representation given by ontologies. This domain knowledge is used to guide the masking process towards the merging that best preserves the semantics of the original data. Because textual data typically consist of large and heterogeneous value sets, our method provides a computationally efficient algorithm by relying on several heuristics rather than exhaustive searches. The method is evaluated with real data in a concrete data mining application that involves solving a clustering problem. We also compare the method with more classical approaches that focus on optimising the value distribution of the dataset. Results show that a semantically grounded anonymisation best preserves the utility of data in both the theoretical and the practical setting, and reduces the probability of record linkage. At the same time, it achieves good scalability with regard to the size of input data.

► A global masking method for textual data based on value substitutions is presented.
► The semantics modelled in ontologies is exploited to propose substitutions.
► Several heuristics are proposed in order to ensure the scalability of the approach.
► Preserving the semantics improves the utility of the masked data.
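
As a rough illustration of the idea summarised above (replacing rare textual values with semantically close ones until groups of indistinguishable records form), the following is a minimal sketch and not the authors' algorithm. The toy taxonomy `PARENT`, the edge-counting `distance`, the greedy `mask` function, and the threshold `k` are all assumptions introduced here for illustration; the paper relies on real ontologies and its own heuristics.

```python
from collections import Counter

# Hypothetical toy taxonomy (child -> parent). A real system would query an ontology.
PARENT = {
    "hiking": "outdoor_sport",
    "climbing": "outdoor_sport",
    "outdoor_sport": "sport",
    "chess": "board_game",
    "board_game": "game",
    "sport": "activity",
    "game": "activity",
}


def ancestors(term):
    """Return the path from term up to the taxonomy root, term included."""
    path = [term]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def distance(a, b):
    """Count taxonomy edges between a and b via their closest common ancestor."""
    depth_in_b = {t: i for i, t in enumerate(ancestors(b))}
    for i, t in enumerate(ancestors(a)):
        if t in depth_in_b:
            return i + depth_in_b[t]
    raise ValueError(f"no common ancestor for {a!r} and {b!r}")


def mask(values, k):
    """Greedily substitute rare values by their nearest neighbour in the taxonomy.

    Stops when every remaining distinct value occurs at least k times,
    or when only one distinct value is left.
    """
    masked = list(values)
    while True:
        counts = Counter(masked)
        rare = [v for v, c in counts.items() if c < k]
        if not rare or len(counts) == 1:
            return masked
        # Assumed heuristic: handle the least frequent value first (ties by name).
        source = min(rare, key=lambda v: (counts[v], v))
        # Replace it with the semantically closest other value present in the data.
        target = min(
            (v for v in counts if v != source),
            key=lambda v: (distance(source, v), v),
        )
        masked = [target if v == source else v for v in masked]


records = ["hiking", "hiking", "climbing", "chess", "chess"]
result = mask(records, k=2)
print(result)  # ['hiking', 'hiking', 'hiking', 'chess', 'chess']
```

Here the single "climbing" record is merged into "hiking" (2 edges away) rather than "chess" (6 edges away), so the substitution stays within the same branch of the taxonomy. Each pass removes one distinct value, so the loop always terminates.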

Publisher

Database: Elsevier - ScienceDirect
Journal: Information Fusion - Volume 13, Issue 4, October 2012, Pages 304–314

Authors

Sergio Martínez, David Sánchez, Aida Valls, Montserrat Batet
