Content sensitivity based access control framework for Hadoop

Article ID	Journal	Published Year	Pages	File Type
7111755	Digital Communications and Networks	2017	22 Pages	PDF

Abstract

Big data technologies have seen tremendous growth in recent years. They are widely used in both industry and academia. In spite of such exponential growth, these technologies lack adequate measures to protect data from misuse/abuse. Corporations that collect data from multiple sources are at risk of liabilities due to the exposure of sensitive information. In the current implementation of Hadoop, only file-level access control is feasible. Providing users with the ability to access data based on the attributes in a dataset or the user's role is complicated because of the sheer volume and multiple formats (structured, unstructured and semi-structured) of data. In this paper, we propose an access control framework, which enforces access control policies dynamically based on the sensitivity of the data. This framework enforces access control policies by harnessing the data context, usage patterns and information sensitivity. Information sensitivity changes over time with the addition and removal of datasets, which can lead to modifications in access control decisions. The proposed framework accommodates these changes. The proposed framework is automated to a large extent as the data itself determines the sensitivity with minimal user intervention. Our experimental results show that the proposed framework is capable of enforcing access control policies on non-multimedia datasets with minimal overhead.

Keywords

Information Value Privacy access control