Systematic Literature Review on the Anonymization of High Dimensional Streaming Datasets for Health Data Sharing

Article ID	Journal	Published Year	Pages	File Type
484586	Procedia Computer Science	2015	8 Pages	PDF

Abstract

One of the biggest challenges to health data sharing is regulations that prohibit the transmission and distribution of Personal Health Information (PHI), even among collaborating organizations. This impedes research and reduces the utility of these datasets. Anonymization can address this issue by hiding PHI while maintaining the analytical utility of the data. Much research has focused on data that is static, independent and complete. Unfortunately, this is not typical of health data. Rather than sitting in static, independent tables, health data resides in relational databases with multiple high-dimensional tables that are transactional and constantly changing. Data recipients usually receive multiple versions of the database over time. This study reviews literature on anonymization methodologies for large and fast-changing high-dimensional datasets, especially health data. Relevant papers are analyzed, categorized and compared in terms of scope and contributions. Finally, we use the details extracted from our analysis to outline possible research directions for developing a realistic anonymization framework for health data sharing.