Cluster Based Outlier Detection Algorithm for Healthcare Data

Article ID	Journal	Published Year	Pages	File Type
489856	Procedia Computer Science	2015	7 Pages	PDF

Abstract

Outliers have been studied in a variety of domains, including big data, high-dimensional data, uncertain data, time series data, and biological data. In the majority of the sample datasets available in the repository, at least 10% of the data may be erroneous, missing, or not available. In this paper, we use data preprocessing to reduce outliers. We propose two algorithms, a distance-based outlier detection algorithm and a cluster-based outlier detection algorithm, that detect and remove outliers using an outlier score. By cleaning the dataset and clustering records by similarity, we can remove outliers on a key attribute subset rather than on the full set of attributes. Experiments were conducted on three built-in healthcare datasets available in an R package, and the results show that the cluster-based outlier detection algorithm provides better accuracy than the distance-based outlier detection algorithm.
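
The abstract does not spell out how either outlier score is computed, so the following is only a minimal sketch of the two general ideas, not the paper's algorithms. It assumes the distance-based score is the mean Euclidean distance to the k nearest neighbours, and that the cluster-based score is the distance to the nearest centroid of a sufficiently large k-means cluster, so that members of tiny clusters score high. The function names, the toy records, and the parameters `k` and `min_size` are all hypothetical.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_distance_scores(points, k=2):
    """Distance-based score (assumed form): mean distance to the k nearest neighbours."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(euclidean(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores

def farthest_first(points, k):
    """Deterministic seeding: start at the first point, then repeatedly add
    the point farthest from all seeds chosen so far."""
    seeds = [points[0]]
    while len(seeds) < k:
        seeds.append(max(points, key=lambda p: min(euclidean(p, s) for s in seeds)))
    return [list(s) for s in seeds]

def kmeans(points, k, iterations=10):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    centroids = farthest_first(points, k)
    labels = []
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: euclidean(p, centroids[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster goes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

def cluster_scores(points, k=3, min_size=2):
    """Cluster-based score (assumed form): distance to the nearest centroid of a
    cluster with at least min_size members. Assumes at least one such cluster."""
    centroids, labels = kmeans(points, k)
    large = [centroids[c] for c in range(k) if labels.count(c) >= min_size]
    return [min(euclidean(p, c) for c in large) for p in points]

# Toy records restricted to two hypothetical key attributes; the last one is
# deliberately placed far from both groups.
points = [(1.0, 1.1), (1.2, 0.9), (0.9, 1.0), (1.1, 1.2),
          (5.0, 5.1), (5.2, 4.9), (4.9, 5.0), (5.1, 5.2),
          (9.0, 0.5)]

distance_based = knn_distance_scores(points, k=2)
cluster_based = cluster_scores(points, k=3, min_size=2)

print("top distance-based record:", distance_based.index(max(distance_based)))
print("top cluster-based record:", cluster_based.index(max(cluster_based)))
```

On this toy data both scores rank the isolated last record highest. Records whose score exceeds a chosen threshold would then be removed before further analysis; how that threshold is set is not stated in the abstract and is left open here.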