کد مقاله | کد نشریه | سال انتشار | مقاله انگلیسی | نسخه تمام متن |
---|---|---|---|---|
489856 | 704634 | 2015 | 7 صفحه PDF | دانلود رایگان |

Outliers has been studied in a variety of domains including Big Data, High dimensional data, Uncertain data, Time Series data, Biological data, etc. In majority of the sample datasets available in the repository, atleast 10% of the data may be erroneous, missing or not available. In this paper, we utilize the concept of data preprocessing for outlier reduction. We propose two algorithms namely Distance-Based outlier detection and Cluster-Based outlier algorithm for detecting and removing outliers using a outlier score. By cleaning the dataset and clustering based on similarity, we can remove outliers on the key attribute subset rather than on the full dimensional attributes of dataset. Experiments were conducted using 3 built-in Health care dataset available in R package and the results show that the cluster-based outlier detection algorithm providing better accuracy than distance based outlier detection algorithm.
Journal: Procedia Computer Science - Volume 50, 2015, Pages 209-215