Article ID: 10151499
Journal: Information Fusion
Published Year: 2019
Pages: 30
File Type: PDF
Abstract
Handling outliers is one of the primary concerns of today's data mining techniques. The concept of an outlier, its handling, and its diagnosis are context specific and vary with the field of application. Outliers are inevitable when mining web data because of the distinctive characteristics exhibited by a typical web user. Since the output of a regression algorithm rarely matches the actual value exactly, it is not obvious to knowledge workers and researchers what should count as an outlier in such cases. In this paper, we develop the concept of an outlier with respect to regression analysis of any web-based dataset. A framework for finding outliers in the output of a regression algorithm is formulated with the help of Ordered Weighted operators. The underlying idea is to find an error rectification value, ϵ, which, used together with the value predicted by the regression model, helps distinguish an outlier. It also provides a possible range of deviation from the predicted output. A case study on a web dataset is presented to show the usefulness of the proposed approach.
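The abstract does not say exactly how ϵ is derived, so the following Python sketch illustrates one plausible reading of the idea; it is not the authors' method. It assumes: an ordinary least-squares simple linear regression as the model; that "Ordered Weighted operators" means Yager's Ordered Weighted Averaging (OWA) operators; OWA weights generated from a hypothetical quantifier Q(r) = r**alpha; and ϵ taken as the OWA aggregate of the absolute residuals. An observation is flagged when it falls outside its band [ŷ - ϵ, ŷ + ϵ]. The function names and the `alpha` parameter are hypothetical, and the data is a toy example, not web data.

```python
from typing import List, Sequence, Tuple


def fit_simple_linear_regression(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Hypothetical stand-in model: OLS fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def owa_weights(n: int, alpha: float) -> List[float]:
    """Assumed weighting scheme: w_j = Q(j/n) - Q((j-1)/n) with Q(r) = r**alpha.

    alpha < 1 puts more weight on the largest arguments, alpha > 1 on the
    smallest, and alpha == 1 gives equal weights (the plain mean).
    """
    return [(j / n) ** alpha - ((j - 1) / n) ** alpha for j in range(1, n + 1)]


def owa(values: Sequence[float], weights: Sequence[float]) -> float:
    """Ordered Weighted Averaging: weights apply to values sorted in descending order."""
    ordered = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))


def detect_outliers(
    x: Sequence[float], y: Sequence[float], alpha: float = 1.0
) -> Tuple[float, List[Tuple[float, float]], List[int]]:
    """Return (epsilon, per-point deviation bands, indices flagged as outliers)."""
    intercept, slope = fit_simple_linear_regression(x, y)
    predicted = [intercept + slope * xi for xi in x]
    abs_errors = [abs(yi - pi) for yi, pi in zip(y, predicted)]
    # Assumption: epsilon is the OWA aggregate of the absolute residuals.
    epsilon = owa(abs_errors, owa_weights(len(abs_errors), alpha))
    # Each prediction gets the band [y_hat - epsilon, y_hat + epsilon].
    bands = [(p - epsilon, p + epsilon) for p in predicted]
    # An observation outside its band is flagged as an outlier.
    flagged = [i for i, yi in enumerate(y) if not (bands[i][0] <= yi <= bands[i][1])]
    return epsilon, bands, flagged


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5, 6, 7, 8]
    ys = [2, 4, 6, 8, 30, 12, 14, 16]  # y = 2x everywhere except at x = 5
    eps, bands, flagged = detect_outliers(xs, ys, alpha=0.5)
    print(f"epsilon = {eps:.3f}, flagged indices = {flagged}")  # only index 4 (x = 5) is flagged
```

With alpha below 1 the weights favour the largest residuals, so ϵ widens; with alpha above 1 they favour the smallest, so ϵ tightens and more points fall outside their bands. Because ϵ here is computed from residuals that include the outliers themselves, a very large outlier also inflates ϵ. The abstract does not say how the paper handles this.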