Article ID: 528025
Journal: Information Fusion
Published Year: 2016
Pages: 11
File Type: PDF
Abstract

• Few instance selection (IS) methods exist for regression.
• Two different families of instance selection methods for regression are compared.
• One is based on a simple discretization of the output variable, yet gives good results.
• Both approaches can be used to adapt IS methods for classification to regression.
• The fusion of these IS algorithms in an ensemble for regression is also analyzed.

Data pre-processing is an important aspect of data mining. This paper discusses instance selection, a pre-processing approach applied before training prediction algorithms. The purpose of instance selection is to improve data quality by reducing data size and eliminating noise. Until recently, instance selection was applied mainly to classification problems, and only a few recent papers address it for regression tasks. This paper proposes the fusion of instance selection algorithms for regression tasks to improve selection performance. Two families of instance selection methods are evaluated as members of the ensemble: one based on a distance threshold, and the other on converting the regression task into a multi-class classification task. An extensive experimental evaluation of the two regression versions of the Edited Nearest Neighbor (ENN) and Condensed Nearest Neighbor (CNN) methods showed that the best performance, measured by error value and data size reduction, is in most cases obtained by the ensemble methods.
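The two families named in the abstract can be illustrated with a minimal sketch of ENN-style noise filtering for regression. This is not the authors' implementation. The function names, the threshold rule (`alpha` times the standard deviation of the neighbours' targets), the equal-width binning, and the simple vote used for fusion are all assumptions made here for illustration; the abstract does not specify them.

```python
import numpy as np

def knn_indices(X, i, k):
    """Indices of the k nearest neighbours of X[i], excluding i itself."""
    d = np.linalg.norm(X - X[i], axis=1)
    d[i] = np.inf  # never pick the instance as its own neighbour
    return np.argsort(d)[:k]

def enn_threshold(X, y, k=3, alpha=1.0):
    """Threshold family (assumed rule): keep instance i when its target lies
    within alpha * std(neighbour targets) of the neighbours' mean target."""
    keep = np.zeros(len(X), dtype=bool)
    for i in range(len(X)):
        nn = knn_indices(X, i, k)
        theta = alpha * np.std(y[nn])
        keep[i] = abs(y[i] - np.mean(y[nn])) <= theta
    return keep

def enn_discretized(X, y, k=3, n_bins=5):
    """Discretization family (assumed equal-width bins): turn targets into
    class labels, then apply classic ENN (keep i if it matches the majority
    label of its k nearest neighbours)."""
    edges = np.linspace(y.min(), y.max(), n_bins + 1)[1:-1]
    labels = np.digitize(y, edges)
    keep = np.zeros(len(X), dtype=bool)
    for i in range(len(X)):
        nn = knn_indices(X, i, k)
        majority = np.bincount(labels[nn]).argmax()
        keep[i] = labels[i] == majority
    return keep

def ensemble_select(masks, min_votes=0.5):
    """Fusion by voting (assumed scheme): keep an instance when at least
    a fraction min_votes of the selectors keep it."""
    votes = np.mean(np.vstack(masks), axis=0)
    return votes >= min_votes

# Toy data: a noiseless line with one corrupted target at index 25.
X = np.linspace(0.0, 1.0, 50).reshape(-1, 1)
y = 2.0 * X[:, 0]
y[25] += 10.0

keep_t = enn_threshold(X, y)
keep_d = enn_discretized(X, y)
keep_e = ensemble_select([keep_t, keep_d])
```

On this toy data both selectors reject the corrupted instance, so the vote rejects it as well. With `alpha=1.0` the threshold rule is strict and also drops some clean points at the ends of the range, which is one reason a vote over several selectors or several settings may give a more stable selection.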

Related Topics
Physical Sciences and Engineering > Computer Science > Computer Vision and Pattern Recognition