Article ID: 6856808
Journal: Information Sciences
Published Year: 2018
Pages: 40
File Type: PDF
Abstract
In recent years, imbalanced dataset classification has received significant attention due to its application in real-world problems, resulting in the emergence of a new class of algorithms. Classification algorithms that work with discretized data have been shown to yield better performance, so discretization is often a critical technique in data preprocessing. In this paper, a novel Evolutionary Multi-objective Discretization algorithm for binary Imbalanced Datasets (EMDID) is presented. The proposed algorithm takes advantage of evolutionary multi-objective optimization to simultaneously optimize three objective functions: (1) the area under the ROC curve (AUC); (2) the number of cut points; and (3) low-frequency cut points. The first objective function uses AUC, instead of classification accuracy, to choose cut points that better identify the minority class. The second objective function reduces the number of cut points, while the third selects low-frequency cut points so that the information loss caused by discretizing continuous data is minimized. To evaluate the proposed algorithm, a total of 25 imbalanced benchmark datasets are used, and the results are compared to those of popular algorithms in the literature such as Class-Attribute Interdependence Maximization (CAIM) and Evolutionary Multi-objective Discretization (EMD). Our findings indicate that the proposed algorithm outperforms the other techniques in terms of the number of cut points and AUC, as supported by non-parametric statistical tests.
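To make the trade-off between the first two objectives concrete, the following is a minimal Python sketch, not the EMDID implementation. It scores one candidate set of cut points for a single feature by (AUC, number of cut points). The function names, the toy data, and the choice to score each interval by its minority-class rate are all assumptions made for illustration. The third objective (low-frequency cut points) is omitted because the abstract does not define it precisely.

```python
from bisect import bisect_right
from collections import defaultdict


def discretize(values, cut_points):
    """Map each continuous value to the index of the interval it falls in."""
    cuts = sorted(cut_points)
    return [bisect_right(cuts, v) for v in values]


def auc(scores, labels):
    """AUC as the probability that a random positive (label 1) is scored
    above a random negative (label 0); ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def objectives(values, labels, cut_points):
    """Return (AUC, number of cut points) for one candidate discretization.

    Assumption: each interval is scored by its minority-class rate, a
    stand-in for whatever classifier the actual algorithm uses.
    """
    bins = discretize(values, cut_points)
    total, minority = defaultdict(int), defaultdict(int)
    for b, y in zip(bins, labels):
        total[b] += 1
        minority[b] += y
    scores = [minority[b] / total[b] for b in bins]
    return auc(scores, labels), len(cut_points)


# Hypothetical imbalanced toy data: label 1 is the minority class (2 of 10).
values = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
labels = [0, 0, 0, 0, 0, 0, 0, 1, 0, 1]

coarse = objectives(values, labels, [0.75])
fine = objectives(values, labels, [0.75, 0.85, 0.95])
print(coarse, fine)
```

On this toy data the finer discretization reaches a higher AUC but needs more cut points, so neither candidate dominates the other. A multi-objective optimizer would keep both on its Pareto front instead of collapsing them into a single score.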