Dual autoencoders features for imbalance classification problem

Article ID	Journal	Published Year	Pages	File Type
531796	Pattern Recognition	2016	15 Pages	PDF

Abstract

• To the best of our knowledge, this paper proposes the first feature learning-based method for imbalanced pattern classification using stacked autoencoders. It offers a new viewpoint on the problem: projecting the input space onto a learned feature space with a better representation.
• Concatenating the features learned by two different stacked autoencoders further improves classification accuracy by giving the classifier a representation that captures different characteristics of the data. Specifically, the sigmoid activation function is less sensitive to input changes and provides a robust representation, while the tanh activation function is more sensitive and provides detailed information about the data.
• Experimental results show that the Dual Autoencoding Features (DAF) yield statistically significant improvements in AUC, F1-score and G-Mean compared with representative resampling-based and feature projection methods for imbalanced pattern classification.
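The F1-score and G-Mean named above are preferred over plain accuracy on imbalanced data because both drop to zero when the minority class is ignored. The following is a minimal sketch using the common definitions (G-Mean as the geometric mean of sensitivity and specificity); the function name `f1_and_gmean` is hypothetical, and the paper's exact evaluation protocol is not given on this page.

```python
import math

def f1_and_gmean(y_true, y_pred, positive=1):
    """Return (F1, G-Mean) for binary labels; `positive` marks the minority class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    # Guard every ratio against an empty denominator.
    recall = tp / (tp + fn) if tp + fn else 0.0        # sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0

    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    gmean = math.sqrt(recall * specificity)
    return f1, gmean

# A classifier that always predicts the majority class is 80% accurate here,
# yet both scores are zero.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_majority = [0] * len(y_true)
print(f1_and_gmean(y_true, always_majority))
```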

Many classification problems encountered in real-world applications involve imbalanced data. Current methods depend on data resampling. However, if the feature set provides a clear decision boundary, resampling may not be needed to solve the imbalanced classification problem. This work therefore proposes an autoencoder-based feature learning method that learns a set of features better able to separate the minority and majority classes. Two sets of features are learned by two stacked autoencoders with different activation functions, each capturing different characteristics of the data, and are combined to form the Dual Autoencoding Features (DAF). Samples are then classified in this learned feature space instead of the original input space. Experimental results show that the proposed method outperforms current resampling-based methods with statistical significance on imbalanced pattern classification problems.
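The pipeline described above can be sketched as follows. This is an illustrative NumPy sketch, not the authors' implementation: the greedy layer-wise training, linear decoder, squared-error loss, layer sizes, learning rate and all function names are assumptions made for the example, since the abstract specifies only that two stacked autoencoders (sigmoid and tanh) are trained and their features concatenated.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Each entry: (activation, derivative expressed in terms of the activation output h).
ACTIVATIONS = {
    "sigmoid": (sigmoid, lambda h: h * (1.0 - h)),
    "tanh": (np.tanh, lambda h: 1.0 - h ** 2),
}

def train_autoencoder(X, n_hidden, activation, epochs=200, lr=0.1, seed=0):
    """Fit one autoencoder layer by gradient descent; return encoder (W, b).
    Assumption: linear decoder and mean squared reconstruction error."""
    rng = np.random.default_rng(seed)
    act, dact = ACTIVATIONS[activation]
    n, d = X.shape
    W = rng.normal(0.0, 0.1, (d, n_hidden))
    b = np.zeros(n_hidden)
    V = rng.normal(0.0, 0.1, (n_hidden, d))
    c = np.zeros(d)
    for _ in range(epochs):
        H = act(X @ W + b)              # encode
        err = H @ V + c - X             # reconstruction error
        dH = (err @ V.T) * dact(H)      # backpropagate through the encoder
        V -= lr * H.T @ err / n
        c -= lr * err.mean(axis=0)
        W -= lr * X.T @ dH / n
        b -= lr * dH.mean(axis=0)
    return W, b

def encode(X, layers, activation):
    """Pass X through every trained encoder layer of one stack."""
    act = ACTIVATIONS[activation][0]
    H = X
    for W, b in layers:
        H = act(H @ W + b)
    return H

def train_stack(X, layer_sizes, activation):
    """Assumption: greedy layer-wise training, each layer fit on the previous codes."""
    layers = []
    for size in layer_sizes:
        H = encode(X, layers, activation)
        layers.append(train_autoencoder(H, size, activation))
    return layers

def dual_autoencoding_features(X, layer_sizes=(8, 4)):
    """Concatenate the top-level codes of a sigmoid stack and a tanh stack."""
    feats = []
    for name in ("sigmoid", "tanh"):
        layers = train_stack(X, layer_sizes, name)
        feats.append(encode(X, layers, name))
    return np.hstack(feats)

# Toy data standing in for a real imbalanced dataset.
X = np.random.default_rng(1).normal(size=(50, 6))
F = dual_autoencoding_features(X)
print(F.shape)
```

The resulting matrix `F` would then be passed to a classifier in place of `X`; the abstract does not name the classifier, so none is shown here.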

Keywords

Stacked autoencoder; Imbalanced classification; Feature learning