Learning maximum excluding ellipsoids from imbalanced data with theoretical guarantees

Article ID	Journal	Published Year	Pages	File Type
11002877	Pattern Recognition Letters	2018	10 Pages	PDF

Abstract

In this paper, we address the problem of learning from imbalanced data. We consider the scenario where the number of negative examples is much larger than the number of positive ones. We propose a theoretically founded method which learns a set of local ellipsoids centered at the minority class examples while excluding the negative examples of the majority class. We address this task from a Mahalanobis-like metric learning point of view and we derive generalization guarantees on the learned metric using the uniform stability framework. Our experimental evaluation on classic benchmarks and on a proprietary dataset in bank fraud detection shows the effectiveness of our approach, particularly when the class imbalance is severe.
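
To make the geometric idea concrete, the sketch below illustrates an ellipsoid centered at a positive example that excludes all negatives, using a squared Mahalanobis-like distance (x - c)^T M (x - c). It is only an illustration under simplifying assumptions and not the paper's algorithm: the matrix M is fixed by hand rather than learned, each radius is simply set to the distance of the nearest negative, and all function names are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def mahalanobis_sq(x: Vector, c: Vector, M: Matrix) -> float:
    """Squared Mahalanobis-like distance (x - c)^T M (x - c)."""
    d = [xi - ci for xi, ci in zip(x, c)]
    n = len(d)
    return sum(d[i] * M[i][j] * d[j] for i in range(n) for j in range(n))


def max_excluding_radius_sq(c: Vector, negatives: List[Vector], M: Matrix) -> float:
    """Largest squared radius for which no negative lies strictly inside
    the ellipsoid centered at c (assumption: M is given, not learned)."""
    return min(mahalanobis_sq(neg, c, M) for neg in negatives)


def predict_positive(x: Vector, centers: List[Vector],
                     radii_sq: List[float], M: Matrix) -> bool:
    """Label x positive if it falls strictly inside at least one ellipsoid."""
    return any(mahalanobis_sq(x, c, M) < r for c, r in zip(centers, radii_sq))


# Toy data: two positives (minority class), three negatives (majority class).
positives = [(0.0, 0.0), (5.0, 5.0)]
negatives = [(2.0, 0.0), (0.0, 3.0), (5.0, 8.0)]
M = [[1.0, 0.0], [0.0, 1.0]]  # identity matrix, i.e. plain Euclidean balls

radii_sq = [max_excluding_radius_sq(c, negatives, M) for c in positives]
inside = predict_positive((1.0, 0.0), positives, radii_sq, M)
on_boundary = predict_positive((2.0, 0.0), positives, radii_sq, M)
```

With the identity matrix the ellipsoids reduce to Euclidean balls; a learned positive semidefinite M would instead stretch each region along directions in which negatives are far away.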

Keywords

Support vector data description; Uniform stability; Imbalanced data; Classification; Statistical machine learning; Metric learning