Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
4942676 | Engineering Applications of Artificial Intelligence | 2017 | 22 Pages | |
Abstract
This paper presents a new general methodological approach to imbalanced learning, one of the challenging problems in pattern classification. The presented method is founded on maximization of the sample entropy. It involves detecting the distributive properties of an ideally balanced regular lattice sample and transferring these properties to an arbitrary imbalanced sample, thereby increasing its representativeness. The proposed procedure applies undersampling to areas of high probability density in the sample space, combined with oversampling in areas of low density. The main achievement of this method is the increased sample class entropy, which reduces the inductive learner's tendency to favor the prominent class or cluster. In addition to class balancing, the method can be useful for function approximation, clustering, and sample dimension reduction. The high degree of generality of the method implies its applicability to data of varying complexity and imbalance. The theoretical foundation of the method was verified on a set of suitable synthetic samples, and its practical usability is confirmed by a comparative classification of a large set of databases, including speech signal samples.
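To make the density-guided resampling idea concrete, the following is a minimal, hypothetical sketch, not the authors' implementation: it assumes a Gaussian kernel density estimate over the sample space, thins points in high-density regions (undersampling), and duplicates jittered copies of points in low-density regions (oversampling). Function names, thresholds, and the bandwidth are illustrative assumptions.

```python
# Hypothetical sketch of density-guided resampling (not the paper's exact method):
# undersample high-density regions, oversample low-density regions.
import numpy as np
from sklearn.neighbors import KernelDensity

def density_guided_resample(X, y, bandwidth=0.5, rng=None):
    """Return a resampled (X, y) whose points are spread more evenly in feature space."""
    rng = np.random.default_rng(rng)
    kde = KernelDensity(bandwidth=bandwidth).fit(X)
    dens = np.exp(kde.score_samples(X))          # estimated density at each sample
    hi, lo = np.quantile(dens, [0.75, 0.25])     # illustrative density thresholds

    X_out, y_out = [], []
    for xi, yi, di in zip(X, y, dens):
        if di > hi:
            # Undersampling: keep a high-density point with reduced probability.
            if rng.random() < hi / di:
                X_out.append(xi); y_out.append(yi)
        elif di < lo:
            # Oversampling: keep the point and add one jittered copy nearby.
            X_out.append(xi); y_out.append(yi)
            X_out.append(xi + rng.normal(scale=bandwidth / 4, size=xi.shape))
            y_out.append(yi)
        else:
            X_out.append(xi); y_out.append(yi)
    return np.asarray(X_out), np.asarray(y_out)
```

Under these assumptions, the resampled set has a flatter density profile, which is one plausible way to raise the sample (class) entropy that the abstract identifies as the method's goal.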
Related Topics
Physical Sciences and Engineering
Computer Science
Artificial Intelligence
Authors
Drasko Furundzic, Srdjan Stankovic, Slobodan Jovicic, Silvana Punisic, Misko Subotic