Article ID: 528330
Journal: Information Fusion
Published Year: 2017
Pages: 14
File Type: PDF
Abstract

• Our hypothesis: ensembles are well suited for problems with distribution changes.
• If those changes can be characterized, ensembles can be designed to tackle them.
• Idea: generate different training samples based on the expected distribution changes.
• Case study: we present ensemble versions of two binary quantification algorithms.
• Ensembles outperform the original counterpart algorithms even with trivial aggregation rules.

Ensemble methods are widely applied to supervised learning tasks. Despite being based on a simple strategy, they often achieve good performance, especially when the single models that comprise the ensemble are diverse. Diversity can be introduced into the ensemble by creating a different training sample for each model. In that case, each model is trained with a data distribution that may differ from that of the original training set. Following that idea, this paper analyzes the hypothesis that ensembles can be especially appropriate for problems that (i) suffer from distribution changes and (ii) allow those changes to be characterized beforehand. The idea is to generate different training samples based on the expected distribution changes and to train one model with each of them. As a case study, we focus on binary quantification problems, introducing ensemble versions of two well-known quantification algorithms. Experimental results show that these ensemble adaptations outperform their original counterparts, even when trivial aggregation rules are used.
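The following is a minimal sketch of the idea described above, not the authors' implementation: each ensemble member is trained on a resample of the training set with a different class prevalence (a simple way to encode expected distribution changes), and a trivial mean is used to aggregate the per-member estimates. The names (EnsembleCC, resample_with_prevalence), the prevalence grid, the base learner, and the use of Classify & Count as the per-member quantifier are illustrative assumptions.

```python
# Hypothetical sketch: ensemble of Classify & Count (CC) quantifiers, each
# trained on a resample with a different positive-class prevalence.
import numpy as np
from sklearn.linear_model import LogisticRegression


def resample_with_prevalence(X, y, prevalence, n_samples, rng):
    """Draw n_samples examples whose positive-class prior equals `prevalence`."""
    pos_idx = np.flatnonzero(y == 1)
    neg_idx = np.flatnonzero(y == 0)
    n_pos = int(round(prevalence * n_samples))
    idx = np.concatenate([
        rng.choice(pos_idx, n_pos, replace=True),
        rng.choice(neg_idx, n_samples - n_pos, replace=True),
    ])
    return X[idx], y[idx]


class EnsembleCC:
    """One CC quantifier per assumed training prevalence; mean aggregation."""

    def __init__(self, prevalences=np.linspace(0.1, 0.9, 9), seed=0):
        self.prevalences = prevalences
        self.rng = np.random.default_rng(seed)
        self.models = []

    def fit(self, X, y):
        # Train one model per expected prevalence (distribution change).
        for p in self.prevalences:
            Xs, ys = resample_with_prevalence(X, y, p, len(y), self.rng)
            self.models.append(LogisticRegression(max_iter=1000).fit(Xs, ys))
        return self

    def quantify(self, X):
        # Classify & Count per member, then a trivial aggregation (mean).
        estimates = [m.predict(X).mean() for m in self.models]
        return float(np.mean(estimates))
```

Under these assumptions, `EnsembleCC().fit(X_train, y_train).quantify(X_test)` returns an estimate of the positive-class prevalence in the test sample; more elaborate aggregation rules could replace the mean.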


Related Topics
Physical Sciences and Engineering; Computer Science; Computer Vision and Pattern Recognition
Authors