Article ID: 391706
Journal: Information Sciences
Published Year: 2014
Pages: 12
File Type: PDF
Abstract

• We study the classification technique known as mixture of experts.
• Mixture of experts has serious difficulties with high-dimensional datasets.
• We derive an embedded feature selection scheme within mixture of experts.
• Our technique outperforms the classical mixture of experts in terms of accuracy.
• Our technique recovers the relevant features in experiments on artificial datasets.

A useful strategy for dealing with complex classification scenarios is the “divide and conquer” approach. The mixture of experts (MoE) technique implements this strategy by jointly training a set of classifiers, or experts, that specialize in different regions of the input space. A global model, or gate function, complements the experts by learning a function that weighs their relevance in different parts of the input space. Local feature selection is an attractive way to improve the specialization of the experts and the gate function, particularly in the case of high-dimensional data, since different subsets of dimensions, or subspaces, are usually more appropriate for classifying instances located in different regions of the input space. Accordingly, this work contributes a regularized variant of MoE that incorporates an embedded process for local feature selection using L1 regularization. Experiments with artificial and real-world datasets provide evidence that the proposed method improves on the classical MoE technique in terms of both accuracy and sparseness of the solution. Furthermore, our results indicate that the advantages of the proposed technique increase with the dimensionality of the data.
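For intuition, the sketch below shows the general kind of model the abstract describes: linear experts combined by a softmax gate, with an L1 penalty on both the expert and gate weights so that each expert can drive irrelevant features to zero in its own region of the input space. This is an illustration only, not the authors' formulation: the PyTorch implementation, gradient-based training instead of any EM-style procedure, and the regularization strength `lam` are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class L1MixtureOfExperts(nn.Module):
    """Mixture of linear experts with a softmax gate and L1-penalized weights."""

    def __init__(self, n_features, n_classes, n_experts):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Linear(n_features, n_classes) for _ in range(n_experts)]
        )
        self.gate = nn.Linear(n_features, n_experts)

    def forward(self, x):
        # Gate: probability of each expert given the input (batch, experts).
        gate_probs = F.softmax(self.gate(x), dim=1)
        # Experts: class probabilities per expert (batch, experts, classes).
        expert_probs = torch.stack(
            [F.softmax(e(x), dim=1) for e in self.experts], dim=1
        )
        # Mixture prediction: gate-weighted average over experts.
        return (gate_probs.unsqueeze(-1) * expert_probs).sum(dim=1)

    def l1_penalty(self):
        # L1 norm of gate and expert weights encourages local feature selection.
        penalty = self.gate.weight.abs().sum()
        for e in self.experts:
            penalty = penalty + e.weight.abs().sum()
        return penalty

# Minimal training loop on synthetic data (values are illustrative).
model = L1MixtureOfExperts(n_features=50, n_classes=3, n_experts=4)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
lam = 1e-3  # hypothetical regularization strength

x = torch.randn(128, 50)
y = torch.randint(0, 3, (128,))
for _ in range(200):
    optimizer.zero_grad()
    probs = model(x)
    # Negative log-likelihood of the mixture plus the L1 penalty.
    loss = F.nll_loss(torch.log(probs + 1e-12), y) + lam * model.l1_penalty()
    loss.backward()
    optimizer.step()
```

In this sketch the penalty acts on each expert's weight matrix separately, so the sparsity pattern (the selected subspace) can differ from expert to expert, which is the sense in which the feature selection is "local".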

Related Topics
Physical Sciences and Engineering › Computer Science › Artificial Intelligence
Authors