Article ID: 534559
Journal: Pattern Recognition Letters
Published Year: 2014
Pages: 10
File Type: PDF
Abstract

• New upper bounds for estimating the generalization error of classifiers are proposed.
• They can be used for effective in-sample model selection.
• The bounds are based on the Rademacher complexity and are useful when unlabeled patterns are available.
• The confidence term of the Rademacher bound can be reduced by a factor of three.
• Localized versions of the hypothesis class are derived, allowing the error estimation to be tightened.

In this work we derive new upper bounds for estimating the generalization error of kernel classifiers, that is, the misclassification rate that a model will achieve on new, previously unseen data. Although the paper focuses on error estimation, the derived bounds can obviously also be exploited, in practice, for model selection. The bounds are based on the Rademacher complexity and prove particularly useful when a set of unlabeled samples is available in addition to the (labeled) training examples: we show that, by exploiting the unlabeled patterns, the confidence term of the conventional Rademacher complexity bound can be reduced by a factor of three. Moreover, the availability of unlabeled examples also makes it possible to obtain further improvements by building localized versions of the hypothesis class containing the optimal classifier.
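For orientation, below is a minimal sketch of the conventional data-dependent Rademacher bound whose "confidence term" the abstract refers to. The notation (L(f), \hat{L}_n, \hat{\mathfrak{R}}_n, \delta) is standard in the literature but is not taken from the paper itself, and the exact constants vary across formulations; this is an illustration of the general form, not the authors' result.

```latex
% Conventional data-dependent Rademacher bound (Bartlett--Mendelson style):
% with probability at least 1 - \delta over an i.i.d. sample of size n,
% uniformly over all classifiers f in the hypothesis class \mathcal{F},
\[
  L(f) \;\le\; \hat{L}_n(f)
        \;+\; \hat{\mathfrak{R}}_n(\mathcal{F})
        \;+\; 3\sqrt{\frac{\log(2/\delta)}{2n}} ,
\]
% where L(f) is the true misclassification rate, \hat{L}_n(f) the empirical
% error on the n labeled examples, and \hat{\mathfrak{R}}_n(\mathcal{F}) the
% empirical Rademacher complexity of the loss class associated with
% \mathcal{F}. The last square-root (confidence) term is the one the
% abstract states can be shrunk by a factor of three when unlabeled
% patterns are available.
```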

Related Topics
Physical Sciences and Engineering › Computer Science › Computer Vision and Pattern Recognition