Article ID: 534559
Journal: Pattern Recognition Letters
Published Year: 2014
Pages: 10
File Type: PDF
Abstract

• New upper bounds for estimating the generalization error of classifiers are proposed.
• They can be used for effective in-sample model selection.
• The bounds are based on the Rademacher complexity and are useful when unlabeled patterns are available.
• The confidence term of the Rademacher bound can be reduced by a factor of three.
• Localized versions of the hypothesis class are derived, allowing the error estimation to be tightened.

In this work we derive new upper bounds for estimating the generalization error of kernel classifiers, that is, the misclassification rate that a model will achieve on new, previously unseen data. Although the paper focuses on error estimation, the derived bounds can obviously also be exploited, in practice, for model selection. The bounds are based on the Rademacher complexity and prove particularly useful when a set of unlabeled samples is available in addition to the (labeled) training examples: we show that, by exploiting the unlabeled patterns, the confidence term of the conventional Rademacher complexity bound can be reduced by a factor of three. Moreover, the availability of unlabeled examples also makes it possible to obtain further improvements by building localized versions of the hypothesis class containing the optimal classifier.
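For orientation, below is a minimal sketch of the conventional data-dependent Rademacher bound whose "confidence term" the abstract refers to. The notation (L(f), \hat{L}_n, \hat{\mathfrak{R}}_n, \delta) is standard in the literature but is not taken from the paper itself, and the exact constants vary across formulations; this is an illustration of the general form, not the authors' result.

```latex
% Conventional data-dependent Rademacher bound (Bartlett--Mendelson style):
% with probability at least 1 - \delta over an i.i.d. sample of size n,
% uniformly over all classifiers f in the hypothesis class \mathcal{F},
\[
  L(f) \;\le\; \hat{L}_n(f)
        \;+\; \hat{\mathfrak{R}}_n(\mathcal{F})
        \;+\; 3\sqrt{\frac{\log(2/\delta)}{2n}} ,
\]
% where L(f) is the true misclassification rate, \hat{L}_n(f) the empirical
% error on the n labeled examples, and \hat{\mathfrak{R}}_n(\mathcal{F}) the
% empirical Rademacher complexity of the loss class associated with
% \mathcal{F}. The last square-root (confidence) term is the one the
% abstract states can be shrunk by a factor of three when unlabeled
% patterns are available.
```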

Related Topics
Physical Sciences and Engineering › Computer Science › Computer Vision and Pattern Recognition