Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
430170 | Journal of Computer and System Sciences | 2009 | 13 | |
Given a set F of classifiers and a probability distribution over their domain, one can define a metric by taking the distance between a pair of classifiers to be the probability that they classify a random item differently. We prove bounds on the sample complexity of PAC learning in terms of the doubling dimension of this metric. These bounds imply known bounds on the sample complexity of learning halfspaces with respect to the uniform distribution that are optimal up to a constant factor. We then prove a bound that holds for any algorithm that outputs a classifier with zero error whenever this is possible; this bound is stated in terms of the maximum of the doubling dimension and the VC-dimension of F, and it strengthens the best known bound in terms of the VC-dimension alone. Finally, we show that there is no bound on the doubling dimension of halfspaces in R^n in terms of n that holds independently of the domain distribution. This implies that there is no such bound in terms of the VC-dimension of F (in contrast with the metric dimension).
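
As a concrete reference, here is a minimal sketch of the disagreement metric described above, together with the standard definition of doubling dimension in which the bounds are stated. The symbols D (for the domain distribution) and d_D are notation introduced here for illustration, not taken from the paper.

```latex
% Disagreement metric on a set F of classifiers, induced by a domain
% distribution D: the distance between f and g is the probability that
% a random example drawn from D is labeled differently by the two.
% (The symbols D and d_D are illustrative, not the paper's notation.)
\[
  d_D(f, g) \;=\; \Pr_{x \sim D}\bigl[\, f(x) \neq g(x) \,\bigr],
  \qquad f, g \in F .
\]
% Standard definition of doubling dimension: the smallest k such that,
% for every radius r > 0, every ball of radius r in (F, d_D) can be
% covered by at most 2^k balls of radius r/2.
\[
  \operatorname{ddim}(F, d_D)
  \;=\; \min \Bigl\{ k \;:\; \text{every ball of radius } r
        \text{ is covered by at most } 2^{k} \text{ balls of radius } r/2 \Bigr\}.
\]
```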