A tree-based regressor that adapts to intrinsic dimension

Article ID	Journal	Published Year	Pages	File Type
430289	Journal of Computer and System Sciences	2012	20 Pages	PDF

Abstract

We consider the problem of nonparametric regression , consisting of learning an arbitrary mapping f:X→Yf:X→Y from a data set of (x,y)(x,y) pairs in which the y values are corrupted by noise of mean zero. This statistical task is known to be subject to a severe curse of dimensionality: if X⊂RDX⊂RD, and if the only smoothness assumption on f is that it satisfies a Lipschitz condition, it is known that any estimator based on n data points will have an error rate (risk) of Ω(n−2/(2+D))Ω(n−2/(2+D)). Here we present a tree-based regressor whose risk depends only on the doubling dimension of XX, not on D . This notion of dimension generalizes two cases of contemporary interest: when XX is a low-dimensional manifold, and when XX is sparse. The tree is built using random hyperplanes as splitting criteria, building upon recent work of Dasgupta and Freund (2008) [5]; and we show that axis-parallel splits cannot achieve the same finite-sample rate of convergence.

► Analysis of high-dimensional regression using random partition of data (RPtree). ► RPtree regression requires no dimension reduction: adapts to intrinsic dimension. ► General notion of intrinsic dimension captures manifold dimension, sparsity. ► Model selection made cheaper by monitoring data diameters in cells. ► Contrasting analysis: adaptivity of common dyadic tree method.

Keywords

Sparse data Nonparametric regression Manifold