Supervised neighborhood graph construction for semi-supervised classification

Article ID	Journal	Published Year	Pages	File Type
532388	Pattern Recognition	2012	10 Pages	PDF

Abstract

Graph based methods are among the most active and applicable approaches studied in semi-supervised learning. The problem of neighborhood graph construction for these methods is addressed in this paper. Neighborhood graph construction plays a key role in the quality of the classification in graph based methods. Several unsupervised graph construction methods have been proposed that have addressed issues such as data noise, geometrical properties of the underlying manifold and graph hyper-parameters selection. In contrast, in order to adapt the graph construction to the given classification task, many of the recent graph construction methods take advantage of the data labels. However, these methods are not efficient since the hypothesis space of their possible neighborhood graphs is limited. In this paper, we first prove that the optimal neighborhood graph is a subgraph of a k′-NNk′-NN graph for a large enough k′k′, which is much smaller than the total number of data points. Therefore, we propose to use all the subgraphs of k′-NNsk′-NNs graph as the hypothesis space. In addition, we show that most of the previous supervised graph construction methods are implicitly optimizing the smoothness functional with respect to the neighborhood graph parameters. Finally, we provide an algorithm to optimize the smoothness functional with respect to the neighborhood graph in the proposed hypothesis space. Experimental results on various data sets show that the proposed graph construction algorithm mostly outperforms the popular k-NN based construction and other state-of-the-art methods.

► Graph construction is a key item of the graph based semi-supervised learning methods. ► Hypothesis spaces of graphs in previous supervised methods are quite restricted. ► We show that the optimal neighborhood graph is a subgraph of a k′-NNk′-NN graph. ► We show that previous methods are optimizing the smoothness functional, implicitly. ► The smoothness is optimized on the proposed hypothesis space efficiently.

Keywords

Manifold assumption Graph-based semi-supervised learning