Article ID: 407043
Journal: Neurocomputing
Published Year: 2014
Pages: 8
File Type: PDF
Abstract

Because the volume of available digital information is growing rapidly, it is often impossible to label all the samples. It is therefore crucial to select the most informative samples to label, so that learning performance improves as much as possible under a limited labeling budget. Many active learning algorithms have been proposed for this purpose. Most of these approaches effectively discover the Euclidean structure of the data space, whereas the geometrical (manifold) structure is not well respected. In this paper, we propose a novel active learning algorithm that explicitly considers the case in which the data are sampled from a low-dimensional sub-manifold embedded in a high-dimensional ambient space. The geodesic distance between two data points on the manifold is estimated by the shortest-path distance between the two corresponding vertices in the nearest-neighbor graph. By selecting the points that are most representative with respect to the manifold structure, our approach can effectively decrease the number of training examples the learner needs in order to achieve good performance. Experimental results on visual object recognition and text categorization demonstrate the effectiveness of the proposed approach.
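
The geodesic estimate described in the abstract can be illustrated with a minimal sketch. It builds a symmetric k-nearest-neighbor graph, runs Dijkstra's algorithm to approximate geodesic distances, and then picks points greedily. The abstract does not state the paper's selection criterion, so the farthest-first rule used here is a stand-in chosen for illustration only. The function names (knn_graph, geodesic_from, select_representatives) and the parameters k and budget are assumptions, not taken from the paper.

```python
import heapq
import math


def knn_graph(points, k):
    """Build a symmetric k-nearest-neighbor graph with Euclidean edge weights."""
    n = len(points)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        by_distance = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for d, j in by_distance[:k]:
            graph[i][j] = d
            graph[j][i] = d  # keep the graph undirected
    return graph


def geodesic_from(graph, source):
    """Shortest-path (Dijkstra) distances from source to every vertex."""
    dist = {v: math.inf for v in graph}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def select_representatives(points, k, budget):
    """Greedy farthest-first selection under graph geodesic distance.

    Assumption: this criterion is an illustrative stand-in, not the
    selection rule of the paper.
    """
    graph = knn_graph(points, k)
    selected = [0]  # arbitrary starting point
    nearest = geodesic_from(graph, 0)  # distance to closest selected point
    while len(selected) < min(budget, len(points)):
        candidate = max(nearest, key=nearest.get)
        selected.append(candidate)
        d_new = geodesic_from(graph, candidate)
        nearest = {v: min(nearest[v], d_new[v]) for v in graph}
    return selected
```

On points sampled along a half circle, for example, the graph distance between the two endpoints approaches the arc length (about pi for unit radius), while the Euclidean distance is only 2. That gap is the kind of manifold structure a purely Euclidean criterion would miss.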
