Article ID: 412297 | Journal: Neurocomputing | Published: 2014 | Pages: 9 | File type: PDF
Abstract

The volume of naturally occurring data has grown enormously, which makes it difficult to obtain the high-quality labels needed to learn a good model. It is therefore critical to select only the most informative data points for labeling, a problem addressed within the framework of active learning. We study this problem for regression models from the perspective of optimal experimental design (OED). Several OED-based methods have been developed for this purpose, but the relations between the data points and their predictions have not yet been fully explored. Motivated by this, we employ the Hilbert–Schmidt independence criterion (HSIC) to maximize, from a global perspective, the dependence between the samples and their estimated values. We thus present a novel active learning method named manifold optimal experimental design via dependence maximization (MODM). Specifically, the points that have maximum dependence with their predictions are preferred for labeling. In addition, MODM uses the graph Laplacian to preserve the local geometric structure of the data. Combining these two components allows the most informative data points to be selected more effectively. We adopt a sequential strategy to optimize the objective function. Experiments on content-based image retrieval verify the effectiveness of the proposed algorithm.
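To make the two ingredients named in the abstract concrete, the following is a minimal sketch of (a) the standard biased empirical HSIC estimate, HSIC(K, G) = tr(K H G H) / (n - 1)^2 with centering matrix H = I - (1/n) 1 1^T, and (b) the unnormalized graph Laplacian L = D - W of a k-nearest-neighbour graph together with the smoothness penalty f^T L f. The Gaussian (RBF) kernel, its bandwidth sigma, the neighbour count k, the 0/1 edge weights, and every function name here are assumptions chosen for illustration. This is not the MODM objective or its sequential optimizer, whose exact form the abstract does not give.

```python
import math
from typing import List

Matrix = List[List[float]]
Point = List[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rbf_gram(points: List[Point], sigma: float = 1.0) -> Matrix:
    """Gaussian (RBF) Gram matrix K[i][j] = exp(-||x_i - x_j||^2 / (2 sigma^2)).
    The RBF kernel is an illustrative assumption; any positive-definite kernel works."""
    return [[math.exp(-sq_dist(p, q) / (2 * sigma ** 2)) for q in points] for p in points]


def double_center(K: Matrix) -> Matrix:
    """Compute H K H, H = I - (1/n) 1 1^T, by subtracting row and column means
    and adding back the grand mean."""
    n = len(K)
    row = [sum(r) / n for r in K]
    col = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    tot = sum(row) / n
    return [[K[i][j] - row[i] - col[j] + tot for j in range(n)] for i in range(n)]


def hsic(K: Matrix, G: Matrix) -> float:
    """Biased empirical HSIC estimate: tr(K H G H) / (n - 1)^2."""
    n = len(K)
    Kc, Gc = double_center(K), double_center(G)
    # tr(HKH HGH) equals tr(K H G H) because H is symmetric and idempotent;
    # tr(A B) is computed as sum_ij A[i][j] * B[j][i].
    return sum(Kc[i][j] * Gc[j][i] for i in range(n) for j in range(n)) / (n - 1) ** 2


def knn_laplacian(points: List[Point], k: int = 2) -> Matrix:
    """Unnormalized Laplacian L = D - W of a symmetric k-NN graph with 0/1 weights
    (hypothetical construction choice; heat-kernel weights are a common alternative)."""
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        neighbours = sorted((sq_dist(points[i], points[j]), j) for j in range(n) if j != i)
        for _, j in neighbours[:k]:
            W[i][j] = W[j][i] = 1.0  # connect i and j if either is a k-NN of the other
    # Diagonal holds the degree D_ii; off-diagonal entries are -W_ij.
    return [[(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)] for i in range(n)]


def smoothness(L: Matrix, f: List[float]) -> float:
    """f^T L f = (1/2) sum_ij W_ij (f_i - f_j)^2; small when f varies little across graph edges."""
    n = len(f)
    return sum(f[i] * L[i][j] * f[j] for i in range(n) for j in range(n))


if __name__ == "__main__":
    xs = [[0.0], [0.5], [1.0], [3.0], [3.5], [4.0]]
    K = rbf_gram(xs)
    G_dep = rbf_gram([[x[0] ** 2] for x in xs])  # kernel on a deterministic function of x
    G_const = [[1.0] * len(xs) for _ in xs]      # constant kernel carries no information
    print("HSIC with dependent values:", hsic(K, G_dep))
    print("HSIC with constant kernel:", hsic(K, G_const))
    L = knn_laplacian(xs, k=2)
    print("Laplacian smoothness of f(x) = x:", smoothness(L, [x[0] for x in xs]))
```

The constant kernel gives an HSIC of exactly zero because double centering removes it entirely, while a kernel on values that depend on x gives a positive score; this is the sense in which a larger HSIC signals stronger dependence. The Laplacian penalty is zero for any constant vector and grows as a function changes sharply between neighbouring points, which is how a Laplacian term can encode local geometric structure in a selection criterion.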
