| Article ID | Journal | Published Year | Pages | File Type |
|---|---|---|---|---|
| 6941381 | Pattern Recognition Letters | 2014 | 12 Pages | |
Abstract
The aim of this paper is to present an incremental selection strategy by which the classification accuracy of semi-supervised learning (SSL) algorithms can be improved. In SSL, both a limited number of labeled data and a multitude of unlabeled data are used to learn a classification model. However, it is also well known that utilizing the unlabeled data is not always helpful for SSL algorithms. To use them efficiently in learning the classification model, some of the unlabeled data that are deemed useful for the learning process are selected and assigned correctly estimated labels. To address this problem, particularly for the semi-supervised MarginBoost (SSMB) algorithm (d'Alché-Buc et al., 2002), this paper considers and empirically compares two selection strategies, named simply recycled selection and incrementally reinforced selection. Our experimental results, obtained on well-known benchmark data sets, including SSL-type benchmarks and several UCI data sets, demonstrate that the latter, i.e., incrementally selecting only a small portion of strong examples from the available unlabeled data, can compensate for the shortcomings of the existing SSMB algorithm. Moreover, compared to the former, it generally achieves better classification accuracy.
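The core idea described above, repeatedly picking only the most confidently predicted unlabeled examples, pseudo-labeling them, and folding them into the labeled pool, can be illustrated with a minimal sketch. This is not the authors' SSMB procedure: it substitutes a toy nearest-centroid classifier on 1-D features, and the margin-like confidence score, the round count, and the per-round budget `k` are all illustrative assumptions.

```python
# Hedged sketch of incremental high-confidence selection for SSL.
# NOT the SSMB algorithm of d'Alche-Buc et al. (2002): the base learner
# here is a toy nearest-centroid classifier, and the "margin" is just
# the gap between the two class distances, used as a confidence proxy.

def fit_centroids(xs, ys):
    """Fit a nearest-centroid classifier on 1-D features (classes 0/1)."""
    groups = {0: [], 1: []}
    for x, y in zip(xs, ys):
        groups[y].append(x)
    return {c: sum(v) / len(v) for c, v in groups.items()}

def predict_with_margin(centroids, x):
    """Return (predicted label, confidence proxy) for one example."""
    d0, d1 = abs(x - centroids[0]), abs(x - centroids[1])
    return (0 if d0 < d1 else 1), abs(d0 - d1)

def incremental_selection(lab_x, lab_y, unl_x, rounds=3, k=2):
    """Each round, pseudo-label the k strongest unlabeled examples and
    add them to the labeled pool; retrain before every selection step."""
    lab_x, lab_y, pool = list(lab_x), list(lab_y), list(unl_x)
    for _ in range(rounds):
        if not pool:
            break
        centroids = fit_centroids(lab_x, lab_y)
        scored = sorted(((predict_with_margin(centroids, x), x) for x in pool),
                        key=lambda t: -t[0][1])  # strongest margin first
        for (label, _), x in scored[:k]:
            lab_x.append(x)
            lab_y.append(label)
            pool.remove(x)
    return fit_centroids(lab_x, lab_y)
```

The contrast with "simply recycled selection" would be retraining once and consuming the whole pool in a single pass; the incremental variant above re-estimates confidence after each small batch, so early pseudo-labels can sharpen later selections.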
Related Topics
Physical Sciences and Engineering
Computer Science
Computer Vision and Pattern Recognition
Authors
Thanh-Binh Le, Sang-Woon Kim