Article ID Journal Published Year Pages File Type
403670 Knowledge-Based Systems 2013 11 Pages PDF
Abstract

One key issue for most classification algorithms is that they need large amounts of labeled samples to train the classifier. Since manual labeling is time consuming, researchers have proposed technologies of active learning and semi-supervised learning to reduce manual labeling workload. There is a certain degree of complementarity between active learning and semi-supervised learning, and therefore some researches combine them to further reduce manual labeling workload. However, researches on combining active learning and semi-supervised learning for SVM classifier are rare. Of numerous SVM active learning algorithms, the most popular is the one that queries the sample closest to the current classification hyperplane in each iteration, which is denoted as SVMAL in this paper. Realizing that SVMAL is only interested in samples that are more likely to be on the class boundary, while ignoring the usage of the rest large amounts of unlabeled samples, this paper designs a semi-supervised learning algorithm to make full use of the rest non-queried samples, and further forms a new active semi-supervised SVM algorithm. The proposed active semi-supervised SVM algorithm uses active learning to select class boundary samples, and semi-supervised learning to select class central samples, for class central samples are believed to better describe the class distribution, and to help SVMAL finding the boundary samples more precisely. In order not to introduce too many labeling errors when exploring class central samples, the label changing rate is used to ensure the reliability of the predicted labels. Experimental results show that the proposed active semi-supervised SVM algorithm performs much better than the pure SVM active learning algorithm, and thus can further reduce manual labeling workload.

Related Topics
Physical Sciences and Engineering Computer Science Artificial Intelligence
Authors
, , ,