Article ID: 10326468
Journal: Neurocomputing
Published Year: 2016
Pages: 11 Pages
File Type: PDF
Abstract
In typical data mining applications, manually labeling large amounts of data is difficult, expensive, and time-consuming. To reduce manual labeling, semi-supervised learning uses unlabeled data along with labeled data in the training process. The transductive support vector machine (TSVM) is one such semi-supervised method, and it has been found effective in enhancing classification performance. However, TSVM has some deficiencies, such as the need to preset the number of positive-class samples, the frequent exchange of class labels, and its requirement for a larger amount of unlabeled data. To tackle these deficiencies, we propose a new semi-supervised learning algorithm based on active learning combined with TSVM. The algorithm applies active learning to select the most informative instances, based on the version space minimum-maximum division principle, for human annotation in order to improve classification performance. In addition, to make full use of the distribution characteristics of the unlabeled data, we add a manifold regularization term to the objective function. Experiments performed on several UCI datasets and a real-world book review case study demonstrate that our proposed method achieves significant improvement over other benchmark methods while consuming less human effort, which is very important when data must be labeled manually.
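The abstract does not give the formulation, so the following is only a minimal sketch of its two ingredients under stated assumptions. It assumes a linear classifier f(x) = w.x + b, uses the simple "closest to the decision boundary" heuristic as a stand-in for the paper's version space minimum-maximum division criterion, and uses a fully connected Gaussian-weighted graph for the manifold term. The names `select_queries`, `manifold_penalty`, and `sigma` are hypothetical and do not come from the paper.

```python
import math


def decision_value(w, b, x):
    """Signed score of an assumed linear classifier f(x) = w.x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def select_queries(w, b, unlabeled, k=1):
    """Indices of the k unlabeled points closest to the decision boundary.

    Stand-in heuristic (smallest |f(x)|), not the paper's criterion:
    these are the points the current classifier is least sure about,
    so they are the ones sent to a human annotator.
    """
    ranked = sorted(
        range(len(unlabeled)),
        key=lambda i: abs(decision_value(w, b, unlabeled[i])),
    )
    return ranked[:k]


def manifold_penalty(w, b, points, sigma=1.0):
    """Graph smoothness penalty: sum over pairs i<j of W_ij * (f_i - f_j)^2.

    With symmetric Gaussian weights W_ij this equals f^T L f, where
    L = D - W is the graph Laplacian. It is large when nearby points
    receive very different scores.
    """
    f = [decision_value(w, b, x) for x in points]
    total = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d2 = sum((a - c) ** 2 for a, c in zip(points[i], points[j]))
            weight = math.exp(-d2 / (2 * sigma ** 2))  # assumed Gaussian kernel
            total += weight * (f[i] - f[j]) ** 2
    return total


if __name__ == "__main__":
    w, b = [1.0, -1.0], 0.0  # illustrative classifier, not fitted to data
    pool = [[2.0, 0.0], [0.5, 0.4], [-3.0, 1.0], [1.0, 1.0]]
    print("query indices:", select_queries(w, b, pool, k=2))
    print("manifold penalty:", manifold_penalty(w, b, pool))
```

In a full loop, the queried points would be labeled by a person and added to the labeled set, the classifier would be retrained with the penalty added to its objective, and the selection would repeat until the labeling budget is spent.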
Related Topics
Physical Sciences and Engineering > Computer Science > Artificial Intelligence