Article ID Journal Published Year Pages File Type
406204 Neurocomputing 2015 9 Pages PDF
Abstract

In semi-supervised classification, data are partially labeled and the task is to label the remaining data. Compared with unsupervised learning, it is expected that the labeling accuracy would be improved due to the information of the given labels. However, since the class labels are manually assigned by experts and data are sometimes difficult to collect, the assigned labels are noisy. Then, the balance of classes in the labeled data can be different from that in the unlabeled data. In order to solve this problem, a number of practical methods for modifying the class balance, such as instance re-weighting or re-sampling, have been proposed. Despite the increase in application studies, the effect of the noisy labels on the accuracy has not yet been thoroughly investigated. In the present paper, we theoretically analyze the accuracy of the semi-supervised classification. In comparison with the case of balanced classes, we observe the loss of accuracy caused by label noise.

Related Topics
Physical Sciences and Engineering Computer Science Artificial Intelligence
Authors
,