Article ID: 530958
Journal: Pattern Recognition
Published Year: 2013
Pages: 11 Pages
File Type: PDF
Abstract

• We propose a novel algorithm for semi-supervised learning.
• The algorithm is based on the multiple-clusters-per-class assumption.
• It combines the efficient kNN method with a maximal margin classifier.
• It is efficient and achieves results competitive with state-of-the-art algorithms.

Semi-supervised learning (SSL) involves training a decision rule from both labeled and unlabeled data. In this paper, we propose a novel SSL algorithm based on the multiple-clusters-per-class assumption. The proposed algorithm consists of two stages. In the first stage, we capture the local cluster structure of the training data by using the k-nearest-neighbor (kNN) algorithm to split the data into a number of disjoint subsets. In the second stage, a maximal margin classifier based on second-order cone programming (SOCP) learns an inductive decision function globally from the obtained subsets. For linear classification problems, once the kNN algorithm has been run, the proposed algorithm trains the classifier using only the first- and second-order moments of the subsets, without considering individual data points. Since the number of subsets is usually much smaller than the number of training points, the proposed algorithm handles large data sets with many unlabeled points efficiently. Despite its simplicity, the classification performance of the proposed algorithm is guaranteed by the maximal margin classifier. We demonstrate the efficiency and effectiveness of the proposed algorithm on both synthetic and real-world data sets.
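
The first stage and the moment summary can be illustrated with a minimal sketch. The abstract does not say how the kNN algorithm is used to form the subsets, so the sketch below assumes one plausible reading: the subsets are the connected components of the kNN graph. The function names `knn_subsets` and `moments` are hypothetical, and the SOCP stage is not shown.

```python
from math import dist


def knn_subsets(points, k):
    """Split points into disjoint subsets.

    Assumption (not stated in the abstract): a subset is a connected
    component of the graph linking each point to its k nearest neighbours.
    """
    n = len(points)
    parent = list(range(n))  # union-find forest over point indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        # Brute-force neighbour search; fine for a small illustration.
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: dist(points[i], points[j]))
        for j in others[:k]:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[ri] = rj  # merge the two components

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def moments(points, idx):
    """Return the mean vector and the (biased) covariance matrix of a subset."""
    m = len(idx)
    d = len(points[0])
    mean = [sum(points[i][a] for i in idx) / m for a in range(d)]
    cov = [[sum((points[i][a] - mean[a]) * (points[i][b] - mean[b])
                for i in idx) / m
            for b in range(d)]
           for a in range(d)]
    return mean, cov


# Two well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
subsets = knn_subsets(points, k=1)
summaries = [moments(points, s) for s in subsets]
```

Each subset is reduced to a mean and a covariance matrix, so a second-stage classifier in the linear case would see one summary per subset rather than every individual point.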
