Article ID | Journal | Published Year | Pages |
---|---|---|---|
410340 | Neurocomputing | 2013 | 11 |
Annotating class labels for large collections of time-series data is generally expensive. We propose novel semi-supervised learning algorithms that significantly improve classification accuracy by exploiting a relatively large amount of unlabeled data in conjunction with a few labeled samples. Our algorithms use the unlabeled data as a regularizer that favors classifiers making more confident predictions on the unlabeled data. For the hidden conditional random field, a state-of-the-art conditional probabilistic sequence model, we first propose applying the entropy minimization algorithm, previously used in static classification settings. We then introduce more sophisticated margin-based approaches, motivated by semi-supervised support vector machines originally designed for non-sequential data. We provide effective, principled ways to incorporate and minimize the hat loss function for sequence data via a probabilistic treatment. We demonstrate the performance improvements achieved by our methods in several semi-supervised time-series classification scenarios.
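
The two unlabeled-data penalties named in the abstract can be illustrated on static class posteriors. The following is a minimal sketch, not the paper's hidden conditional random field formulation: the function names are hypothetical, the hat loss is taken in its usual semi-supervised SVM form max(0, 1 - |f(x)|) for a binary score, and the posteriors are hard-coded values rather than outputs of a sequence model.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a class-posterior vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def hat_loss(score):
    """Hat loss max(0, 1 - |score|) for a binary decision score.

    It is largest when the score is near the decision boundary (0)
    and vanishes once the score is at least 1 in magnitude.
    """
    return max(0.0, 1.0 - abs(score))


def unlabeled_penalty(posteriors):
    """Average entropy over a set of unlabeled samples (assumed regularizer)."""
    return sum(entropy(p) for p in posteriors) / len(posteriors)


# Hypothetical posteriors for two unlabeled samples under two classifiers.
confident = [[0.95, 0.05], [0.10, 0.90]]
uncertain = [[0.55, 0.45], [0.50, 0.50]]

# The confident classifier incurs the smaller entropy penalty.
print(unlabeled_penalty(confident) < unlabeled_penalty(uncertain))  # True

# Hat loss at the boundary, inside the margin, and outside the margin.
print(hat_loss(0.0), hat_loss(0.5), hat_loss(2.0))  # 1.0 0.5 0.0
```

Both penalties are small when the classifier commits to a label for an unlabeled sample and large when it hesitates, which is the sense in which the unlabeled data act as a regularizer.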