Robust label compression for multi-label classification

Article ID	Journal	Published Year	Pages	File Type
6862173	Knowledge-Based Systems	2016	11 Pages	PDF

Abstract

Label compression (LC) is an effective strategy to reduce time cost and improve classification performance simultaneously for multi-label classification. One main limitation of existing LC methods is that they are prone to outliers. Here outliers include outliers in the feature space and outliers in the label space. Outliers in the feature space are obtained due to data acquisition devices. Outliers in the label space refer to label vectors that are inconsistent with the regular label correlations. In this paper, we propose a new LC method, termed robust label compression (RLC), based on l2,1-norm to deal with outliers in the feature space and label space. The objective function of RLC consists of two losses: the encoding loss to measure the compression error and the dependence loss to measure the relevance between the instances and the obtained code vectors after compressing the label vectors. To achieve robustness to outliers, we utilize the l2,1-norm on both losses. We propose an efficient optimization algorithm for it and present theoretical analysis. Experiments across six data sets validate the superiority of our proposed method to state-of-art LC methods for multi-label classification.

Keywords

Outliers Multi-label classification