Article ID: 6938641
Journal: Pattern Recognition
Published Year: 2018
Pages: 15 Pages
File Type: PDF
Abstract
Multi-label feature selection has attracted considerable attention in many big data applications. However, traditional multi-label feature selection methods generally ignore a common real-world scenario in which features arrive one at a time. To address this problem, we develop a novel online multi-label streaming feature selection method based on neighborhood rough sets, which selects a feature subset containing strongly relevant and non-redundant features. The main motivation is that data mining based on neighborhood rough sets does not require any prior knowledge of the feature space structure. Moreover, neighborhood rough sets handle mixed data without breaking the neighborhood and order structure of the data. In this paper, we first introduce the maximum-nearest-neighbor of an instance to granulate all instances, which resolves the problem of granularity selection in neighborhood rough sets, and then generalize the single-label neighborhood rough set to multi-label learning. We then present an online multi-label streaming feature selection framework consisting of two stages: online importance selection and online redundancy update. Under this framework, we propose a criterion for selecting features that are important relative to the currently selected features, and we design a bound on the pairwise correlations between features, given the label set, to filter out redundant features. An empirical study on a series of benchmark datasets demonstrates that the proposed method outperforms other state-of-the-art multi-label feature selection methods.
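The two-stage framework described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not give the formulas, so the sketch swaps in simplifying assumptions. It uses a fixed neighborhood radius in place of the maximum-nearest-neighbor granulation, treats an instance as consistent only when every neighbor has an identical label vector, and replaces the pairwise-correlation bound with a plain check on whether dropping a feature lowers the dependency. The names `dependency`, `stream_select`, `radius`, and the toy data are all hypothetical.

```python
def dependency(X, Y, feats, radius=0.15):
    """Fraction of instances whose neighbors all share their label vector.

    Neighbors are instances within `radius` (Chebyshev distance) measured
    on the features in `feats` only. A fixed radius is an assumption here.
    """
    if not feats:
        return 0.0
    n = len(X)
    consistent = 0
    for i in range(n):
        ok = True
        for j in range(n):
            dist = max(abs(X[i][f] - X[j][f]) for f in feats)
            if dist <= radius and Y[i] != Y[j]:
                ok = False
                break
        consistent += ok
    return consistent / n


def stream_select(X, Y, radius=0.15):
    """Process features one at a time, as if they arrived as a stream."""
    selected = []
    for f in range(len(X[0])):
        base = dependency(X, Y, selected, radius)
        # Stage 1 (importance): keep f only if it raises the dependency
        # relative to the currently selected features.
        if dependency(X, Y, selected + [f], radius) <= base:
            continue
        selected.append(f)
        # Stage 2 (redundancy): drop earlier features that no longer
        # contribute once f is present (simplified check, not the paper's bound).
        for g in selected[:-1]:
            reduced = [h for h in selected if h != g]
            if dependency(X, Y, reduced, radius) >= dependency(X, Y, selected, radius):
                selected = reduced
    return selected


# Toy data: 4 instances, 3 features, 2 labels per instance.
X = [
    [0.0, 0.5, 0.05],
    [0.1, 0.5, 0.35],
    [0.6, 0.5, 0.65],
    [0.9, 0.5, 0.95],
]
Y = [(1, 0), (1, 1), (0, 1), (0, 0)]

print(stream_select(X, Y))  # [2]
```

In this toy run, feature 0 is accepted first because it separates half of the instances, feature 1 is constant and is rejected, and feature 2 separates every instance, which makes feature 0 redundant and causes it to be dropped.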
Related Topics
Physical Sciences and Engineering > Computer Science > Computer Vision and Pattern Recognition