Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
6903115 | Swarm and Evolutionary Computation | 2018 | 18 Pages | |
Abstract
Algorithms for constructing classification models under streaming data scenarios are becoming increasingly important. For such algorithms to be applicable in 'real-world' contexts, we adopt the following objectives: 1) operate under label budgets, 2) make label requests without recourse to true label information, and 3) remain robust to class imbalance. Specifically, we assume that model building is only performed using the content of a Data Subset (as in active learning). Thus, the principal design decisions concern the definitions employed for the sampling and archiving policies. Moreover, these policies should operate without prior information regarding the distribution of classes, as this varies over the course of the stream. A team formulation for genetic programming (GP) is assumed as the generic model for classification in order to support incremental changes to classifier content. Benchmarking is conducted with thirteen real-world Botnet datasets with label budgets on the order of 0.5-5% and significant amounts of class imbalance. Specific recommendations are made for detecting the costly minor classes under these conditions. Comparison with current approaches to streaming data under label budgets supports the significance of these findings.
Related Topics
Physical Sciences and Engineering
Computer Science
Computer Science (General)
Authors
Sara Khanchi, Ali Vahdat, Malcolm I. Heywood, A. Nur Zincir-Heywood