Handling adversarial concept drift in streaming data

Article ID	Journal	Published Year	Pages	File Type
6855178	Expert Systems with Applications	2018	23 Pages	PDF

Abstract

Classifiers operating in a dynamic, real-world environment are vulnerable to adversarial activity, which causes the data distribution to change over time. These changes are traditionally referred to as concept drift, and several approaches have been developed in the literature to deal with the problem of drift detection and handling. However, most concept drift handling techniques treat it as a domain-independent task, so that they apply to a wide gamut of reactive systems. These techniques are developed from an adversary-agnostic perspective: they naively assume that adversarial activity is like any other change to the data, which can be fixed by retraining the models. This is not the case when a malicious agent is trying to evade the deployed classification system. In such an environment, the properties of concept drift are unique, as the drift is intended to degrade the system and is at the same time designed to avoid detection by traditional concept drift detection techniques. This special category of drift is termed adversarial drift, and this paper analyzes its characteristics and impact in a streaming environment. A novel framework for dealing with adversarial concept drift is proposed, called the Predict-Detect streaming framework. This framework uses adversarial forethought and incorporates the context of classification into the drift detection task, to provide leverage in dynamic adversarial domains. Experimental evaluation of the framework on generated adversarial drifting data streams demonstrates that it provides early and reliable unsupervised indication of drift and recovers from drifts swiftly. While traditional drift detectors can be evaded by intelligent adversaries, the proposed framework is specifically designed to capture adversaries by misdirecting them into revealing themselves. In addition, the framework is designed to work on imbalanced and sparsely labeled data streams, as a limited-memory, incremental algorithm. The generic design and domain-independent nature of the framework make it applicable as a blueprint for developers wanting to add reactive security to their classification-based systems.

Keywords

Adversarial machine learning; Streaming data; Classification; Concept drift; Active learning