Understanding adversarial training: Increasing local stability of supervised models through robust optimization

Article ID	Journal	Published Year	Pages	File Type
6863754	Neurocomputing	2018	10 Pages	PDF

Abstract

We show that adversarial training of supervised learning models is in fact a robust optimization procedure. To do this, we establish a general framework for increasing local stability of supervised learning models using robust optimization. The framework is general and broadly applicable to differentiable non-parametric models, e.g., Artificial Neural Networks (ANNs). Using an alternating minimization-maximization procedure, the loss of the model is minimized with respect to perturbed examples that are generated at each parameter update, rather than with respect to the original training data. Our proposed framework generalizes adversarial training, as well as previous approaches for increasing local stability of ANNs. Experimental results reveal that our approach increases the robustness of the network to existing adversarial examples, while making it harder to generate new ones. Furthermore, our algorithm improves the accuracy of the networks also on the original test data.

Keywords

Robust optimization Deep learning