Article ID: 6874671
Journal: Journal of Computer and System Sciences
Published Year: 2018
Pages: 20 Pages
File Type: PDF
Abstract
This paper presents a novel approach for adapting attackers' and defenders' preferred patrolling strategies using reinforcement learning (RL) based on average rewards in Stackelberg security games. We propose a framework that combines three different paradigms: prior knowledge, imitation, and temporal-difference methods. The overall RL architecture involves two top-level components: the Adaptive Primary Learning architecture and the Actor-critic architecture. In this work we consider that defenders and attackers form coalitions in the Stackelberg security game; these coalitions are reached by computing the Strong Lp-Stackelberg/Nash equilibrium. We present a numerical example that validates the proposed RL approach by measuring its benefits for security resource allocation.
Related Topics
Physical Sciences and Engineering > Computer Science > Computational Theory and Mathematics