Article ID: 6936166
Journal: Transportation Research Part C: Emerging Technologies
Published Year: 2018
Pages: 15
File Type: PDF
Abstract
In the route choice problem, self-interested drivers aim to choose routes that minimise travel costs between their origins and destinations. We model this problem as a multiagent reinforcement learning scenario. Since agents must adapt to each other's decisions, the minimisation goal is a moving target. Regret is a well-known performance measure in such settings: it measures how much worse an agent performs compared to the best fixed action in hindsight. In general, agents cannot compute (and thus cannot use) regret, because its calculation requires observing the costs of all available routes, including those not taken. In contrast to previous works, we show how agents can compute regret by building upon their own experience and information provided by a mobile (Waze-like) navigation app. Specifically, we compute the regret of each action as a linear combination of local (experience-based) and global (app-based) information. We refer to this measure as the action regret, which agents can use as a reinforcement signal. Under these conditions, agents are able to minimise their external regret even when the cost of routes is not known in advance. Based on experimental evaluation in several abstract road networks, we show that the system converges to approximate User Equilibria.
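The abstract does not state the formulas, so the following is only a minimal sketch of the two quantities it mentions. `external_regret` follows the standard definition (average cost paid minus the average cost of the best fixed route in hindsight) and needs the costs of every route, which a real driver cannot observe. `estimated_action_regret` is a hypothetical illustration of a linear combination of local and global information; the function names, the `weight` parameter, and the way each term is formed are assumptions, not the authors' method.

```python
from statistics import mean


def external_regret(taken_costs, all_route_costs):
    """Average cost paid minus the average cost of the best fixed route.

    taken_costs: costs the driver actually experienced, one per episode.
    all_route_costs: dict mapping each route to its cost in every episode
    (full information, normally unavailable to the driver).
    """
    best_fixed = min(mean(costs) for costs in all_route_costs.values())
    return mean(taken_costs) - best_fixed


def estimated_action_regret(route, local_avg, app_avg, weight=0.5):
    """Hypothetical regret estimate for one route (assumed form).

    local_avg: dict route -> average cost from the driver's own experience.
    app_avg: dict route -> average cost reported by the navigation app.
    weight: assumed mixing coefficient between local and global terms.
    """
    # Gap between this route and the cheapest route, per information source.
    local_gap = local_avg[route] - min(local_avg.values())
    global_gap = app_avg[route] - min(app_avg.values())
    return weight * local_gap + (1 - weight) * global_gap


# Illustrative numbers only, not data from the paper.
all_route_costs = {"A": [10, 12, 14], "B": [11, 11, 11]}
taken_costs = [10, 12, 14]  # the driver always took route A
local_avg = {"A": 12.0, "B": 11.0}
app_avg = {"A": 13.0, "B": 11.0}

print(external_regret(taken_costs, all_route_costs))
print(estimated_action_regret("A", local_avg, app_avg))
```

In this sketch a route's estimated regret is zero when both the driver's experience and the app agree it is the cheapest option, which is what makes it usable as a reinforcement signal without observing the non-taken routes directly.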
Related Topics
Physical Sciences and Engineering > Computer Science > Computer Science Applications