Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
11012480 | Neurocomputing | 2018 | 28 Pages |
Abstract
Dyna-learning and prioritized sweeping (PS for short) are the most commonly used reinforcement learning algorithms that use a model of the environment. This paper presents modified versions of these algorithms. The modification uses breadth-first search (BFS) to make additional policy updates in epoch mode. Experiments in a dynamic grid world and on a ball-beam system showed that the proposed modifications improve the efficiency of the reinforcement learning algorithms.
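The abstract does not say how BFS is applied to the learned model. As a rough illustration only, the sketch below assumes one plausible reading: a backward BFS from a goal state over a deterministic learned model, whose step distances are then used to pick actions. The model format, the function names `bfs_distances` and `greedy_policy`, and the corridor example are all hypothetical and are not taken from the paper.

```python
from collections import deque


def bfs_distances(model, goal):
    """Backward BFS over a learned deterministic model (illustrative only).

    model: dict mapping (state, action) -> next_state (assumed format).
    Returns a dict mapping each state that can reach the goal to its
    number of steps from the goal.
    """
    # Invert the model: for each state, list the states that lead to it.
    predecessors = {}
    for (state, _action), next_state in model.items():
        predecessors.setdefault(next_state, []).append(state)

    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        current = queue.popleft()
        for prev in predecessors.get(current, []):
            if prev not in dist:  # first visit gives the shortest distance
                dist[prev] = dist[current] + 1
                queue.append(prev)
    return dist


def greedy_policy(model, dist):
    """Pick, for each state, an action that moves one step closer to the goal."""
    policy = {}
    for (state, action), next_state in model.items():
        if state in dist and next_state in dist and dist[next_state] < dist[state]:
            policy.setdefault(state, action)
    return policy


# Hypothetical 4-state corridor: 0 - 1 - 2 - 3, with state 3 as the goal.
model = {
    (0, "right"): 1,
    (1, "left"): 0,
    (1, "right"): 2,
    (2, "left"): 1,
    (2, "right"): 3,
}
dist = bfs_distances(model, goal=3)
policy = greedy_policy(model, dist)
```

In this toy corridor every non-goal state ends up with the action that leads toward state 3. A full Dyna or PS implementation would combine such a sweep with value updates from real and simulated experience, which this sketch leaves out.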
Related Topics
Physical Sciences and Engineering
Computer Science
Artificial Intelligence
Authors
Roman Zajdel