Reinforcement learning for discounted values often loses the goal in the application to animal learning

Article ID	Journal	Published Year	Pages	File Type
6863487	Neural Networks	2012	4 Pages	PDF

Abstract

The impulsive preference of an animal for an immediate reward implies that it might subjectively discount the value of potential future outcomes. Reinforcement learning theory provides an established framework for maximizing the discounted subjective value, and this framework has been applied successfully in engineering. However, this study identifies a limitation when the framework is applied to animal behavior: in some cases, no learning goal exists. The study therefore proposes a possible learning framework that is well-posed in all cases and consistent with the impulsive preference.

Keywords

Inter-temporal choice; Delay discounting; Impulsivity; Reinforcement learning