Article ID Journal Published Year Pages File Type
404166 Neural Networks 2015 8 Pages PDF
Abstract

Animals including human often prefer immediate returns to larger delayed returns. It holds true in the human communications. Standard interpretation of the immediate return preference is that an animal might subjectively discount the value of a delayed reward, and that might choose the larger valued one. The interpretation has been successfully applied to explain behavior of many species including human. However, the description is not necessarily sufficient to apply for interactions of individuals. This study adopts a different approach to seek a possibility that immediate return preference may be reproduced by learning rule to maximize objective outcomes. We show that a synaptic learning rule to achieve the temporal difference (TD) learning for outcome maximization fails the maximization and exhibits immediate return preference if the context is not properly represented as a internal state.

Related Topics
Physical Sciences and Engineering Computer Science Artificial Intelligence
Authors
, , ,