Immediate return preference emerged from a synaptic learning rule for return maximization

Article ID	Journal	Published Year	Pages
404166	Neural Networks	2015	8 Pages

Abstract

Animals, including humans, often prefer immediate returns to larger delayed returns, and the same preference appears in human communication. The standard interpretation of this immediate return preference is that an animal might subjectively discount the value of a delayed reward and then choose the option with the larger discounted value. This interpretation has successfully explained the behavior of many species, including humans. However, it is not necessarily sufficient to account for interactions between individuals. This study takes a different approach and asks whether immediate return preference can be reproduced by a learning rule that maximizes objective outcomes. We show that a synaptic learning rule implementing temporal difference (TD) learning for outcome maximization fails to maximize outcomes, and instead exhibits immediate return preference, when the context is not properly represented as an internal state.
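As an illustration only, and not the model used in this study, the following minimal sketch shows one way an undiscounted (return-maximizing) TD learner can end up preferring the immediate option when the post-choice context is collapsed into a single internal state. Everything here is an assumption made for the example: the hypothetical function `simulate`, the toy task (1 unit of reward right after choosing "imm" versus 3 units after a delay for "del"), the delay length, the learning rate and the exploration rate. It also uses plain tabular TD(0) rather than a synaptic learning rule.

```python
import random


def simulate(aliased, episodes=5000, delay=3, alpha=0.1, epsilon=0.1, seed=0):
    """Undiscounted (gamma = 1) tabular TD(0) on a hypothetical two-option task.

    'imm' pays 1 right after the choice; 'del' pays 3 after `delay` steps.
    If `aliased` is True, every post-choice step is encoded as the same
    internal state "X", so the learner cannot tell which option it took.
    """
    rng = random.Random(seed)
    q = {"imm": 0.0, "del": 0.0}  # action values at the choice state
    v = {}                         # values of post-choice internal states
    for _ in range(episodes):
        # Epsilon-greedy choice (ties go to the first key, "imm").
        if rng.random() < epsilon:
            action = rng.choice(["imm", "del"])
        else:
            action = max(q, key=q.get)
        kind = "idle" if action == "imm" else "wait"
        states = ["X" if aliased else f"{kind}{k}" for k in range(delay)]
        first_r = 1.0 if action == "imm" else 0.0  # reward at the choice
        last_r = 0.0 if action == "imm" else 3.0   # reward at episode end
        # TD(0) update at the choice: target = r + V(next internal state).
        q[action] += alpha * (first_r + v.get(states[0], 0.0) - q[action])
        # TD(0) updates along the post-choice trajectory (terminal value 0).
        for k, s in enumerate(states):
            last = k == delay - 1
            r = last_r if last else 0.0
            v_next = 0.0 if last else v.get(states[k + 1], 0.0)
            v_s = v.get(s, 0.0)
            v[s] = v_s + alpha * (r + v_next - v_s)
    return max(q, key=q.get), q


for aliased in (False, True):
    choice, q = simulate(aliased)
    print(f"aliased={aliased}: greedy choice {choice}, Q={q}")
```

The effect can be read directly from the update targets. With distinct waiting states, the value of "del" converges toward 3 and that of "imm" toward 1, so the learner picks the larger, delayed outcome. With aliasing, both options bootstrap from the same value V(X), so the target for "imm" is 1 + V(X) while the target for "del" is only V(X): the delayed reward's credit is shared by both options, and the immediate reward alone decides the choice.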

Keywords

Inter-temporal choice; Synaptic plasticity; Reinforcement learning