| Article ID | Journal | Published Year | Pages |
|---|---|---|---|
| 487178 | Procedia Computer Science | 2015 | 6 |
It is widely acknowledged that biological beings (animals) are not Markov: modelers generally do not model them as agents receiving a complete representation of their environment's state as input (except perhaps in simple controlled tasks). In this paper, we claim that biological beings generally cannot recognize rewarding Markov states of their environment either. Therefore, we model them as agents trying to perform rewarding interactions with their environment (interaction-driven tasks), rather than as agents trying to reach rewarding states (state-driven tasks). We review two interaction-driven tasks, the AB and AABB tasks, and implement a non-Markov reinforcement-learning (RL) algorithm based upon historical sequences and Q-learning. Results show that this RL algorithm takes significantly longer to learn these tasks than a constructivist algorithm implemented previously by Georgeon, Ritter, & Haynes (2009). This is because the constructivist algorithm directly learns and repeats hierarchical sequences of interactions, whereas the RL algorithm spends time learning Q-values. Along with theoretical arguments, these results support the constructivist paradigm for modeling biological agents.
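To make the history-based RL baseline concrete, the sketch below shows a minimal Q-learning agent whose "state" is a short window of past interactions (action–result pairs) rather than an observed environment state. It is an illustration under stated assumptions, not the paper's implementation: the environment `toy_aabb_result`, the window length `HISTORY_LEN`, the interaction values, and the learning parameters are all hypothetical stand-ins inspired by the AABB-like regularity, where the agent must repeat each action before switching to obtain the preferred result.

```python
import random
from collections import defaultdict, deque

# Hypothetical toy stand-in for an interaction-driven task with an
# AABB-like regularity (not the paper's exact environment): result r1
# occurs when the current action repeats the previous one but differs
# from the one before that, so the best policy is A A B B A A B B ...
ACTIONS = ["A", "B"]
VALUES = {"r1": 1, "r2": -1}           # assumed value of each interaction result
HISTORY_LEN = 3                        # interactions kept as the agent's "state"
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # assumed learning parameters

def toy_aabb_result(last_actions):
    """Return r1 when the AABB-like regularity is satisfied, else r2."""
    if len(last_actions) >= 3 and \
       last_actions[-1] == last_actions[-2] != last_actions[-3]:
        return "r1"
    return "r2"

def run(steps=20000, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)               # Q[(history, action)]
    history = deque(maxlen=HISTORY_LEN)  # recent (action, result) pairs
    actions_so_far = []
    total = 0
    for _ in range(steps):
        state = tuple(history)
        # Epsilon-greedy choice over the two possible experiments.
        if rng.random() < EPSILON:
            action = rng.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        actions_so_far.append(action)
        result = toy_aabb_result(actions_so_far[-3:])
        reward = VALUES[result]
        total += reward
        # The next "state" is the updated window of recent interactions.
        next_history = deque(history, maxlen=HISTORY_LEN)
        next_history.append((action, result))
        next_state = tuple(next_history)
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        # Standard Q-learning update, applied to history-based states.
        q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
        history = next_history
    return total / steps

if __name__ == "__main__":
    print("average value per step:", run())
```

Because the agent must populate a Q-table over many distinct interaction histories before its greedy policy stabilizes, this kind of baseline needs many more steps than an algorithm that directly learns and re-enacts hierarchical sequences of interactions, which is the contrast the abstract draws with the constructivist algorithm.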