Psychological models of human and optimal performance in bandit problems

Article code	Journal code	Publication year	English article	Full-text version
378528	659163	2011	11-page PDF	Free download

Keywords

Heuristic models; Exploration versus exploitation; Bandit problem; Reinforcement learning

Related topics

Engineering and Basic Sciences; Computer Engineering; Artificial Intelligence

Abstract

In bandit problems, a decision-maker must choose between a set of alternatives, each of which has a fixed but unknown rate of reward, to maximize their total number of rewards over a sequence of trials. Performing well in these problems requires balancing the need to search for highly rewarding alternatives against the need to capitalize on those alternatives already known to be reasonably good. Consistent with this motivation, we develop a new psychological model that relies on switching between latent exploration and exploitation states. We test the model over a range of two-alternative bandit problems, against both human and optimal decision-making data, comparing it to benchmark models from the reinforcement learning literature. By making inferences about the latent states from optimal decision-making behavior, we characterize how people should switch between exploration and exploitation. By making inferences from human data, we begin to characterize how people actually do switch. We discuss the implications of these findings for understanding and measuring the competing demands of exploration and exploitation in sequential decision-making.
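
To make the task in the abstract concrete, here is a minimal Python sketch of a bandit problem in which each alternative pays a reward with a fixed rate that the decision-maker does not know. It is not the latent-state model the paper develops. The decision rule shown is epsilon-greedy, a standard heuristic from the reinforcement learning literature, chosen here purely as an illustration; the abstract does not say which benchmark models the authors used. The function name, the parameters, and the optimistic starting estimate of 1.0 are all assumptions made for this example.

```python
import random


def simulate_epsilon_greedy(reward_rates, n_trials, epsilon=0.1, seed=0):
    """Play one bandit game; return (total rewards, choices per alternative).

    reward_rates are the fixed reward probabilities, hidden from the
    decision rule. All names here are illustrative assumptions.
    """
    rng = random.Random(seed)
    n_arms = len(reward_rates)
    successes = [0] * n_arms  # rewards observed for each alternative
    pulls = [0] * n_arms      # number of times each alternative was chosen
    total_reward = 0

    for _ in range(n_trials):
        if rng.random() < epsilon:
            # Explore: choose an alternative uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: choose the alternative with the highest observed
            # reward rate. Untried alternatives get an optimistic 1.0
            # (an assumption of this sketch) so each is sampled once.
            estimates = [
                successes[i] / pulls[i] if pulls[i] else 1.0
                for i in range(n_arms)
            ]
            arm = max(range(n_arms), key=lambda i: estimates[i])

        # Bernoulli reward drawn at the chosen alternative's hidden rate.
        reward = 1 if rng.random() < reward_rates[arm] else 0
        pulls[arm] += 1
        successes[arm] += reward
        total_reward += reward

    return total_reward, pulls


if __name__ == "__main__":
    total, pulls = simulate_epsilon_greedy([0.3, 0.6], n_trials=16)
    print("total rewards:", total, "choices per alternative:", pulls)
```

Setting `epsilon` to 0 gives pure exploitation after one sample of each alternative, while larger values spend more trials exploring; the tension between the two settings is the trade-off the abstract describes.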

Publisher

Database: Elsevier - ScienceDirect
Journal: Cognitive Systems Research - Volume 12, Issue 2, June 2011, Pages 164–174

Authors

Michael D. Lee, Shunan Zhang, Miles Munro, Mark Steyvers
