Free article download: Multi-agent reinforcement learning as a rehearsal for decentralized planning

Article code	Journal code	Publication year	English article	Full-text version
411500	679568	2016	13-page PDF	Free download

English title of the ISI article

Multi-agent reinforcement learning as a rehearsal for decentralized planning

Persian translation of the title

یادگیری تقویت چند عامل به عنوان یک تمرین برای برنامه ریزی غیر متمرکز

Keywords

Multi-agent reinforcement learning, Decentralized planning

Related subjects

Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence

Abstract

Decentralized partially observable Markov decision processes (Dec-POMDPs) are a powerful tool for modeling multi-agent planning and decision-making under uncertainty. Prevalent Dec-POMDP solution techniques require centralized computation given full knowledge of the underlying model. Multi-agent reinforcement learning (MARL) based approaches have been recently proposed for distributed solution of Dec-POMDPs without full prior knowledge of the model, but these methods assume that conditions during learning and policy execution are identical. In some practical scenarios this may not be the case. We propose a novel MARL approach in which agents are allowed to rehearse with information that will not be available during policy execution. The key is for the agents to learn policies that do not explicitly rely on these rehearsal features. We also establish a weak convergence result for our algorithm, RLaR, demonstrating that RLaR converges in probability when certain conditions are met. We show experimentally that incorporating rehearsal features can enhance the learning rate compared to non-rehearsal-based learners, and demonstrate fast, (near) optimal performance on many existing benchmark Dec-POMDP problems. We also compare RLaR against an existing approximate Dec-POMDP solver which, like RLaR, does not assume a priori knowledge of the model. While RLaR's policy representation is not as scalable, we show that RLaR produces higher quality policies for most problems and horizons studied.

Publisher

Database: Elsevier - ScienceDirect
Journal: Neurocomputing - Volume 190, 19 May 2016, Pages 82–94

Authors

Landon Kraemer, Bikramjit Banerjee
