Article code: 697253
Journal code: 890363
Publication year: 2010
English article: 8-page PDF, free download
English Title (ISI Article)
Simulation-based optimization of Markov decision processes: An empirical process theory approach
Related Subjects
Engineering and Basic Sciences; Other Engineering Disciplines; Control and Systems Engineering
English Abstract

We generalize and build on the PAC learning framework for Markov decision processes developed in Jain and Varaiya (2006). We allow the reward function to depend on both the state and the action, and both the state and action spaces can be countably infinite. We obtain an estimate for the value function of a Markov decision process, which assigns to each policy its expected discounted reward. This expected reward can be estimated as the empirical average of the reward over many independent simulation runs. We derive bounds on the number of runs needed for the empirical average to converge to the expected reward uniformly over a class of policies, in terms of the VC or pseudo-dimension of the policy class. We then propose a framework for obtaining an ϵ-optimal policy from simulation and provide the sample complexity of this approach.
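The estimation scheme the abstract describes, approximating a policy's expected discounted reward by the empirical average over many independent simulation runs, can be sketched for a toy problem. Everything below (the two-state chain, the reward function, the fixed policy, and the constants) is a hypothetical illustration of Monte Carlo value estimation, not the construction or the bounds from the paper.

```python
import random

# Toy example: estimate a policy's expected discounted reward by
# averaging over independent simulation runs (hypothetical MDP,
# not taken from the paper).
GAMMA = 0.9      # discount factor
HORIZON = 100    # truncation horizon for the discounted sum
N_RUNS = 5000    # number of independent simulation runs

def reward(state, action):
    # Reward depends on both state and action, as in the abstract.
    return 1.0 if (state, action) == (1, 1) else 0.0

def step(state, action, rng):
    # A simple stochastic transition: move to state 1 with prob. 0.7.
    return 1 if rng.random() < 0.7 else 0

def policy(state):
    # A fixed stationary policy: repeat the current state as the action.
    return state

def simulate_run(rng):
    # One independent run: accumulate the discounted reward.
    state = 0
    total = 0.0
    for t in range(HORIZON):
        a = policy(state)
        total += (GAMMA ** t) * reward(state, a)
        state = step(state, a, rng)
    return total

def empirical_value(n_runs, seed=0):
    # Empirical average over n_runs independent simulation runs.
    rng = random.Random(seed)
    return sum(simulate_run(rng) for _ in range(n_runs)) / n_runs

print(round(empirical_value(N_RUNS), 3))
```

By standard concentration arguments, the empirical average for a single policy converges to its expected discounted reward as the number of runs grows; the paper's contribution is to make such convergence uniform over a whole policy class, with run counts controlled by the class's VC or pseudo-dimension.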

Publisher
Database: Elsevier - ScienceDirect
Journal: Automatica - Volume 46, Issue 8, August 2010, Pages 1297–1304
Authors