The asymptotic equipartition property in reinforcement learning and its relation to return maximization

Article code	Journal code	Publication year	English article	Full text
408115	678245	2006	14-page PDF	Free download

Keywords

Markov decision process; Information theory; Stochastic complexity; Reinforcement learning

Related topics

Engineering and basic sciences; Computer engineering; Artificial intelligence

Abstract

We discuss an important property, the asymptotic equipartition property, on empirical sequences in reinforcement learning. The property states that, if the number of time steps is sufficiently large, the typical set of empirical sequences has probability nearly one, all elements in the typical set are nearly equiprobable, and the number of elements in the typical set is an exponential function of the sum of conditional entropies. This sum is referred to as the stochastic complexity. Using the property, we show that return maximization depends on two factors: the stochastic complexity and a quantity that depends on the parameters of the environment. Here, return maximization means that the best sequences in terms of expected return have probability one. We also examine the sensitivity of the stochastic complexity, which serves as a qualitative guide for tuning the parameters of the action-selection strategy, and give a sufficient condition for return maximization in probability.
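
The three claims of the asymptotic equipartition property can be checked numerically in the simplest classical setting. The sketch below is not the paper's construction: it assumes an i.i.d. Bernoulli source instead of the empirical sequences of a Markov decision process, so the plain binary entropy stands in for the sum of conditional entropies. The function names `entropy_bits` and `typical_set_stats` and the parameter values are illustrative choices, not taken from the paper.

```python
import math


def entropy_bits(p):
    """Binary entropy H(p) in bits, for 0 < p < 1."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def typical_set_stats(n, p, eps):
    """Exact statistics of the eps-typical set of an i.i.d. Bernoulli(p) source.

    A length-n sequence x is eps-typical when |-(1/n) log2 P(x) - H(p)| <= eps.
    Returns (probability of the typical set, log2(set size) / n, H(p)).
    Assumption: i.i.d. source, a stand-in for the RL setting of the abstract.
    """
    h = entropy_bits(p)
    prob = 0.0
    count = 0
    for k in range(n + 1):
        # Every sequence with k ones has the same probability p^k (1-p)^(n-k).
        log2_prob = k * math.log2(p) + (n - k) * math.log2(1 - p)
        if abs(-log2_prob / n - h) <= eps:
            ways = math.comb(n, k)  # number of sequences with k ones
            count += ways
            # Work in the log domain to avoid overflow for large n.
            prob += 2.0 ** (math.log2(ways) + log2_prob)
    size_rate = math.log2(count) / n if count else float("-inf")
    return prob, size_rate, h


if __name__ == "__main__":
    for n in (100, 500, 2000):
        prob, size_rate, h = typical_set_stats(n, p=0.3, eps=0.05)
        print(f"n={n}: P(typical)={prob:.4f}, log2|T|/n={size_rate:.4f}, H={h:.4f}")
```

As n grows, the probability of the typical set approaches one, while the logarithm of its size divided by n stays within eps of the entropy. This is the sense in which the set size is an exponential function of the entropy term.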

Publisher

Database: Elsevier - ScienceDirect
Journal: Neural Networks - Volume 19, Issue 1, January 2006, Pages 62–75

Authors

Kazunori Iwata, Kazushi Ikeda, Hideaki Sakai
