Choice of approximator and design of penalty function for an approximate dynamic programming based control approach

Article ID	Journal ID	Year	Format
690146	889707	2006	English, 22-page PDF

Keywords

k-nearest neighbor; Approximate dynamic programming; Neural network

Related subjects

Engineering and Basic Sciences; Chemical Engineering; Process Chemistry and Technology

Abstract

This paper investigates the choice of function approximator for an approximate dynamic programming (ADP) based control strategy. Given a simulation model and a starting control policy (or, alternatively, closed-loop identification data), the ADP strategy derives an improved control policy while circumventing the 'curse of dimensionality' of traditional dynamic programming. In ADP, one fits a function approximator to state vs. 'cost-to-go' data and solves the Bellman equation iteratively with that approximator. The choice and design of the approximator are critical both for convergence of the iteration and for the quality of the final learned control policy, because approximation errors can grow quickly in the loop of optimization and function approximation. Approximators typically used in related approaches fall into two classes: parameterized global approximators (e.g., artificial neural networks) and nonparametric local averagers (e.g., k-nearest neighbor). On the basis of case studies and a theoretical result, we argue that a certain type of local averager should be preferred over global approximators because it ensures monotonic convergence of the iteration. However, a converged cost-to-go function does not necessarily yield a stable control policy online, owing to the problem of over-extrapolation. To cope with this difficulty, we propose including a penalty term in the objective function of each minimization to discourage the optimizer from selecting solutions in regions of the state space where the local data density is too low. A nonparametric density estimator, which combines naturally with a local averager, is employed for this purpose.
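
The loop described in the abstract (fit a local averager to state vs. cost-to-go data, re-solve the Bellman equation with it, and penalize minimizers that land where data are sparse) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the scalar plant, the discount factor GAMMA, the input grid U_GRID, the neighbor count K, and the penalty form w_pen * max(0, f_min / f_hat - 1) built on a k-NN density estimate f_hat are all hypothetical choices.

```python
import numpy as np

# Hypothetical toy problem (an assumption, not a case study from the paper):
# scalar linear plant x+ = A*x + B*u with stage cost x^2 + R*u^2.
A, B, R, GAMMA = 0.9, 0.5, 0.1, 0.95
U_GRID = np.linspace(-2.0, 2.0, 21)  # hypothetical candidate inputs (brute-force search)
K = 4                                # hypothetical neighbor count for the local averager


def step(x, u):
    return A * x + B * u


def stage_cost(x, u):
    return x ** 2 + R * u ** 2


def simulate(policy, x0, horizon):
    """Roll out a starting policy; return visited states and their discounted cost-to-go."""
    xs, costs, x = [], [], x0
    for _ in range(horizon):
        u = policy(x)
        xs.append(x)
        costs.append(stage_cost(x, u))
        x = step(x, u)
    j, acc = np.zeros(horizon), 0.0
    for t in reversed(range(horizon)):  # accumulate discounted cost backward in time
        acc = costs[t] + GAMMA * acc
        j[t] = acc
    return np.array(xs), j


def knn_index_and_density(queries, data, k):
    """Indices of the k nearest samples and a 1-D k-NN density estimate at each query."""
    dist = np.abs(queries[:, None] - data[None, :])
    idx = np.argpartition(dist, k - 1, axis=1)[:, :k]
    r_k = np.take_along_axis(dist, idx, axis=1).max(axis=1)  # distance to k-th neighbor
    density = k / (len(data) * 2.0 * np.maximum(r_k, 1e-9))  # k / (n * interval length)
    return idx, density


def penalized_value_iteration(X, J, w_pen, f_min, tol=1e-6, max_iter=2000):
    """Iterate J(x) = min_u [cost + penalty(x+) + GAMMA * J_knn(x+)] on the sampled states."""
    x_next = step(X[:, None], U_GRID[None, :])  # (n_states, n_inputs)
    idx, dens = knn_index_and_density(x_next.ravel(), X, K)
    idx = idx.reshape(*x_next.shape, K)
    # Hypothetical penalty: zero where the density estimate >= f_min, growing as data thin out.
    penalty = w_pen * np.maximum(0.0, f_min / dens.reshape(x_next.shape) - 1.0)
    cost = stage_cost(X[:, None], U_GRID[None, :]) + penalty
    for it in range(max_iter):
        q = cost + GAMMA * J[idx].mean(axis=-1)  # k-NN averaged cost-to-go at x+
        J_new = q.min(axis=1)
        if np.max(np.abs(J_new - J)) < tol:
            return J_new, U_GRID[q.argmin(axis=1)], it + 1
        J = J_new
    return J, U_GRID[q.argmin(axis=1)], max_iter


def starting_policy(x):
    return -0.5 * x  # hypothetical (suboptimal) starting controller


rng = np.random.default_rng(0)
runs = [simulate(starting_policy, x0, 20) for x0 in rng.uniform(-3.0, 3.0, 10)]
X = np.concatenate([r[0] for r in runs])
J0 = np.concatenate([r[1] for r in runs])
J, u_star, iters = penalized_value_iteration(X, J0, w_pen=10.0, f_min=0.05)
```

The design choice mirrors the abstract's argument. The k-NN averager replaces each cost-to-go query with a fixed average of stored values (nonnegative weights summing to one), so it cannot enlarge max-norm differences; with GAMMA < 1, every sweep therefore shrinks the gap between successive iterates. A global approximator refit by regression at each sweep offers no such guarantee. The density penalty depends only on where the data lie, not on J, so it leaves this contraction property intact while steering the minimizer away from poorly sampled regions.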

Publisher

Database: Elsevier - ScienceDirect
Journal: Journal of Process Control - Volume 16, Issue 2, February 2006, Pages 135-156

Authors

Jong Min Lee, Niket S. Kaisare, Jay H. Lee