Reinforcement learning for parameter estimation in statistical spoken dialogue systems

Article code	Journal code	Publication year	English article	Full-text version
563165	875473	2012	25 pages PDF	Free download

Keywords

POMDP; Spoken dialogue systems; Dialogue management; Reinforcement learning

Related topics

Engineering and Basic Sciences > Computer Engineering > Signal Processing

Abstract

Reinforcement techniques have been successfully used to maximise the expected cumulative reward of statistical dialogue systems. Typically, reinforcement learning is used to estimate the parameters of a dialogue policy which selects the system's responses based on the inferred dialogue state. However, the inference of the dialogue state itself depends on a dialogue model which describes the expected behaviour of a user when interacting with the system. Ideally the parameters of this dialogue model should also be optimised to maximise the expected cumulative reward. This article presents two novel reinforcement algorithms for learning the parameters of a dialogue model. First, the Natural Belief Critic algorithm is designed to optimise the model parameters while the policy is kept fixed. This algorithm is suitable, for example, in systems using a handcrafted policy, perhaps prescribed by other design considerations. Second, the Natural Actor and Belief Critic algorithm jointly optimises both the model and the policy parameters. The algorithms are evaluated on a statistical dialogue system modelled as a Partially Observable Markov Decision Process in a tourist information domain. The evaluation is performed with a user simulator and with real users. The experiments indicate that model parameters estimated to maximise the expected reward function provide improved performance compared to the baseline handcrafted parameters.

► We present two algorithms for learning parameters in statistical dialogue systems.
► These algorithms maximise an expected reward function of a dialogue system.
► The optimisation of the dialogue model and the policy increases the expected reward.
► The algorithms were evaluated on both a user simulator and real users.

Publisher

Database: Elsevier - ScienceDirect
Journal: Computer Speech & Language - Volume 26, Issue 3, June 2012, Pages 168–192

Authors

Filip Jurčíček, Blaise Thomson, Steve Young
