Article code: 559006
Journal code: 875029
Publication year: 2015
English article: 19-page PDF
Full text: free download
English title of the ISI article
Reinforcement-learning based dialogue system for human–robot interactions with socially-inspired rewards
Persian translation of the title
سیستم گفتمان تقویتی-یادگیری مبتنی بر تعامل انسان با روبات با پاداش الهام گرفته از اجتماع
Keywords
Human–robot interaction; POMDP-based dialogue management; reinforcement learning; reward shaping
Related topics
Engineering and Basic Sciences; Computer Engineering; Signal Processing
English abstract


• We integrate user appraisals in a POMDP-based dialogue manager procedure.
• We employ additional socially-inspired rewards in an RL setup to guide the learning.
• A unified framework for speeding up the policy optimisation and user adaptation.
• We consider potential-based reward shaping with a sample-efficient RL algorithm.
• Evaluated using both a user simulator (information retrieval) and user trials (HRI).

This paper investigates some conditions under which polarized user appraisals gathered throughout the course of a vocal interaction between a machine and a human can be integrated in a reinforcement learning-based dialogue manager. More specifically, we discuss how this information can be cast into socially-inspired rewards for speeding up the policy optimisation for both efficient task completion and user adaptation in an online learning setting. For this purpose, a potential-based reward shaping method is combined with a sample-efficient reinforcement learning algorithm to offer a principled framework to cope with these potentially noisy interim rewards. The proposed scheme will greatly facilitate the system's development by allowing the designer to teach the system through explicit positive or negative feedback given as hints about task progress, in the early stage of training. At a later stage, the approach will be used as a way to ease the adaptation of the dialogue policy to specific user profiles. Experiments carried out using a state-of-the-art goal-oriented dialogue management framework, the Hidden Information State (HIS), support our claims in two configurations: firstly, with a user simulator in the tourist information domain (and thus simulated appraisals), and secondly, in the context of man–robot dialogue with real user trials.
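
The abstract names potential-based reward shaping without giving its form. As a rough illustration only, the standard formulation adds the term gamma * Phi(s') - Phi(s) to the environment reward, which leaves the optimal policy unchanged. The sketch below assumes that standard form. The state names, potential values, and rewards are hypothetical placeholders, and how the paper actually derives a potential from user appraisals is not stated in the abstract.

```python
from typing import Callable, Hashable, Sequence

State = Hashable


def shaped_reward(
    reward: float,
    state: State,
    next_state: State,
    potential: Callable[[State], float],
    gamma: float = 0.99,
    terminal: bool = False,
) -> float:
    """Return reward + gamma * Phi(next_state) - Phi(state).

    The potential of a terminal state is taken to be zero, a common
    convention for episodic tasks.
    """
    next_phi = 0.0 if terminal else potential(next_state)
    return reward + gamma * next_phi - potential(state)


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Sum of gamma**t * r_t over one episode."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


# Hypothetical potentials, standing in for scores a designer or user
# might attach to dialogue states through positive/negative appraisals.
PHI = {"start": 0.2, "slot_filled": 0.5, "confirmed": 1.0}
GAMMA = 0.9

# One illustrative episode: visited states and raw environment rewards
# (a per-turn penalty, then a task-completion bonus).
STATES = ["start", "slot_filled", "confirmed", "end"]
ENV_REWARDS = [-1.0, -1.0, 20.0]

SHAPED_REWARDS = [
    shaped_reward(
        r,
        STATES[t],
        STATES[t + 1],
        lambda s: PHI.get(s, 0.0),
        GAMMA,
        terminal=(t == len(ENV_REWARDS) - 1),
    )
    for t, r in enumerate(ENV_REWARDS)
]
```

In this sketch the shaping terms telescope, so the shaped return differs from the original return only by the constant Phi("start"). That is why this kind of shaping can redistribute feedback across turns without changing which policy is optimal.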

Publisher
Database: Elsevier - ScienceDirect
Journal: Computer Speech & Language - Volume 34, Issue 1, November 2015, Pages 256–274