Article ID: 559006
Journal: Computer Speech & Language
Published Year: 2015
Pages: 19
File Type: PDF
Abstract

•We integrate user appraisals into a POMDP-based dialogue management procedure.
•We employ additional socially-inspired rewards in an RL setup to guide the learning.
•We offer a unified framework for speeding up policy optimisation and user adaptation.
•We combine potential-based reward shaping with a sample-efficient RL algorithm.
•Evaluated with both a user simulator (information retrieval) and real user trials (HRI).

This paper investigates conditions under which polarized user appraisals, gathered over the course of a vocal interaction between a machine and a human, can be integrated into a reinforcement learning-based dialogue manager. More specifically, we discuss how this information can be cast into socially-inspired rewards that speed up policy optimisation for both efficient task completion and user adaptation in an online learning setting. For this purpose, a potential-based reward shaping method is combined with a sample-efficient reinforcement learning algorithm, offering a principled framework for coping with these potentially noisy interim rewards. The proposed scheme greatly facilitates system development by allowing the designer, in the early stages of training, to teach the system through explicit positive/negative feedback given as hints about task progress. At a later stage, the approach eases the adaptation of the dialogue policy to specific user profiles. Experiments carried out with a state-of-the-art goal-oriented dialogue management framework, the Hidden Information State (HIS), support our claims in two configurations: first, with a user simulator (and thus simulated appraisals) in the tourist information domain, and second, with real user trials in the context of human–robot dialogue.
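To make the shaping mechanism concrete, the following is a minimal Python sketch of potential-based reward shaping on top of tabular Q-learning. It is illustrative only: the paper's system uses the HIS POMDP framework with a sample-efficient learner, and the appraisal-driven potential, discount factor, and learning rate below are assumptions for the sketch, not the authors' implementation.

from collections import defaultdict

GAMMA = 0.95   # discount factor (assumed)
ALPHA = 0.1    # learning rate (assumed)

# Hypothetical potential Phi(s): the polarity of the last user
# appraisal observed in state s (+1 positive, -1 negative, 0 none),
# used as a hint about task progress.
appraisal = defaultdict(float)

def shaping(s, s_next):
    # Potential-based shaping term F(s, s') = GAMMA*Phi(s') - Phi(s)
    # (Ng et al., 1999). Added to the task reward, it leaves the
    # optimal policy unchanged, so noisy interim appraisals can only
    # affect the speed of learning, not its fixed point.
    return GAMMA * appraisal[s_next] - appraisal[s]

Q = defaultdict(float)

def update(s, a, r, s_next, actions):
    # One Q-learning backup on the shaped reward r + F(s, s').
    target = r + shaping(s, s_next) + GAMMA * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])

Under this scheme, the designer's explicit positive/negative feedback can populate the potential during early training, and per-user appraisals can later re-shape the same policy without altering the underlying task reward.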

Related Topics
Physical Sciences and Engineering › Computer Science › Signal Processing