Rapid bootstrapping of statistical spoken dialogue systems

کد مقاله	کد نشریه	سال انتشار	مقاله انگلیسی	نسخه تمام متن
567689	876134	2008	14 صفحه PDF	دانلود رایگان

عنوان انگلیسی مقاله ISI

دانلود مقاله + سفارش ترجمه

دانلود مقاله ISI انگلیسی

رایگان برای ایرانیان

کلمات کلیدی

Semantic annotation - حاشیه نویسی معنایی Spoken dialog systems - سیستم های گفتاری گفتاری

موضوعات مرتبط

مهندسی و علوم پایه مهندسی کامپیوتر پردازش سیگنال

پیش نمایش صفحه اول مقاله

Rapid bootstrapping of statistical spoken dialogue systems

چکیده انگلیسی

Rapid deployment of statistical spoken dialogue systems poses portability challenges for building new applications. We discuss the challenges that arise and focus on two main problems: (i) fast semantic annotation for statistical speech understanding and (ii) reliable and efficient statistical language modeling using limited in-domain resources. We address the first problem by presenting a new bootstrapping framework that uses a majority-voting based combination of three methods for the semantic annotation of a “mini-corpus” that is usually manually annotated. The three methods are a statistical decision tree based parser, a similarity measure and a support vector machine classifier. The bootstrapping framework results in an overall cost reduction of about a factor of two in the annotation effort compared to the baseline method. We address the second problem by devising a method to efficiently build reliable statistical language models for new spoken dialog systems, given limited in-domain data. This method exploits external text resources that are collected for other speech recognition tasks as well as dynamic text resources acquired from the World Wide Web. The proposed method is applied to a spoken dialog system in a financial transaction domain and a natural language call-routing task in a package shipment domain. The experiments demonstrate that language models built using external resources, when used jointly with the limited in-domain language model, result in relative word error rate reductions of 9–18%. Alternatively, the proposed method can be used to produce a 3-to-10 fold reduction for the in-domain data requirement to achieve a given performance level.

ناشر

Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Speech Communication - Volume 50, Issue 7, July 2008, Pages 580–593

نویسندگان

Ruhi Sarikaya,

علوم انسانی و هنر

فنی، مهندسی و علوم پایه

پزشکی و سلامت

بیو تکنولوژی

پذیرش سفارش ترجمه

Rapid bootstrapping of statistical spoken dialogue systems

دسترسی سریع

ارتباط

English Website