| Article code | Journal code | Publication year | English article | Full-text version |
|---|---|---|---|---|
| 565885 | 1452027 | 2015 | 14-page PDF | Free download |
• DNN-HMMs outperform GMM-HMMs by a large margin for all spoken assessment tasks.
• Open-ended tasks benefit far more than constrained tasks from the use of DNN-HMMs.
• For open-ended tasks, DNN-HMMs can take full advantage of increasing training data.
• The performance of constrained tasks saturates at around 25 h of training data.
• Constrained tasks require only a few hours of data to build well-performing models.
In this paper, we investigate the effectiveness of deep neural network hidden Markov models (DNN-HMMs) for acoustic modeling in the context of educational applications. Specifically, we focus on spoken responses from non-native speakers and children, which tend to show great acoustic variability. We perform comprehensive experiments comparing the performance of traditional Gaussian mixture model (GMM)-HMMs and DNN-HMMs on three large language assessment datasets that contain various spoken tasks, classified broadly as constrained and open-ended. Our experimental results suggest useful conclusions that can help guide the design of real-life educational applications. DNN-HMMs outperform conventional GMM-HMMs by a large margin for all spoken tasks commonly used in spoken assessment applications. In our experiments, DNN-HMMs trained on 25 h of data outperform GMM-HMMs trained on 6.7–9 times as much data. Regarding overall performance, when all available training data were used (175, 227, and 169 h, respectively), switching from GMMs to DNNs yielded relative word error rate decreases of 20.4% for adult English and 29.3% for child English, and a relative character error rate decrease of 14.3% for adult Chinese. Comparing task types, we notice that the more challenging open-ended tasks benefit significantly more than constrained item types from the use of DNN-HMMs. For open-ended tasks, having large amounts of training data is key, as DNN-HMMs can take full advantage of the added data to further push performance. In contrast, performance on constrained spoken tasks saturates at around 25 h of training data. At the same time, constrained spoken tasks require only a few hours of data (1 or 5 h) to build well-performing acoustic models.
This is an encouraging observation that indicates the potential to build reliable spoken assessment applications based on constrained tasks even when little domain-specific training data is available.
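The relative error-rate decreases quoted above follow the standard formula (baseline − improved) / baseline. A minimal sketch of this computation is shown below; the absolute WER values used here are purely hypothetical, chosen only to reproduce one of the reported relative figures, since the abstract states only the relative decreases.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Relative error-rate reduction (%) of `improved` over `baseline`."""
    return 100.0 * (baseline - improved) / baseline

# Hypothetical GMM-HMM and DNN-HMM word error rates (not from the paper):
gmm_wer, dnn_wer = 30.0, 23.88
print(round(relative_reduction(gmm_wer, dnn_wer), 1))  # 20.4
```

A 20.4% relative decrease thus does not mean the WER dropped by 20.4 absolute points, only that the DNN-HMM error is about one fifth lower than the GMM-HMM baseline.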
Journal: Speech Communication - Volume 73, October 2015, Pages 14–27