Article code: 558206
Journal code: 1451691
Publication year: 2016
English article: 23-page PDF
Full-text version: free download
English title of the ISI article
Integrating articulatory data in deep neural network-based acoustic modeling
Keywords
DNN-HMM; acoustic-to-articulatory mapping; deep neural networks; acoustic modeling; electromagnetic articulography; autoencoder
Related topics
Engineering and Basic Sciences, Computer Engineering, Signal Processing
English abstract


• We test strategies to exploit articulatory data in DNN-HMM phone recognition.
• Autoencoder-transformed articulatory features produce the best results.
• Pre-training of phone classifier DNNs driven by acoustic-to-articulatory mapping.
• Utility of articulatory information in noisy conditions and in cross-speaker settings.

Hybrid deep neural network–hidden Markov model (DNN-HMM) systems have become the state of the art in automatic speech recognition. In this paper we experiment with DNN-HMM phone recognition systems that use measured articulatory information. Deep neural networks are used both to compute phone posterior probabilities and to perform acoustic-to-articulatory mapping (AAM). The AAM processes we propose are based on deep representations of the acoustic and the articulatory domains. Such representations allow us to: (i) create different pre-training configurations of the DNNs that perform AAM; (ii) perform AAM on a transformed articulatory feature (AF) space, obtained through DNN autoencoders, that captures strong statistical dependencies between articulators. Traditionally, neural networks that approximate the AAM are used to generate AFs that are appended to the observation vector of the speech recognition system. Here we also study a novel approach (AAM-based pre-training) in which a DNN performing the AAM is instead used to pre-train the DNN that computes the phone posteriors. Evaluations on both the MOCHA-TIMIT msak0 and the mngu0 datasets show that: (i) the recovered AFs reduce phone error rate (PER) in both clean and noisy speech conditions, with a maximum relative PER reduction of 10.1% in clean speech, obtained when autoencoder-transformed AFs are used; (ii) AAM-based pre-training could be a viable strategy for exploiting the small articulatory datasets that are available to improve acoustic models trained on large acoustic-only datasets.
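
The traditional setup described in the abstract, where recovered AFs are appended to the observation vector, and the notion of relative PER reduction can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: `toy_aam` is a hypothetical stand-in for a trained AAM DNN, the function names are not from the paper, and the PER values in the usage note are invented, not results reported by the authors.

```python
from typing import Callable, List, Sequence

Frame = List[float]


def append_recovered_afs(
    acoustic_frames: Sequence[Sequence[float]],
    aam: Callable[[Sequence[float]], Sequence[float]],
) -> List[Frame]:
    """Append the AFs recovered by `aam` to each acoustic frame.

    Each output frame is the acoustic features followed by the
    articulatory features predicted from that same frame.
    """
    return [list(frame) + list(aam(frame)) for frame in acoustic_frames]


def relative_per_reduction(baseline_per: float, new_per: float) -> float:
    """Relative phone error rate reduction, as a fraction of the baseline."""
    if baseline_per <= 0:
        raise ValueError("baseline_per must be positive")
    return (baseline_per - new_per) / baseline_per


def toy_aam(frame: Sequence[float]) -> Frame:
    """Hypothetical placeholder for an AAM network (assumption).

    A real system would use a trained DNN; this returns the mean and
    the range of the frame only so that the example is runnable.
    """
    return [sum(frame) / len(frame), max(frame) - min(frame)]
```

For example, `append_recovered_afs([[1.0, 2.0, 3.0]], toy_aam)` yields one five-dimensional frame (three acoustic values plus two placeholder AFs), and `relative_per_reduction(20.0, 18.0)` gives a 10% relative reduction for those made-up error rates.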

Publisher
Database: Elsevier - ScienceDirect
Journal: Computer Speech & Language - Volume 36, March 2016, Pages 173–195