Article code | Journal code | Publication year | English article | Full-text version
4977794 | 1452007 | 2017 | 31 pages, PDF | Free download
English title of the ISI article
Deep Elman recurrent neural networks for statistical parametric speech synthesis
Persian translation of the title
شبکه های عصبی مجازی عمیق المن برای سنتز گفتاری پارامتری
Keywords
Speech synthesis, recurrent neural network, deep neural networks, hidden state
Related topics
Engineering and Basic Sciences, Computer Engineering, Signal Processing
English abstract
Owing to the success of deep learning techniques in automatic speech recognition, deep neural networks (DNNs) have been used as acoustic models for statistical parametric speech synthesis (SPSS). DNNs do not inherently model the temporal structure in speech and text, and hence are not well suited to be directly applied to the problem of SPSS. Recurrent neural networks (RNNs), on the other hand, have the capability to model time series. RNNs with long short-term memory (LSTM) cells have been shown to outperform DNN-based SPSS. However, LSTM cells and their variants, such as gated recurrent units (GRUs) and simplified LSTMs (SLSTMs), have a complicated structure and are computationally expensive compared to a simple recurrent architecture such as the Elman RNN. In this paper, we explore deep Elman RNNs for SPSS and compare their effectiveness against deep gated RNNs. Specifically, we perform experiments to show that (1) deep Elman RNNs are better suited for acoustic modeling in SPSS than DNNs and perform competitively with deep SLSTMs, GRUs and LSTMs, (2) context representation learning using Elman RNNs improves neural network acoustic models for SPSS, and (3) an Elman RNN-based duration model is better than its DNN-based counterpart. Experiments were performed on the Blizzard Challenge 2015 dataset, which consists of 3 Indian languages (Telugu, Hindi and Tamil). Through subjective and objective evaluations, we show that our proposed systems outperform the baseline systems across different speakers and languages.
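To illustrate the contrast the abstract draws between a simple recurrent architecture and gated cells, here is a minimal sketch of a single Elman layer in plain Python. It is not the authors' implementation: the layer sizes, the random initialisation, the tanh nonlinearity, and the helper names `elman_forward`, `recurrent_params` and `random_matrix` are all assumptions made for illustration. The parameter counts use the common textbook formulations (one bias vector per gate, no peephole connections), so a GRU and an LSTM layer carry roughly three and four times the weights of an Elman layer of the same size.

```python
import math
import random


def elman_forward(inputs, W_xh, W_hh, b_h, W_hy, b_y):
    """Run a single-layer Elman RNN over a sequence of input vectors.

    h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)
    y_t = W_hy h_t + b_y   (linear output layer, an assumption for regression)
    """
    hidden_size = len(b_h)
    h = [0.0] * hidden_size  # initial hidden state
    outputs = []
    for x in inputs:
        # The comprehension reads the previous h before the name is rebound.
        h = [
            math.tanh(
                sum(w * xi for w, xi in zip(W_xh[j], x))
                + sum(w * hi for w, hi in zip(W_hh[j], h))
                + b_h[j]
            )
            for j in range(hidden_size)
        ]
        y = [sum(w * hj for w, hj in zip(row, h)) + b for row, b in zip(W_hy, b_y)]
        outputs.append(y)
    return outputs, h


def recurrent_params(input_size, hidden_size, gates=1):
    """Weights and biases in one recurrent layer.

    gates=1: Elman, gates=3: GRU, gates=4: LSTM (textbook formulations).
    """
    return gates * hidden_size * (input_size + hidden_size + 1)


def random_matrix(rows, cols, rng, scale=0.1):
    """Small uniform random weights; the scale is an arbitrary toy choice."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


# Hypothetical toy sizes, not taken from the paper.
IN, HID, OUT, T = 5, 8, 3, 4
rng = random.Random(0)

W_xh = random_matrix(HID, IN, rng)
W_hh = random_matrix(HID, HID, rng)
b_h = [0.0] * HID
W_hy = random_matrix(OUT, HID, rng)
b_y = [0.0] * OUT

sequence = [[rng.uniform(-1.0, 1.0) for _ in range(IN)] for _ in range(T)]
outputs, h_final = elman_forward(sequence, W_xh, W_hh, b_h, W_hy, b_y)

elman_count = recurrent_params(IN, HID, gates=1)
gru_count = recurrent_params(IN, HID, gates=3)
lstm_count = recurrent_params(IN, HID, gates=4)
```

With these toy sizes the recurrent layer holds 112 parameters for the Elman cell, 336 for a GRU and 448 for an LSTM, which is the kind of cost difference the abstract refers to. A deep Elman RNN would stack several such layers, feeding each layer's hidden sequence to the next.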
Publisher
Database: Elsevier - ScienceDirect
Journal: Speech Communication - Volume 93, October 2017, Pages 31-42
Authors
, ,