Article code | Journal code | Publication year | English article | Full-text version
6960742 | 1452004 | 2018 | 11 pages, PDF | Free download
English title of the ISI article
Investigating very deep highway networks for parametric speech synthesis
Persian translation of the title
بررسی شبکه های بزرگراه بسیار عمیق برای سنتز گفتار پارامتری
Keywords
Text-to-speech, parametric speech synthesis, deep neural network, highway neural network
Related topics
Engineering and basic sciences, computer engineering, signal processing
English abstract
Deep neural networks are powerful tools for classification and regression tasks. While a network with more than 100 hidden layers has been reported for image classification, how such a non-recurrent neural network with more than 10 hidden layers will perform for speech synthesis is as yet unknown. This work investigates the performance of deep networks on statistical parametric speech synthesis, particularly the question of whether different acoustic features can be better generated by a deeper network. To answer this question, this work examines a multi-stream highway network that separately generates spectral and F0 acoustic features based on the highway architecture. Experiments on the Blizzard Challenge 2011 corpus show that the accuracy of the generated spectral features consistently improves as the depth of the network increases from 2 to 40, but the F0 trajectory can be generated equally well by either a deep or a shallow network. Additional experiments on a single-stream highway and normal feedforward network, both of which generate spectral and F0 features from a single network, show that these networks must be deep enough to generate both kinds of acoustic features well. The difference in the performance of multi- and single-stream highway networks is further analyzed on the basis of the networks' activation and sensitivity to input features. In general, the highway network with more than 10 hidden layers, either multi- or single-stream, performs better on the experimental corpus than does a shallow network.
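The abstract names the highway architecture and a multi-stream arrangement without giving formulas. As a rough illustration only, the sketch below uses the commonly cited highway-layer formulation, y = T(x) * H(x) + (1 - T(x)) * x, with a tanh transform H and a sigmoid gate T. The layer width, the weight initialization, the negative gate bias, and the function names (`highway_layer`, `init_highway_stack`, `multi_stream_forward`) are assumptions made for this example, not details taken from the paper. The depths 40 and 2 are picked from the range the abstract mentions, purely for illustration. The sketch also omits the input and output projections that a real acoustic model would need.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def affine(x, W, b):
    # W is a list of rows, one row per output unit.
    return [sum(w * v for w, v in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]


def highway_layer(x, W_h, b_h, W_t, b_t):
    # y = T(x) * H(x) + (1 - T(x)) * x (assumed standard formulation).
    h = [math.tanh(v) for v in affine(x, W_h, b_h)]   # transform H(x)
    t = [sigmoid(v) for v in affine(x, W_t, b_t)]     # gate T(x)
    return [t_i * h_i + (1.0 - t_i) * x_i
            for t_i, h_i, x_i in zip(t, h, x)]


def init_highway_stack(depth, width, rng, gate_bias=-2.0):
    # Hypothetical initialization: small random weights, negative gate bias
    # so that each layer starts out mostly carrying its input forward.
    def rand_matrix():
        return [[rng.gauss(0.0, 0.1) for _ in range(width)]
                for _ in range(width)]
    return [(rand_matrix(), [0.0] * width, rand_matrix(), [gate_bias] * width)
            for _ in range(depth)]


def highway_stack(x, layers):
    for W_h, b_h, W_t, b_t in layers:
        x = highway_layer(x, W_h, b_h, W_t, b_t)
    return x


def multi_stream_forward(x, streams):
    # One independent highway stack per acoustic stream, same input for all.
    return {name: highway_stack(x, layers) for name, layers in streams.items()}


rng = random.Random(0)
x = [0.5, -0.3, 0.8, 0.1]  # toy input vector, not real linguistic features
streams = {
    "spectral": init_highway_stack(depth=40, width=4, rng=rng),
    "f0": init_highway_stack(depth=2, width=4, rng=rng),
}
out = multi_stream_forward(x, streams)
print({name: len(values) for name, values in out.items()})
```

When the gate T(x) is close to 0, a layer passes its input through nearly unchanged, which is one reason stacks of this kind can be made deep without the signal degrading layer by layer.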
Publisher
Database: Elsevier - ScienceDirect
Journal: Speech Communication - Volume 96, February 2018, Pages 1-9