Article code: 496412
Journal code: 862859
Publication year: 2012
English article: 11 pages, PDF
English title of the ISI article
Comparing ANN and GMM in a voice conversion framework
Related topics
Engineering and Basic Sciences, Computer Engineering, Computer Science Applications
English abstract

In this paper, we present a comparative analysis of artificial neural networks (ANNs) and Gaussian mixture models (GMMs) for the design of a voice conversion system using line spectral frequencies (LSFs) as feature vectors. Both ANN- and GMM-based models are explored to capture the nonlinear mapping functions that modify the vocal tract characteristics of a source speaker to match those of a desired target speaker. The LSFs represent the vocal tract transfer function of a particular speaker. The intonation patterns (pitch contour) are mapped using a codebook-based model at the segmental level. The energy profile of the signal is modified using a fixed scaling factor defined between the source and target speakers at the segmental level. Two residual modification methods, residual copying and residual selection, are used to generate the target residual signal. The performance of the ANN- and GMM-based voice conversion (VC) systems is evaluated using subjective and objective measures. The results indicate that the proposed ANN-based model using the LSF feature set may be used as an alternative to the state-of-the-art GMM-based models used to design voice conversion systems.
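
The abstract says only that the energy profile is modified with a fixed scaling factor defined between the source and target speakers. As a rough illustration, the sketch below assumes that factor is the ratio of average target segment energy to average source segment energy. That definition, and the function names `segment_energy`, `fixed_energy_scale`, and `apply_energy_scale`, are assumptions made here and are not taken from the paper.

```python
import numpy as np


def segment_energy(x):
    """Mean squared amplitude of one speech segment."""
    x = np.asarray(x, dtype=float)
    return float(np.mean(x ** 2))


def fixed_energy_scale(source_segments, target_segments):
    """Assumed definition: average target energy divided by average source energy."""
    src = np.mean([segment_energy(s) for s in source_segments])
    tgt = np.mean([segment_energy(t) for t in target_segments])
    return tgt / src


def apply_energy_scale(segment, scale):
    """Scale a segment so that its energy is multiplied by `scale`."""
    # Energy grows with the square of amplitude, so samples are
    # multiplied by the square root of the energy ratio.
    return np.sqrt(scale) * np.asarray(segment, dtype=float)
```

In this sketch the factor is computed once from training segments of both speakers and then applied unchanged to every source segment at conversion time, which is one plausible reading of "fixed".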

Figure: The desired spectral envelope along with the spectral envelopes predicted by the baseline Gaussian mixture model (GMM) and the proposed 5-layer feed-forward neural network, both of which map the vocal tract characteristics of a source speaker to those of a desired target speaker for voice conversion.

Highlights
► A database of four Indian speakers has been developed to design the voice conversion system.
► A 5-layer ANN-based model has been proposed for vocal tract modification in a voice conversion framework.
► Spectral mapping using the ANN performs as well as the conventional GMM-based model.
► An epoch-based codebook model for pitch contour modification can capture the patterns of the desired target pitch contour.
► The LP residual selection and LP residual copying methods are compared in terms of speaker identity and conversion quality.
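
To make the idea of a 5-layer feed-forward mapping between LSF vectors concrete, here is a minimal NumPy sketch of the forward pass only. The LSF order of 10, the hidden layer widths, the tanh activations, and the convention of counting the input and output layers among the five are all assumptions for illustration; the listing above does not give the paper's actual architecture, and the weights here are random rather than trained.

```python
import numpy as np


def init_mlp(layer_sizes, seed=0):
    """Create random (untrained) weights and zero biases for each layer pair."""
    rng = np.random.default_rng(seed)
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        W = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_out))
        b = np.zeros(n_out)
        params.append((W, b))
    return params


def mlp_forward(params, x):
    """Map source LSF vectors (one per row) to predicted target LSF vectors."""
    h = np.asarray(x, dtype=float)
    for i, (W, b) in enumerate(params):
        h = h @ W + b
        if i < len(params) - 1:
            h = np.tanh(h)  # nonlinear hidden layers, linear output layer
    return h


LSF_ORDER = 10  # assumed feature dimension
sizes = [LSF_ORDER, 20, 40, 20, LSF_ORDER]  # five layers, widths assumed
params = init_mlp(sizes)

# Three placeholder source frames: sorted values in (0, pi), like LSFs.
frames = np.random.default_rng(1).uniform(0.0, np.pi, size=(3, LSF_ORDER))
source_lsf = np.sort(frames, axis=1)
predicted = mlp_forward(params, source_lsf)
```

A usable system would train these weights on time-aligned source and target LSF frames, for example by minimizing the squared error between `predicted` and the target frames. That training step is omitted here.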

Publisher
Database: Elsevier - ScienceDirect
Journal: Applied Soft Computing - Volume 12, Issue 11, November 2012, Pages 3332–3342