Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
6961154 | Speech Communication | 2015 | 16 Pages |
Abstract
Typically, voice conversion systems just use one type of spectral feature to convert acoustical characteristics of one speaker to another speaker. In this paper, we first study four different spectral features. Then, we compare these features and choose two features that perform better than others. Our experiments showed that cepstral features are more suitable than all-pole features for clustering and all-pole features are better for the analysis/synthesis stages. Hence, we propose a new voice conversion algorithm that uses both cepstral and all-pole features in order to utilize their desired properties simultaneously. We have two ideas to utilize this feature combination strategy. Our first idea is to apply feature combination to classical Gaussian mixture models (GMM)-based voice conversion method. The second idea is to apply feature combination to dynamic kernel partial least square regression (DKPLS) method. Results of our evaluations show that our proposed methods outperform the modern voice conversion methods in terms of speech quality and speaker individuality. Our methods are also robust to limited training data.
Keywords
Related Topics
Physical Sciences and Engineering
Computer Science
Signal Processing
Authors
Mostafa Ghorbandoost, Abolghasem Sayadiyan, Mohsen Ahangar, Hamid Sheikhzadeh, Abdoreza Sabzi Shahrebabaki, Jamal Amini,