Voice conversion based on feature combination with limited training data

Article ID	Journal	Published Year	Pages	File Type
6961154	Speech Communication	2015	16 Pages	PDF

Abstract

Typically, voice conversion systems just use one type of spectral feature to convert acoustical characteristics of one speaker to another speaker. In this paper, we first study four different spectral features. Then, we compare these features and choose two features that perform better than others. Our experiments showed that cepstral features are more suitable than all-pole features for clustering and all-pole features are better for the analysis/synthesis stages. Hence, we propose a new voice conversion algorithm that uses both cepstral and all-pole features in order to utilize their desired properties simultaneously. We have two ideas to utilize this feature combination strategy. Our first idea is to apply feature combination to classical Gaussian mixture models (GMM)-based voice conversion method. The second idea is to apply feature combination to dynamic kernel partial least square regression (DKPLS) method. Results of our evaluations show that our proposed methods outperform the modern voice conversion methods in terms of speech quality and speaker individuality. Our methods are also robust to limited training data.

Keywords

Voice conversion Feature combination