Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
4969440 | Journal of Visual Communication and Image Representation | 2016 | 9 Pages |
Abstract
Minimizing morphological variances of the vocal tract across speakers is a challenge for articulatory analysis and modeling. In order to reduce morphological differences in speech organs among speakers and retain speakers' speech dynamics, our study proposes a method of normalizing the vocal-tract shapes of Mandarin and Japanese speakers by using a Thin-Plate Spline (TPS) method. We apply the properties of TPS in a two-dimensional space in order to normalize vocal-tract shapes. Furthermore, we also use DNN (Deep Neural Networks) based speech recognition for our evaluations. We obtained our template for normalization by measuring three speakers' palates and tongue shapes. Our results show a reduction in variances among subjects. The similar vowel structure of pre/post-normalization data indicates that our framework retains speaker specific characteristics. Our results for the articulatory recognition of isolated phonemes show an improvement of 25%. Moreover, our phone error rate of continuous speech reduced by 5.84%.
Related Topics
Physical Sciences and Engineering
Computer Science
Computer Vision and Pattern Recognition
Authors
Jianguo Wei, Jingshu Zhang, Yan Ji, Qiang Fang, Wenhuan Lu,