Article code | Journal code | Publication year | English article | Full-text version |
---|---|---|---|---|
567691 | 876134 | 2008 | 11-page PDF | Free download |
This paper discusses and evaluates a novel, simple, and effective acoustic modeling method called “state-dependent phoneme-based model merging (SDPBMM)”, used to build a dialectal Chinese speech recognizer from a small amount of dialectal Chinese speech. In SDPBMM, state-level pronunciation modeling is done by merging a tied state of standard triphones with a state of one or more dialectal monophones. The state-level pronunciation model, which acts as the merging criterion for SDPBMM, suffers from data sparseness because the data set is limited. To overcome this problem, a distance-based pronunciation modeling approach is also proposed. With 40 min of Shanghai-dialectal Chinese speech data, SDPBMM achieves a significant absolute syllable error rate (SER) reduction of approximately 7.1% (a relative SER reduction of 14.3%) for Shanghai-dialectal Chinese, without performance degradation for standard Chinese. Experiments show that SDPBMM outperforms Maximum Likelihood Linear Regression (MLLR) adaptation and the Pooled Retraining method by 1.4% and 5.3%, respectively, in terms of SER reduction. When combined with MLLR adaptation, SDPBMM achieves a further absolute SER reduction of 1.4%.
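To make the merging idea concrete, the sketch below shows one plausible way a tied state of a standard triphone could be combined with states of dialectal monophones: as a single Gaussian mixture whose component weights are rescaled by state-level pronunciation probabilities. This is an illustration under stated assumptions, not the paper's implementation. The one-dimensional Gaussians, the names `Gaussian1D`, `merge_states`, `pron_probs`, and `interp`, and the use of a fixed interpolation weight are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Gaussian1D:
    """One mixture component (1-D for brevity; real systems use feature vectors)."""
    weight: float
    mean: float
    var: float

    def pdf(self, x: float) -> float:
        norm = math.sqrt(2.0 * math.pi * self.var)
        return math.exp(-0.5 * (x - self.mean) ** 2 / self.var) / norm


def merge_states(standard, dialectal, pron_probs, interp):
    """Merge a standard tied state with dialectal monophone states.

    standard   : list of Gaussian1D, weights summing to 1
    dialectal  : list of states, each a list of Gaussian1D with weights summing to 1
    pron_probs : assumed pronunciation probability of each dialectal state (sum to 1)
    interp     : assumed share of probability mass kept by the standard state
    """
    # Standard components keep their shape but give up part of their mass.
    merged = [Gaussian1D(interp * g.weight, g.mean, g.var) for g in standard]
    # Each dialectal state receives the remaining mass, split by pron_probs.
    for state, p in zip(dialectal, pron_probs):
        merged.extend(
            Gaussian1D((1.0 - interp) * p * g.weight, g.mean, g.var) for g in state
        )
    return merged


def state_likelihood(mixture, x):
    """Output likelihood of observation x under a mixture of Gaussians."""
    return sum(g.weight * g.pdf(x) for g in mixture)
```

For example, calling `merge_states(std, [dia], [1.0], 0.8)` on a two-component standard state and a one-component dialectal state yields a three-component mixture whose weights still sum to one, so the merged state remains a valid output distribution. Observations near the dialectal mean then score higher than they would under the standard state alone.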
Journal: Speech Communication - Volume 50, Issue 7, July 2008, Pages 605–615