Article code: 558653
Journal code: 874959
Publication year: 2007
English article: 17-page PDF
Full-text version: free download
English title of the ISI article
A segment-based interpretation of HMM/ANN hybrids
Related topics
Engineering and Basic Sciences, Computer Engineering, Signal Processing
Abstract

Here we seek to understand the similarities and differences between two speech recognition approaches, namely the HMM/ANN hybrid and the posterior-based segmental model. Both these techniques create local posterior probability estimates and combine these estimates into global posteriors – but they are built on somewhat different concepts and mathematical derivations. The HMM/ANN hybrid combines the local estimates via a formulation that is inherited from the generative HMM concept, while the components of the segment-based model correspond quite directly to the two subtasks of phonetic decoding: segmentation and classification. In this paper we attempt to identify the corresponding components of the segmental model within the hybrid model, with the intent of gaining an insight from this unusual point of view. As regards one of these components, the segment-based phone posteriors, we show that the independence-based product rule combination applied in the hybrid produces strongly biased estimates. As for the other component, the segmentation probability factor, we argue that it is present in the hybrid thanks to the bias of the product rule – that is, the product rule goes wrong in such a special way that it helps the model find the best segmentation of the input. To prove this assertion, we combine this bias with the posterior estimates obtained by averaging, and find that the resulting ‘averaging hybrid’ slightly outperforms the standard one on a phone recognition task and a word recognition task as well. Overall we conclude that the contribution of the product rule to the decoding process is just as important for the segmentation subtask as it is for the segment classification subtask.
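
The contrast the abstract draws between the independence-based product rule and averaging can be illustrated with a small numerical sketch. This is not the paper's formulation: the frame posteriors below are made up, the function names `product_rule` and `averaging_rule` are hypothetical, and renormalizing the product so that it sums to one is an assumption made here only to make the two results comparable.

```python
import math


def product_rule(frame_posteriors):
    """Combine per-frame class posteriors by multiplying them, as if the
    frames were independent, then renormalize (an assumption of this sketch)."""
    n_classes = len(frame_posteriors[0])
    # Sum logs rather than multiplying probabilities to avoid underflow.
    log_scores = [
        sum(math.log(frame[c]) for frame in frame_posteriors)
        for c in range(n_classes)
    ]
    max_log = max(log_scores)
    unnormalized = [math.exp(s - max_log) for s in log_scores]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def averaging_rule(frame_posteriors):
    """Combine per-frame class posteriors by taking their arithmetic mean."""
    n_classes = len(frame_posteriors[0])
    n_frames = len(frame_posteriors)
    return [
        sum(frame[c] for frame in frame_posteriors) / n_frames
        for c in range(n_classes)
    ]


# Hypothetical segment: 10 frames, each mildly favoring class 0 of 3.
segment = [[0.6, 0.3, 0.1]] * 10

prod = product_rule(segment)
avg = averaging_rule(segment)
print("product rule:", [round(p, 4) for p in prod])
print("averaging:   ", [round(p, 4) for p in avg])
```

With ten identical frames that each assign 0.6 to the first class, averaging stays at about 0.6, while the renormalized product exceeds 0.99. The sketch only shows how treating correlated frames as independent can overstate confidence; it does not reproduce the paper's experiments or its segmentation probability factor.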

Publisher
Database: Elsevier - ScienceDirect
Journal: Computer Speech & Language - Volume 21, Issue 3, July 2007, Pages 562–578