Article code | Journal code | Publication year | Article | Full-text version |
---|---|---|---|---|
536009 | 870429 | 2011 | English, 7-page PDF | Free download |
Speech is the most natural form of communication for human beings. However, in situations where audio speech is not available because of disability or adverse environmental conditions, people may resort to alternative methods of augmented speech, that is, audio speech supplemented or replaced by other modalities, such as audiovisual speech or Cued Speech. This article introduces augmented speech communication based on Electro-Magnetic Articulography (EMA). Movements of the tongue, lips, and jaw are tracked by EMA and are used as features to train hidden Markov models (HMMs). In addition, automatic phoneme recognition experiments are conducted to examine the possibility of recognizing speech from articulation alone, that is, without any audio information. The results obtained are promising and confirm that phonetic features characterizing articulation are as discriminating as those characterizing acoustics (except for voicing). This article also describes experiments conducted in noisy environments using fused audio and EMA parameters. When EMA parameters are fused with noisy audio speech, the recognition rate increases significantly compared with using noisy audio speech alone.
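The abstract's recognition setup (one HMM per phoneme, scored on articulatory feature vectors) can be sketched as follows. This is a minimal numpy-only illustration, not the paper's implementation: the 3-state left-to-right topology, the 2-D features, the state means, and the toy observation sequence are all hypothetical. Each phoneme model is scored with the forward algorithm in the log domain, and the phoneme with the highest log-likelihood is chosen.

```python
import numpy as np

LOG0 = -np.inf


def log_gauss(x, means, variances):
    """Log density of x under each state's diagonal Gaussian (one row per state)."""
    return -0.5 * np.sum(np.log(2 * np.pi * variances)
                         + (x - means) ** 2 / variances, axis=-1)


def forward_loglik(obs, log_start, log_trans, means, variances):
    """Forward algorithm in the log domain; returns log P(obs | model)."""
    alpha = log_start + log_gauss(obs[0], means, variances)
    for x in obs[1:]:
        alpha = log_gauss(x, means, variances) + \
                np.logaddexp.reduce(alpha[:, None] + log_trans, axis=0)
    return np.logaddexp.reduce(alpha)


# Two toy phoneme models (hypothetical state means; unit variances),
# sharing a 3-state left-to-right topology that starts in state 0.
log_start = np.array([0.0, LOG0, LOG0])
log_trans = np.log(np.array([[0.5, 0.5, 0.0],
                             [0.0, 0.5, 0.5],
                             [0.0, 0.0, 1.0]]) + 1e-300)
models = {
    "a": np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]),
    "i": np.array([[5.0, 5.0], [6.0, 6.0], [7.0, 7.0]]),
}
variances = np.ones((3, 2))

# An EMA-like observation sequence lying near phoneme "a"'s state means.
obs = np.array([[0.1, -0.1], [0.9, 1.1], [1.8, 2.1], [2.1, 1.9]])
scores = {p: forward_loglik(obs, log_start, log_trans, m, variances)
          for p, m in models.items()}
recognized = max(scores, key=scores.get)
```

In a real system the Gaussian parameters would be estimated from EMA recordings (e.g. via Baum-Welch training), and mixture emissions would replace the single Gaussian per state.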
Research highlights
► An Electro-Magnetic Articulography (EMA) device can capture tongue movements accurately.
► Using hidden Markov models, articulatory movements of the lips, jaw, and tongue can be recognized.
► Articulatory features are as discriminating as acoustic ones (except for voicing).
► Tongue movements can be recognized more accurately than lip and jaw movements.
► Robustness against noise increases when fusing articulatory features with audio features.
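The fusion of audio and EMA parameters mentioned above can be illustrated with a simple frame-synchronous concatenation. This is a hedged sketch, not the paper's method: the feature dimensions (13 audio coefficients, 12 EMA coordinates) and the per-stream z-normalization are assumptions, and the EMA stream is linearly interpolated to the audio frame rate before the two streams are stacked.

```python
import numpy as np


def znorm(feats):
    """Per-dimension zero-mean, unit-variance normalization of one stream."""
    return (feats - feats.mean(axis=0)) / (feats.std(axis=0) + 1e-8)


def fuse(audio_feats, ema_feats):
    """Frame-synchronous concatenation of audio and EMA features.

    The EMA stream is linearly interpolated so that it has one frame
    per audio frame before the two normalized streams are stacked.
    """
    n = len(audio_feats)
    idx = np.linspace(0, len(ema_feats) - 1, n)
    lo = np.floor(idx).astype(int)
    hi = np.minimum(lo + 1, len(ema_feats) - 1)
    w = (idx - lo)[:, None]
    ema_resampled = (1 - w) * ema_feats[lo] + w * ema_feats[hi]
    return np.hstack([znorm(audio_feats), znorm(ema_resampled)])


# Hypothetical streams: 100 audio frames of 13 coefficients,
# 50 EMA frames of 12 coordinates (x/y for 6 sensor coils).
rng = np.random.default_rng(0)
audio = rng.normal(size=(100, 13))
ema = rng.normal(size=(50, 12))
fused = fuse(audio, ema)  # shape (100, 25)
```

The fused vectors would then replace the audio-only features when training and decoding the HMMs, which is the configuration the abstract reports as significantly more robust to noise.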
Journal: Pattern Recognition Letters - Volume 32, Issue 8, 1 June 2011, Pages 1119–1125