Article ID: 10370133
Journal: Speech Communication
Published Year: 2005
Pages: 23
File Type: PDF
Abstract
Three traditional ASR parameterizations paired with Hidden Markov Models (HMMs) are compared to humans for speaker-dependent consonant recognition using nonsense syllables degraded by highpass filtering, lowpass filtering, or additive noise. Confusion matrices were determined by recognizing the syllables with different ASR front ends, including Mel-Filter Bank (MFB) energies, Mel-Frequency Cepstral Coefficients (MFCCs), and the Ensemble Interval Histogram (EIH). In general, recognition accuracy with the MFB front end was slightly higher than with the MFCC, which in turn was higher than with the EIH. For syllables degraded by lowpass and highpass filtering, automated systems trained on the degraded condition recognized the consonants as well as humans did. For syllables degraded by additive speech-shaped noise, none of the automated systems recognized consonants as well as humans. The greatest advantage displayed by humans was in determining the correct voiced/unvoiced classification of consonants in noise.
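The analysis described in the abstract rests on consonant confusion matrices and on how errors distribute across the voiced/unvoiced distinction. The sketch below is a minimal illustration of how such a matrix could be tallied from paired reference and recognized labels and then collapsed to a 2x2 voicing confusion matrix; the consonant subset and example labels are hypothetical, and this is not the authors' code.

```python
# Illustrative sketch only: tally a consonant confusion matrix and collapse it
# to a voiced/unvoiced confusion matrix. Consonant set is a hypothetical subset.
import numpy as np

CONSONANTS = ["p", "t", "k", "f", "s", "b", "d", "g", "v", "z"]
VOICED = {"b", "d", "g", "v", "z"}
INDEX = {c: i for i, c in enumerate(CONSONANTS)}

def confusion_matrix(true_labels, recognized_labels):
    """Entry (i, j) counts how often consonant i was recognized as consonant j."""
    n = len(CONSONANTS)
    cm = np.zeros((n, n), dtype=int)
    for ref, hyp in zip(true_labels, recognized_labels):
        cm[INDEX[ref], INDEX[hyp]] += 1
    return cm

def voicing_confusions(cm):
    """Collapse the full consonant matrix to a 2x2 voiced/unvoiced matrix."""
    out = np.zeros((2, 2), dtype=int)
    for i, ref in enumerate(CONSONANTS):
        for j, hyp in enumerate(CONSONANTS):
            out[int(ref in VOICED), int(hyp in VOICED)] += cm[i, j]
    return out

# Example: overall accuracy is the trace divided by the total count.
truth = ["p", "b", "s", "z", "t"]
hyp   = ["p", "p", "s", "s", "t"]   # voiced consonants heard as unvoiced in noise
cm = confusion_matrix(truth, hyp)
print(cm.trace() / cm.sum())   # consonant recognition accuracy
print(voicing_confusions(cm))  # voiced/unvoiced error pattern
```

Collapsing the matrix this way makes it straightforward to compare how often a system (or a human listener) preserves the voicing feature even when the specific consonant is misidentified.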
Related Topics
Physical Sciences and Engineering > Computer Science > Signal Processing
Authors