Data-driven generation of phonetic broad classes, based on phoneme confusion matrix similarity

Article ID	Journal	Published Year	Pages	File Type
10370544	Speech Communication	2005	15 Pages	PDF

Abstract

This paper addresses the definition of the phonetic broad classes that decision-tree-based clustering requires during acoustic modeling for speech recognition. The usual approach is to use phonetic broad classes defined by an expert. This approach has some disadvantages, especially in multilingual speech recognition. A new data-driven method is proposed that generates phonetic broad classes from a phoneme confusion matrix. The similarity measure is defined using the number of confusions between the master phoneme and all other phonemes in the set. The proposed method is compared with the standard approach based on expert knowledge and with randomly generated broad classes. It is evaluated implicitly, within a speech recognition experiment. The first evaluation stage tests the generated acoustic models in a monolingual environment (Slovenian), to show that the proposed method does not contain a multilingual influence. The second evaluation stage tests the generated acoustic models in a multilingual environment (Slovenian, German and Spanish). All experiments were based on SpeechDat(II) speech databases. The proposed data-driven method for generating phonetic broad classes from a phoneme confusion matrix improved speech recognition results compared with the method based on expert knowledge.
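The idea of deriving broad classes from confusion counts can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not give the exact similarity measure, so the symmetrised count, the `min_share` threshold, the function name `broad_classes`, and the toy four-phoneme matrix below are all assumptions made for illustration.

```python
def broad_classes(phonemes, confusion, min_share=0.5):
    """Group each master phoneme with the phonemes it is most often confused with.

    confusion[i][j] = how often reference phoneme i was recognised as phoneme j.
    Assumption: similarity is the symmetrised confusion count, expressed as a
    share of all confusions involving the master phoneme.
    """
    classes = {}
    for i, master in enumerate(phonemes):
        # Confusions between the master phoneme and every other phoneme.
        counts = {
            other: confusion[i][j] + confusion[j][i]
            for j, other in enumerate(phonemes)
            if j != i
        }
        total = sum(counts.values())
        members = [master]
        if total > 0:
            # Keep phonemes whose share of the master's confusions is large enough.
            members += [p for p, c in counts.items() if c / total >= min_share]
        classes[master] = sorted(members)
    return classes


# Toy data (hypothetical counts, not taken from the paper).
phonemes = ["p", "b", "s", "z"]
confusion = [
    [80, 15, 3, 2],
    [12, 85, 1, 2],
    [2, 1, 90, 7],
    [1, 3, 10, 86],
]
classes = broad_classes(phonemes, confusion)
print(classes)
```

With these toy counts, the plosives and the fricatives end up in separate classes, which is the kind of grouping a decision tree could then use as a candidate question set.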

Keywords

Speech recognition; Acoustic modeling