Effective acoustic model clustering via decision-tree with supervised learning

Article ID	Journal	Published Year	Pages	File Type
10370556	Speech Communication	2005	13 Pages	PDF

Abstract

In large vocabulary speech recognition, context-dependent modeling is essential for improving both accuracy and speed. To cope with the sparse data problem that arises from the proliferation of context-dependent models, two kinds of clustering methods, data-driven and rule-based, have been vigorously investigated. Data-driven approaches are inherently difficult to apply to contexts unseen in training, which has motivated the development of better rule-based clustering methods. This paper develops a hybrid approach that constructs a supervised decision rule operating on pre-clustered triphones. The scheme employs the C4.5 decision-tree learning algorithm to extract the attributes that best support the clustering of the training data. Specifically, the data-driven method serves as the clustering algorithm, and its result is used as the learning target of the C4.5 algorithm. The proposed scheme provides an effective solution to the clustering error problem arising from unsupervised decision-tree learning, and it also enables successful clustering of state distributions modeled by multiple-mixture Gaussians. In speaker-independent, task-independent continuous speech recognition, the proposed method achieved a relative word error rate (WER) reduction of 3.93%.
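The two-stage idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the triphones, their one-dimensional acoustic means, the broad phone classes, and all function names are invented for illustration. A deterministic 2-means pass stands in for the data-driven clustering, and a simplified tree learner that uses the gain-ratio criterion (categorical attributes only, no pruning) stands in for C4.5.

```python
import math
from collections import Counter

# Hypothetical broad phonetic classes used as context attributes.
PHONE_CLASS = {"a": "vowel", "i": "vowel", "m": "nasal", "n": "nasal",
               "s": "fricative", "f": "fricative"}

# Toy triphones (left, center, right), each with an invented 1-D acoustic mean.
TRIPHONES = {
    ("m", "a", "s"): 1.0, ("n", "a", "s"): 1.1, ("m", "a", "f"): 0.9,
    ("s", "a", "m"): 3.0, ("f", "a", "n"): 3.2, ("s", "a", "n"): 2.9,
}

def data_driven_clusters(means, iters=10):
    """Unsupervised step: deterministic 1-D 2-means over the acoustic means."""
    centers = [min(means.values()), max(means.values())]
    assign = {}
    for _ in range(iters):
        assign = {t: min((0, 1), key=lambda k: abs(m - centers[k]))
                  for t, m in means.items()}
        for k in (0, 1):
            members = [means[t] for t in assign if assign[t] == k]
            if members:
                centers[k] = sum(members) / len(members)
    return assign

def attributes(triphone):
    """Map a triphone to the context attributes the tree may ask about."""
    left, _, right = triphone
    return {"left_class": PHONE_CLASS[left], "right_class": PHONE_CLASS[right]}

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(rows, attr):
    """Information gain divided by split information (the C4.5 criterion)."""
    n = len(rows)
    groups = {}
    for feats, label in rows:
        groups.setdefault(feats[attr], []).append(label)
    gain = entropy([l for _, l in rows]) - sum(
        len(g) / n * entropy(g) for g in groups.values())
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0

def build_tree(rows, attrs):
    """Supervised step: grow a tree whose targets are the cluster labels."""
    labels = [l for _, l in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority
    best = max(attrs, key=lambda a: gain_ratio(rows, a))
    if gain_ratio(rows, best) == 0.0:
        return majority
    children = {}
    for value in {f[best] for f, _ in rows}:
        subset = [(f, l) for f, l in rows if f[best] == value]
        children[value] = build_tree(subset, [a for a in attrs if a != best])
    return {"attr": best, "children": children, "default": majority}

def classify(tree, feats):
    """Follow the learned questions; fall back to the majority label."""
    while isinstance(tree, dict):
        tree = tree["children"].get(feats[tree["attr"]], tree["default"])
    return tree

targets = data_driven_clusters(TRIPHONES)            # learning targets
rows = [(attributes(t), targets[t]) for t in TRIPHONES]
tree = build_tree(rows, ["left_class", "right_class"])
unseen = ("n", "a", "f")                             # not in the training set
predicted = classify(tree, attributes(unseen))
```

Because the tree asks about context attributes rather than individual triphone identities, the unseen triphone `("n", "a", "f")` is still assigned to a cluster, which is the property a purely data-driven method lacks.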

Keywords

Decision-tree; Acoustic modeling; Large vocabulary continuous speech recognition