A scale–rate filter selection method in the spectro-temporal domain for phoneme classification

Article ID	Journal	Published Year	Pages	File Type
455717	Computers & Electrical Engineering	2013	12 Pages	PDF

Abstract

Recently, there has been a significant increase in studies employing auditory models in speech recognition systems. In this paper, we propose a new evolutionarily tuned feature extraction method based on spectro-temporal analysis. In the proposed model, each phoneme has its own subspace, defined by a specific best scale for the spectral filter and a specific best rate for the temporal filter. These two parameters are obtained with a genetic cellular automata evolutionary algorithm. The features extracted from each phoneme-specific subspace are classified by a binary one-versus-rest support vector machine. Finally, a multiclass classifier for all phonemes is built by combining these sub-models. The proposed method significantly improves the discrimination of phonemes, especially highly confusable ones. To show the efficiency of the proposed feature sets, they are empirically compared with two baseline models. The achieved relative improvements in classification rate are about 10% for voiced plosives, unvoiced plosives and nasals, and about 7.38% for front vowels, relative to the state-of-the-art baseline model.
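The combination of phoneme-specific sub-models described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the SVM kernel or the rule used to combine the binary classifiers, so the sketch assumes a linear decision function as a stand-in for each trained SVM and an argmax over decision values as the combination rule. The names `PhonemeSubModel`, `classify`, and the toy scale, rate, and weight values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical cortical representation: one feature vector per (scale, rate) pair.
Cortical = Dict[Tuple[float, float], List[float]]


@dataclass
class PhonemeSubModel:
    """Binary one-versus-rest scorer restricted to one (scale, rate) subspace."""
    phoneme: str
    scale: float          # assumed to be the tuned spectral scale for this phoneme
    rate: float           # assumed to be the tuned temporal rate for this phoneme
    weights: List[float]  # linear stand-in for a trained SVM (assumption)
    bias: float

    def score(self, cortical: Cortical) -> float:
        # Only the phoneme-specific subspace is read; the rest is ignored.
        x = cortical[(self.scale, self.rate)]
        return sum(w * v for w, v in zip(self.weights, x)) + self.bias


def classify(cortical: Cortical, sub_models: List[PhonemeSubModel]) -> str:
    """Combine the binary sub-models by picking the highest decision value."""
    return max(sub_models, key=lambda m: m.score(cortical)).phoneme


# Toy data, invented purely for illustration.
cortical = {(1.0, 4.0): [1.0, 0.0], (2.0, 8.0): [0.0, 1.0]}
sub_models = [
    PhonemeSubModel("b", scale=1.0, rate=4.0, weights=[1.0, -1.0], bias=0.0),
    PhonemeSubModel("d", scale=2.0, rate=8.0, weights=[0.5, 0.5], bias=-0.2),
]
predicted = classify(cortical, sub_models)
```

In this toy setup each sub-model sees a different slice of the representation, which mirrors the idea that every phoneme is scored in its own small subspace before the scores are compared.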

Highlights

► We propose a new evolutionarily tuned feature extraction method based on spectro-temporal analysis.
► Each phoneme can be discriminated well in a very small subspace of the high-dimensional cortical representation.
► The method significantly improves the discrimination of phonemes, especially highly confusable ones.