Article ID: 565942
Journal: Speech Communication
Published Year: 2012
Pages: 17
File Type: PDF
Abstract

Unvoiced stops are rapidly varying sounds whose acoustic cues to place identity are linked to their temporal dynamics. Neurophysiological studies have indicated the importance of joint spectro-temporal processing in the human perception of stops. In this study, two distinct approaches to modeling the spectro-temporal envelope of unvoiced stop phone segments are investigated with a view to obtaining a low-dimensional feature vector for automatic place classification. Classification accuracies on the TIMIT database and a Marathi words dataset show the overall superiority of a classifier combination of polynomial surface coefficients and 2D-DCT. A comparison with published results on the place classification of stops shows that the proposed spectro-temporal feature systems improve upon the performance of the best previous systems. The results indicate that joint spectro-temporal features may be usefully incorporated in hierarchical phone classifiers based on diverse class-specific features.

► Segment-based classification of unvoiced stops in English and Marathi is addressed.
► Localized discrete cosine transform coefficients are evaluated for the task.
► Joint spectro-temporal modeling via bivariate polynomials is proposed.
► The best previously published results on the same task are improved upon.
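
The two feature types named above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the patch size, the number of retained DCT coefficients, or the polynomial degree, so those values and the function names `dct2_lowfreq`, `poly_surface_fit` and `solve` are assumptions made only for illustration. A spectro-temporal patch is taken to be a small matrix with rows as frequency bands and columns as time frames.

```python
import math

def dct2_lowfreq(patch, n_freq=3, n_time=3):
    """Orthonormal 2D DCT-II of a patch; keep only the low-order n_freq x n_time block."""
    rows, cols = len(patch), len(patch[0])
    coeffs = []
    for u in range(min(n_freq, rows)):
        au = math.sqrt((1 if u == 0 else 2) / rows)
        for v in range(min(n_time, cols)):
            av = math.sqrt((1 if v == 0 else 2) / cols)
            s = 0.0
            for i in range(rows):
                ci = math.cos(math.pi * (2 * i + 1) * u / (2 * rows))
                for j in range(cols):
                    s += patch[i][j] * ci * math.cos(math.pi * (2 * j + 1) * v / (2 * cols))
            coeffs.append(au * av * s)
    return coeffs

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[k]] for k, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def poly_surface_fit(patch, degree=2):
    """Least-squares fit of sum over p+q<=degree of a_pq * f**p * t**q.

    f (frequency index) and t (time index) are rescaled to [-1, 1].
    The patch needs more than `degree` rows and columns for a unique fit.
    Returns (coefficients, terms) where terms[k] is the (p, q) of coefficient k.
    """
    rows, cols = len(patch), len(patch[0])
    terms = [(p, q) for p in range(degree + 1) for q in range(degree + 1 - p)]
    n = len(terms)

    def scale(k, size):
        return 2.0 * k / (size - 1) - 1.0 if size > 1 else 0.0

    # Accumulate the normal equations (A^T A) a = A^T z over all patch cells.
    ata = [[0.0] * n for _ in range(n)]
    atz = [0.0] * n
    for i in range(rows):
        f = scale(i, rows)
        for j in range(cols):
            t = scale(j, cols)
            basis = [f ** p * t ** q for p, q in terms]
            for a in range(n):
                atz[a] += basis[a] * patch[i][j]
                for b in range(n):
                    ata[a][b] += basis[a] * basis[b]
    return solve(ata, atz), terms
```

Either coefficient list can serve as a low-dimensional descriptor of the patch. How the two are combined at the classifier level, as the abstract reports, is outside the scope of this sketch.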
