Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
565942 | Speech Communication | 2012 | 17 Pages |
Unvoiced stops are rapidly varying sounds whose acoustic cues to place identity are linked to their temporal dynamics. Neurophysiological studies have indicated the importance of joint spectro-temporal processing in the human perception of stops. In this study, two distinct approaches to modeling the spectro-temporal envelope of unvoiced stop phone segments are investigated with a view to obtaining a low-dimensional feature vector for automatic place classification. Classification accuracies on the TIMIT database and a Marathi words dataset show the overall superiority of a classifier combination of polynomial surface coefficients and 2D-DCT features. A comparison with published results on the place classification of stops shows that the proposed spectro-temporal feature systems improve upon the performance of the best previous systems. The results indicate that joint spectro-temporal features may be usefully incorporated in hierarchical phone classifiers based on diverse class-specific features.
► Segment-based classification of unvoiced stops in English and Marathi is addressed.
► Localized discrete cosine transform coefficients are evaluated for the task.
► Joint spectro-temporal modeling via bivariate polynomials is proposed.
► The best previously published results on the same task are improved upon.
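
The two feature types named in the abstract can be illustrated with a minimal sketch. It assumes a spectro-temporal patch stored as a 2D array (frequency bands by time frames) and shows (a) keeping a low-order block of 2D-DCT coefficients and (b) fitting a bivariate polynomial surface by least squares. The function names, patch size, polynomial degree, and number of retained coefficients are all hypothetical choices for illustration; the abstract does not specify the paper's actual configuration, and this is not presented as the authors' implementation.

```python
import numpy as np


def dct2_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * (2 * m + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    basis[0, :] /= np.sqrt(2.0)  # scale the DC row so the matrix is orthonormal
    return basis


def dct2d_features(patch, n_freq=3, n_time=3):
    """Low-order n_freq x n_time block of the 2D-DCT of a patch (assumed sizes)."""
    c_freq = dct2_matrix(patch.shape[0])
    c_time = dct2_matrix(patch.shape[1])
    coeffs = c_freq @ patch @ c_time.T  # separable 2D transform
    return coeffs[:n_freq, :n_time].ravel()


def poly_surface_features(patch, degree=2):
    """Least-squares coefficients of a bivariate polynomial fitted to a patch.

    Terms f**i * t**j with i + j <= degree, on axes normalized to [-1, 1].
    The degree is an assumption made for this sketch.
    """
    n_f, n_t = patch.shape
    f, t = np.meshgrid(
        np.linspace(-1.0, 1.0, n_f), np.linspace(-1.0, 1.0, n_t), indexing="ij"
    )
    terms = [(i, j) for i in range(degree + 1) for j in range(degree + 1 - i)]
    design = np.stack([(f ** i * t ** j).ravel() for i, j in terms], axis=1)
    coeffs, *_ = np.linalg.lstsq(design, patch.ravel(), rcond=None)
    return coeffs


# Synthetic 8-band x 10-frame patch standing in for a stop-burst envelope.
f_axis = np.linspace(-1.0, 1.0, 8)[:, None]
t_axis = np.linspace(-1.0, 1.0, 10)[None, :]
plane_patch = 1.0 + 2.0 * f_axis + 3.0 * t_axis

dct_vec = dct2d_features(plane_patch)          # 9 values
poly_vec = poly_surface_features(plane_patch)  # 6 values for degree 2
```

Both functions reduce a patch to a handful of numbers, which is the low-dimensional representation the abstract describes. How the two feature sets are fed to classifiers and combined is not stated in the abstract and is left out here.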