Article ID: 565948
Journal: Speech Communication
Published Year: 2012
Pages: 11
File Type: PDF
Abstract

The impact of changes in a speaker’s vocal effort on the performance of automatic speech recognition has largely been overlooked by researchers, and virtually no speech resources exist for the development and testing of speech recognizers at all vocal effort levels. This study examines speech properties across the whole range of vocal modes: whispering, soft speech, normal speech, loud speech, and shouting. Fundamental acoustic and phonetic changes are documented. The impact of vocal effort variability on the performance of an isolated-word recognizer is shown, and effective means of improving the system’s robustness are tested. The proposed multiple model framework (MMF) approach achieves a 50% relative reduction in word error rate (WER) compared to the baseline system. A new specialized speech database, BUT-VE1, is presented; it contains speech recordings of 13 speakers at 5 vocal effort levels, with manual phonetic segmentation and sound pressure level calibration.
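
Because the headline result is stated as a relative rather than absolute reduction in word error rate, a minimal sketch of that arithmetic may help. The function name and the input values below are hypothetical and chosen only for illustration; they are not figures reported in the study.

```python
def relative_wer_reduction(baseline_wer: float, improved_wer: float) -> float:
    """Return the relative WER reduction as a fraction of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - improved_wer) / baseline_wer


# Illustrative values only (assumed, not taken from the paper):
# a baseline WER of 20% lowered to 10% is a 10-point absolute drop,
# which corresponds to a 50% relative reduction.
baseline = 0.20
improved = 0.10
print(f"{relative_wer_reduction(baseline, improved):.1%}")
```

A relative figure of this kind says nothing about the absolute error rates themselves, so it is best read alongside the baseline WER.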

Highlights
► Impact of variable vocal effort level on the performance of speech recognition is shown.
► New speech database BUT-VE1 is introduced.
► MMF approach yielding 49.9% reduction in WER at all VE levels is proposed.

Related Topics
Physical Sciences and Engineering > Computer Science > Signal Processing