Article code: 567550
Journal code: 876105
Publication year: 2011
English article: 11 pages (PDF)
English title of the ISI article
Robust speech/non-speech classification in heterogeneous multimedia content
Related topics
Engineering and basic sciences > Computer engineering > Signal processing
English abstract

In this paper we present a speech/non-speech classification method that allows high quality classification without the need to know in advance what kinds of audible non-speech events are present in an audio recording and that does not require a single parameter to be tuned on in-domain data. Because no parameter tuning is needed and no training data is required to train models for specific sounds, the classifier is able to process a wide range of audio types with varying conditions and thereby contributes to the development of a more robust automatic speech recognition framework.

Our speech/non-speech classification system does not attempt to classify all audible non-speech in a single run. Instead, first a bootstrap speech/silence classification is obtained using a standard speech/non-speech classifier. Next, models for speech, silence and audible non-speech are trained on the target audio using the bootstrap classification. The experiments show that the performance of the proposed system is 83% and 44% (relative) better than that of a common broadcast news speech/non-speech classifier when applied to a collection of meetings recorded with table-top microphones and a collection of Dutch television broadcasts used for TRECVID 2007.

Research highlights
► Speech/non-speech classification can be done without the use of previously trained statistical models.
► The proposed method is language-independent even when a standard language-dependent GMM is used for bootstrapping.
► The speech and non-speech models that are trained on the data itself should be generated iteratively: re-segmenting the data while adding Gaussians.
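
The iterative idea in the highlights (bootstrap labels first, then models trained on the target data itself, re-segmenting while adding Gaussians) can be illustrated with a small sketch. This is not the authors' implementation. It assumes a single synthetic 1-D feature per frame, only two classes instead of the paper's speech, silence and audible non-speech models, a fixed threshold as a stand-in for the bootstrap classifier, and frame-by-frame likelihood comparison instead of a proper segmentation step. All function names and parameters (`fit_gmm`, `refine`, `max_components`, `demo`) are hypothetical.

```python
import math
import random


def gauss(x, mean, var):
    """Density of a 1-D Gaussian."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm(xs, k, iters=20):
    """Fit a k-component 1-D Gaussian mixture to xs with plain EM."""
    xs = sorted(xs)
    n = len(xs)
    mean_all = sum(xs) / n
    var_all = max(sum((x - mean_all) ** 2 for x in xs) / n, 1e-3)
    # Start the means on evenly spaced quantiles with a shared variance.
    means = [xs[int((i + 0.5) * n / k)] for i in range(k)]
    variances = [var_all] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each frame.
        resp = []
        for x in xs:
            p = [w * gauss(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means and (floored) variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-10
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, 1e-3)
            weights[j] = nj / n
    return list(zip(weights, means, variances))


def log_likelihood(x, gmm):
    """Log-likelihood of one frame under a mixture (guarded against log(0))."""
    return math.log(sum(w * gauss(x, m, v) for w, m, v in gmm) + 1e-300)


def refine(frames, bootstrap_labels, max_components=3):
    """Retrain speech/non-speech mixtures on the target data, growing them each pass."""
    labels = list(bootstrap_labels)
    for k in range(1, max_components + 1):
        speech = [x for x, is_speech in zip(frames, labels) if is_speech]
        other = [x for x, is_speech in zip(frames, labels) if not is_speech]
        if not speech or not other:
            break  # one class vanished; keep the last labelling
        speech_gmm = fit_gmm(speech, k)
        other_gmm = fit_gmm(other, k)
        # Re-segment: each frame goes to the model that explains it better.
        labels = [log_likelihood(x, speech_gmm) > log_likelihood(x, other_gmm)
                  for x in frames]
    return labels


def demo(seed=0):
    """Synthetic data: non-speech frames around 0, speech frames around 6."""
    rng = random.Random(seed)
    truth = [False] * 500 + [True] * 500
    frames = [rng.gauss(6.0 if t else 0.0, 1.0) for t in truth]
    # Assumed stand-in for the bootstrap classifier: a poorly placed threshold.
    bootstrap = [x > 1.5 for x in frames]
    refined = refine(frames, bootstrap)

    def accuracy(labels):
        return sum(a == b for a, b in zip(labels, truth)) / len(truth)

    return accuracy(bootstrap), accuracy(refined)


if __name__ == "__main__":
    print(demo())
```

Because both mixtures are estimated from the recording being processed, nothing in the loop depends on in-domain training data. The only prior knowledge enters through the bootstrap labels, which the later passes are free to overrule.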

Publisher
Database: Elsevier - ScienceDirect
Journal: Speech Communication - Volume 53, Issue 2, February 2011, Pages 143–153
Authors
, ,