Robust speech/non-speech classification in heterogeneous multimedia content

Article code	Journal code	Publication year	English article	Full text
567550	876105	2011	11 pages (PDF)	Free download

Related topics

Engineering and Basic Sciences > Computer Engineering > Signal Processing

Abstract

In this paper we present a speech/non-speech classification method that achieves high-quality classification without knowing in advance what kinds of audible non-speech events are present in an audio recording, and without requiring a single parameter to be tuned on in-domain data. Because no parameter tuning is needed and no training data is required to train models for specific sounds, the classifier is able to process a wide range of audio types with varying conditions and thereby contributes to the development of a more robust automatic speech recognition framework.

Our speech/non-speech classification system does not attempt to classify all audible non-speech in a single run. Instead, a bootstrap speech/silence classification is first obtained using a standard speech/non-speech classifier. Next, models for speech, silence and audible non-speech are trained on the target audio using the bootstrap classification. The experiments show that the performance of the proposed system is 83% and 44% (relative) better than that of a common broadcast news speech/non-speech classifier when applied, respectively, to a collection of meetings recorded with table-top microphones and to a collection of Dutch television broadcasts used for TRECVID 2007.
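The "(relative)" qualifier on the reported gains can be made concrete with a small calculation. The sketch below assumes the figures express a relative reduction in classification error, which the abstract does not state explicitly. The function name and the error rates in the example are hypothetical and are not results from the paper.

```python
def relative_improvement(baseline_error, new_error):
    """Fraction of the baseline error that the new system removes.

    Assumption: "relative better" means relative error reduction.
    """
    return (baseline_error - new_error) / baseline_error


# Hypothetical error rates (in percent), chosen only to illustrate the formula.
baseline = 10.0
proposed = 1.7
print(round(100 * relative_improvement(baseline, proposed)))  # prints 83
```

Under this reading, a relative gain says nothing about the absolute error rates: the same 83% could come from 10.0 to 1.7 or from 2.0 to 0.34.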

Research highlights
► Speech/non-speech classification can be done without the use of previously trained statistical models.
► The proposed method is language independent even when a standard language-dependent GMM is used for bootstrapping.
► The speech and non-speech models that are trained on the data itself should be generated iteratively: re-segmenting the data while adding Gaussians.
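The iterative procedure in the last highlight can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each frame is reduced to one scalar feature (for example a log-energy), only two classes are modelled instead of speech, silence and audible non-speech, decisions are made frame by frame with no temporal smoothing, and the bootstrap labels are supplied by hand rather than by a standard classifier. All function names are hypothetical.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def component_logs(x, gmm):
    """Per-component log(weight * density); gmm is a list of (weight, mean, var)."""
    return [math.log(w) + log_gauss(x, m, v) for w, m, v in gmm]


def gmm_loglik(x, gmm):
    """Log-likelihood of x under the mixture, computed with log-sum-exp."""
    logs = component_logs(x, gmm)
    top = max(logs)
    return top + math.log(sum(math.exp(t - top) for t in logs))


def fit_gmm(data, gmm, em_iters=10, var_floor=1e-3):
    """Refine a 1-D GMM on data with a few EM iterations."""
    for _ in range(em_iters):
        # E-step: responsibility of each component for each frame.
        resp = []
        for x in data:
            logs = component_logs(x, gmm)
            top = max(logs)
            probs = [math.exp(t - top) for t in logs]
            total = sum(probs)
            resp.append([p / total for p in probs])
        # M-step: re-estimate weight, mean and variance of each component.
        updated = []
        for k, (w, m, v) in enumerate(gmm):
            nk = sum(r[k] for r in resp)
            if nk > 1e-8:  # otherwise keep the old mean and variance
                m = sum(r[k] * x for r, x in zip(resp, data)) / nk
                v = sum(r[k] * (x - m) ** 2 for r, x in zip(resp, data)) / nk
                v = max(v, var_floor)
            updated.append((max(nk / len(data), 1e-6), m, v))
        norm = sum(w for w, _, _ in updated)
        gmm = [(w / norm, m, v) for w, m, v in updated]
    return gmm


def single_gaussian(data, var_floor=1e-3):
    """One-component model estimated directly from the data."""
    mean = sum(data) / len(data)
    var = sum((x - mean) ** 2 for x in data) / len(data)
    return [(1.0, mean, max(var, var_floor))]


def split_largest(gmm):
    """Add one Gaussian by splitting the heaviest component in two."""
    k = max(range(len(gmm)), key=lambda i: gmm[i][0])
    w, m, v = gmm[k]
    shift = 0.5 * math.sqrt(v)
    return gmm[:k] + gmm[k + 1:] + [(w / 2, m - shift, v), (w / 2, m + shift, v)]


def retrain_on_target(frames, bootstrap_labels, max_gaussians=3):
    """Train per-class GMMs on the target audio itself, growing them step by step."""
    labels = list(bootstrap_labels)
    classes = sorted(set(labels))
    models = {}
    for _ in range(max_gaussians):
        for cls in classes:
            data = [x for x, lab in zip(frames, labels) if lab == cls]
            if not data:
                continue  # class vanished after re-segmentation; keep its old model
            start = split_largest(models[cls]) if cls in models else single_gaussian(data)
            models[cls] = fit_gmm(data, start)
        # Re-segment: each frame goes to the class whose model explains it best.
        labels = [max(models, key=lambda c: gmm_loglik(x, models[c])) for x in frames]
    return labels


# Toy data: 11 quiet frames around -5 and 11 loud frames around +5.
silence = [-5 + 0.1 * i for i in range(-5, 6)]
speech = [5 + 0.1 * i for i in range(-5, 6)]
frames = silence + speech
truth = ["silence"] * 11 + ["speech"] * 11
bootstrap = list(truth)
bootstrap[10] = "speech"  # the bootstrap mislabels the loudest silence frame
result = retrain_on_target(frames, bootstrap)
```

On this toy input the first re-segmentation already moves the mislabelled frame back to the silence class, because the silence model trained on the remaining quiet frames explains it far better than the broad speech model does. Real audio would need multi-dimensional features and some form of temporal smoothing, both of which this sketch leaves out.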

Publisher

Database: Elsevier - ScienceDirect
Journal: Speech Communication - Volume 53, Issue 2, February 2011, Pages 143–153

Authors

Marijn Huijbregts, Franciska de Jong
