Article code: 6951619
Journal code: 1451699
Publication year: 2018
English article: 10-page PDF
Full-text version: free download
English title of the ISI article
An adapted data selection for deep learning-based audio segmentation in multi-genre broadcast channel
Persian translation of the title
انتخاب داده های سازگار برای تقسیم صوتی مبتنی بر یادگیری عمیق در کانال پخش چند ژانر
Keywords
Deep learning, audio segmentation, voice activity detection, long-term information, multi-genre broadcast
Related topics
Engineering and Basic Sciences, Computer Engineering, Signal Processing
English abstract
Broadcast audio transcription remains a challenging problem because of the complexity of diverse speech and audio signals. Audio segmentation, an essential module in a broadcast audio transcription system, has benefited greatly from the development of deep learning. However, the need for large amounts of labeled training data is a bottleneck for deep learning-based audio segmentation methods. To tackle this problem, an adapted segmentation method is proposed that selects high-confidence speech/non-speech segments from unlabeled training data to complement the labeled training data. The new method relies on GMM-based speech/non-speech models trained on an utterance-by-utterance basis. Long-term information is used to choose reliable training data for the speech/non-speech models from the utterance at hand. Experimental results show that this data selection method is a powerful audio segmentation algorithm in its own right. We also observed that deep neural networks trained on data selected by this method are superior to those trained on data chosen by two comparison methods. Moreover, better performance could be obtained by combining the deep learning-based audio segmentation method with the adapted data selection method.
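The per-utterance idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify the features, the long-term measure, or the GMM configuration, so everything below is an assumption made for illustration: frame log-energy stands in for the features, a centered moving average stands in for the long-term information, and a single 1-D Gaussian per class stands in for each GMM. The function names and the `half_window` and `seed_fraction` parameters are hypothetical and are not taken from the paper.

```python
import math

def moving_average(values, half_window):
    """Smooth a sequence with a centered window (assumed stand-in for long-term information)."""
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half_window)
        hi = min(len(values), i + half_window + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed

def fit_gaussian(samples):
    """Fit a 1-D Gaussian (a one-component GMM); return (mean, variance)."""
    mean = sum(samples) / len(samples)
    var = sum((x - mean) ** 2 for x in samples) / len(samples)
    return mean, max(var, 1e-6)  # variance floor avoids division by zero

def log_likelihood(x, mean, var):
    """Log-density of x under a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def segment_utterance(log_energy, half_window=5, seed_fraction=0.2):
    """Label each frame of ONE utterance as speech (True) or non-speech (False)."""
    # Step 1: rank frames by their smoothed (long-term) energy.
    smoothed = moving_average(log_energy, half_window)
    order = sorted(range(len(smoothed)), key=lambda i: smoothed[i])
    n_seed = max(1, int(seed_fraction * len(order)))
    # Step 2: keep only the most confident frames at each extreme as training data.
    nonspeech_seed = [log_energy[i] for i in order[:n_seed]]
    speech_seed = [log_energy[i] for i in order[-n_seed:]]
    # Step 3: train the two models on this utterance only.
    speech_model = fit_gaussian(speech_seed)
    nonspeech_model = fit_gaussian(nonspeech_seed)
    # Step 4: classify every frame by comparing likelihoods under the two models.
    return [
        log_likelihood(x, *speech_model) > log_likelihood(x, *nonspeech_model)
        for x in log_energy
    ]

# Usage on a synthetic utterance: 50 quiet frames followed by 50 loud frames.
frames = [-5 + 0.3 * math.sin(i) for i in range(50)]
frames += [5 + 0.3 * math.sin(i) for i in range(50)]
labels = segment_utterance(frames)
```

In this sketch the models are re-estimated for every utterance, so no labeled data is needed; the frames labeled with high confidence could then be pooled as extra training data for a neural network, which is the role the abstract assigns to the selected segments.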
Publisher
Database: Elsevier - ScienceDirect
Journal: Digital Signal Processing - Volume 81, October 2018, Pages 8-15