دانلود رایگان مقاله: تبعیض گفتاری و موزون با استفاده از ابزارهای برجسته بصری

کد مقاله	کد نشریه	سال انتشار	مقاله انگلیسی	نسخه تمام متن
6854680	1437592	2018	20 صفحه PDF	دانلود رایگان

عنوان انگلیسی مقاله ISI

Speech-music discrimination using deep visual feature extractors

ترجمه فارسی عنوان

تبعیض گفتاری و موزون با استفاده از ابزارهای برجسته بصری

دانلود مقاله + سفارش ترجمه

دانلود مقاله ISI انگلیسی

رایگان برای ایرانیان

کلمات کلیدی

سی ان ان، تبعیض گفتاری و موسیقی، انتقال یادگیری، تجزیه و تحلیل صوتی،

Transfer learning - انتقال یادگیری Audio analysis - تجزیه و تحلیل صوتی CNNs - سی ان ان

موضوعات مرتبط

مهندسی و علوم پایه مهندسی کامپیوتر هوش مصنوعی

پیش نمایش مقاله

تبعیض گفتاری و موزون با استفاده از ابزارهای برجسته بصری

چکیده انگلیسی

Speech music discrimination is a traditional task in audio analytics, useful for a wide range of applications, such as automatic speech recognition and radio broadcast monitoring, that focuses on segmenting audio streams and classifying each segment as either speech or music. In this paper we investigate the capabilities of Convolutional Neural Networks (CNNs) with regards to the speech - music discrimination task. Instead of representing the audio content using handcrafted audio features, as traditional methods do, we use deep structures to learn visual feature dependencies as they appear on the spectrogram domain (i.e. train a CNN using audio spectrograms as input images). The main contribution of our work focuses on the potentials of using pre-trained deep architectures along with transfer-learning to train robust audio classifiers for the particular task of speech music discrimination. We highlight the supremacy of the proposed methods, compared both to the typical audio-based and deep-learning methods that adopt handcrafted features, and we evaluate our system in terms of classification success and run-time execution. To our knowledge this is the first work that investigates CNNs for the task of speech music discrimination and the first that exploits transfer learning across very different domains for audio modeling using deep-learning in general. In particular, we fine-tune a deep architecture originally trained for the Imagenet classification task, using a relatively small amount of data (almost 80Â min of training audio samples) along with data augmentation. We evaluate our system through extensive experimentation against three different datasets. Firstly we experiment on a real-world dataset of more than 10Â h of uninterrupted radio broadcasts and secondly, for comparison purposes, we evaluate our best method on two publicly available datasets that were designed specifically for the task of speech-music discrimination. Our results indicate that CNNs can significantly outperform current state-of-the-art in terms of performance especially when transfer learning is applied, in all three test-datasets. All the discussed methods, along with the whole experimental setup and the respective datasets, are openly provided for reproduction and further experimentation.

ناشر

Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Expert Systems with Applications - Volume 114, 30 December 2018, Pages 334-344

نویسندگان

Michalis Papakostas, Theodoros Giannakopoulos,

علوم انسانی و هنر

فنی، مهندسی و علوم پایه

پزشکی و سلامت

بیو تکنولوژی

پذیرش سفارش ترجمه

دانلود رایگان مقاله ISI : تبعیض گفتاری و موزون با استفاده از ابزارهای برجسته بصری

دسترسی سریع

ارتباط

English Website