Article ID: 559179
Journal: Computer Speech & Language
Published Year: 2010
Pages: 16
File Type: PDF
Abstract

Multimedia applications now require a Speech/Music classifier to offer more than accuracy, notably short delay and low complexity. Here, we build a Speech/Music classifier using several data mining methods. The main contributions of this paper are: analyzing the inherent validity of diverse features extracted from the audio; building a hierarchical structure of oblique decision trees (HODT) to maintain optimal performance; and applying a novel context-based state transform (ST) strategy to refine the classification results. The proposed algorithm is evaluated on a set of 702 audio files, each 5–11 min long, which are generated from 54 speech or music files at different Signal-to-Noise Ratio (SNR) levels and with diverse noise types. The experimental results show that the proposed classifier outperforms AMR-WB+, achieving classification rates of 97.9% and 95.9% at the 10 ms frame level in pure and high-SNR (≥ 20 dB) environments, respectively. The post-processing ST strategy further enhances system performance, particularly at low SNR (10 dB), where it yields a 5.6% increase in the accuracy rate. In addition, the complexity of the proposed system is below 1 WMOPS, which makes it easily adaptable to many scenarios.
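The abstract does not spell out how the context-based ST strategy works, so the following is only a minimal sketch of the general idea of context-based post-processing, not the authors' method. It assumes a simple hysteresis rule: the output state changes only after a hypothetical threshold of `min_run` consecutive frames disagree with it. The function name `smooth_labels`, the `"S"`/`"M"` labels, and the value of `min_run` are all illustrative assumptions.

```python
from typing import List


def smooth_labels(frames: List[str], min_run: int = 3) -> List[str]:
    """Hysteresis smoothing of frame-level labels (illustrative only).

    The current state switches only after `min_run` consecutive frames
    disagree with it, so isolated misclassified frames are suppressed.
    """
    if not frames:
        return []
    state = frames[0]  # assume the first frame's label is the initial state
    run = 0            # consecutive frames disagreeing with the state
    out = []
    for label in frames:
        if label == state:
            run = 0
        else:
            run += 1
            if run >= min_run:
                state = label
                run = 0
        out.append(state)
    return out


# "S" = speech, "M" = music; one label per 10 ms frame.
raw = ["S", "S", "M", "S", "S", "M", "M", "M", "M"]
print(smooth_labels(raw, min_run=3))
```

In this sketch the isolated "M" at the third frame is suppressed, while the sustained run of "M" at the end flips the state on its third frame. A larger `min_run` removes more spurious switches but delays genuine transitions, which is the trade-off against the short-delay requirement the abstract mentions.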

Related Topics
Physical Sciences and Engineering > Computer Science > Signal Processing