Article ID Journal Published Year Pages File Type
411950 Neurocomputing 2015 10 Pages PDF
Abstract

This paper builds on the technique of feature extraction from the spectrogram image of sound signals for automatic sound recognition. The spectrogram image is divided into blocks and statistical distributions are extracted from each block as features. However, when compared to related work, we reduce the dimensionality of the feature vector using mean and standard deviation values along the row and column of the blocks without compromising the classification accuracy. We demonstrate the technique in an audio surveillance application and evaluate the performance using four common multiclass support vector machine (SVM) classification techniques, one-against-all, one-against-one, decision directed acyclic graph, and adaptive directed acyclic graph. Experimentation was carried out using an audio database with 10 sound classes, each containing multiple subclasses with intraclass diversity and interclass similarity in terms of signal properties. Under noisy conditions, the proposed reduced spectrogram image feature (RSIF) produced significantly better classification accuracy than the conventional log compressed mel-frequency cepstral coefficients (MFCCs) and marginally better classification accuracy than linear MFCCs, which does not utilize any compression. The linear spectrogram image representations for feature extraction and the one-against-all multiclass SVM classification method were found to be the most noise robust. In addition, significantly improved results were obtained under noisy conditions when the RSIF is combined with linear MFCCs.

Related Topics
Physical Sciences and Engineering Computer Science Artificial Intelligence
Authors
, ,