Article ID | Journal | Published Year | Pages | File Type
---|---|---|---|---
4948631 | Neurocomputing | 2016 | 9 Pages |
Abstract
Music emotion recognition (MER) is an important topic in music understanding, recommendation, retrieval, and human-computer interaction. Machine learning methods have achieved great success in estimating human emotional responses to music. However, few of them pay much attention to the semantic interpretation of emotional response. In our work, we first train an interpretable model relating acoustic audio features to emotion. Filter, wrapper, and shrinkage methods are applied to select important features. We then apply statistical models to build and explain the emotion model. Extensive experimental results reveal that the shrinkage methods outperform the wrapper and filter methods for arousal. In addition, we observe that only a small subset of the extracted features has a key effect on arousal, while most of the extracted features contribute little to valence perception. Ultimately, we obtain a higher average accuracy for arousal than for valence.
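The abstract names filter, wrapper, and shrinkage feature selection but gives no detail. The sketch below is not the paper's implementation; it illustrates the shrinkage idea with a minimal Lasso solved by coordinate descent on synthetic data, showing how an L1 penalty drives the weights of uninformative features exactly to zero — the mechanism behind finding that only a small subset of features matters for a target such as arousal. All data and parameter values here are illustrative assumptions.

```python
import numpy as np

def lasso_cd(X, y, alpha=0.05, n_iter=200):
    """Minimal Lasso (L1-penalized least squares) via coordinate descent.

    Each pass soft-thresholds one coordinate at a time; features whose
    partial correlation with the residual stays below `alpha` are set
    exactly to zero, which is what performs the feature selection.
    """
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        for j in range(d):
            # residual with feature j's current contribution removed
            r = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ r / n
            z = (X[:, j] @ X[:, j]) / n
            # soft-thresholding update
            w[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / z
    return w

# toy regression: 2 informative features out of 6 (stand-in for a
# small set of acoustic features driving an arousal rating)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=200)

w = lasso_cd(X, y)
selected = [j for j in range(6) if abs(w[j]) > 1e-8]
print("selected features:", selected)
```

With this setup the four noise features are zeroed out and only the two informative ones survive; raising `alpha` shrinks the surviving weights further and eventually removes them too, which is the trade-off a shrinkage method tunes.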
Related Topics
Physical Sciences and Engineering
Computer Science
Artificial Intelligence
Authors
JiangLong Zhang, XiangLin Huang, Lifang Yang, Liqiang Nie