Investigating a hybrid of Tone-Model and Particle Swarm Optimization techniques in transcribing polyphonic guitar sound

Article ID	Journal	Published Year	Pages	File Type
495239	Applied Soft Computing	2015	10 Pages	PDF

Abstract

• Each Tone-Model is constructed from monophonic training examples.
• Plausible-sounding pitches can be guessed from polyphonic input using Tone-Models.
• PSO finds the combination of Tone-Models that best describes the audio input.
• The approach gives better transcription accuracy than the competing NMF technique.
• The approach performs well when the Tone-Models are stable (synthesized audio case).

In this article, we describe a novel polyphonic analysis approach that employs a hybrid of Tone-Model (TM) and Particle Swarm Optimization (PSO) techniques. This hybrid approach exploits the strengths of both model-based and heuristic-search approaches. The correlations between each monophonic Tone-Model and the polyphonic input are used to predict relevant pitches, such that aggregations of the pitches’ Tone-Models can describe the harmonic content of the polyphonic input. These aggregations are then refined using PSO, which heuristically searches for a locally optimal aggregation; some Tone-Models suggested earlier may be excluded from the final best aggregation. We present and discuss the design of our approach. The experimental results from the proposed hybrid approach are compared and contrasted with those of the non-negative matrix factorization (NMF) technique. A performance comparison between synthesized guitar sound and acoustic guitar sound is also discussed. The experimental results confirm the potential of TM–PSO in the polyphonic transcription task.
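
The two stages described in the abstract (correlation-based pitch candidates, then PSO refinement) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy harmonic spectra, the cosine correlation measure, the 0.25 threshold, the squared-error cost, the binary PSO variant, and all names and parameter values (make_tone_model, candidate_pitches, binary_pso, N_BINS, and so on) are assumptions made for illustration only.

```python
import math
import random

N_BINS = 64  # hypothetical number of spectrum bins

def make_tone_model(f0_bin, n_harmonics=5):
    """Toy Tone-Model (assumption): decaying peaks at multiples of f0_bin."""
    spec = [0.0] * N_BINS
    for h in range(1, n_harmonics + 1):
        if f0_bin * h < N_BINS:
            spec[f0_bin * h] = 1.0 / h
    return spec

def correlation(a, b):
    """Cosine correlation between two non-negative spectra."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def candidate_pitches(mix, bank, threshold=0.25):
    """Stage 1: keep pitches whose Tone-Model correlates with the input."""
    return [p for p, m in bank.items() if correlation(m, mix) >= threshold]

def residual(mix, models, mask):
    """Squared error between the input and the sum of selected Tone-Models."""
    est = [0.0] * N_BINS
    for model, on in zip(models, mask):
        if on:
            est = [e + v for e, v in zip(est, model)]
    return sum((x - e) ** 2 for x, e in zip(mix, est))

def binary_pso(mix, models, n_particles=20, n_iter=60, seed=0):
    """Stage 2: binary PSO over on/off masks of the candidate Tone-Models."""
    rng = random.Random(seed)
    dim = len(models)
    pos = [[rng.random() < 0.5 for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [residual(mix, models, p) for p in pos]
    g = min(range(n_particles), key=pbest_cost.__getitem__)
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                v = (0.7 * vel[i][d]
                     + 1.5 * rng.random() * (pbest[i][d] - pos[i][d])
                     + 1.5 * rng.random() * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-4.0, min(4.0, v))  # clamp keeps exploration alive
                # Sigmoid of the velocity gives the probability that the bit is on.
                pos[i][d] = rng.random() < 1.0 / (1.0 + math.exp(-vel[i][d]))
            cost = residual(mix, models, pos[i])
            if cost < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], cost
                if cost < gbest_cost:
                    gbest, gbest_cost = pos[i][:], cost
    return gbest, gbest_cost

# Toy bank of Tone-Models and a two-note "polyphonic" input (pitches 5 and 7).
bank = {f0: make_tone_model(f0) for f0 in range(4, 13)}
mix = [a + b for a, b in zip(bank[5], bank[7])]

candidates = candidate_pitches(mix, bank)
mask, cost = binary_pso(mix, [bank[p] for p in candidates])
pitches = sorted(p for p, on in zip(candidates, mask) if on)
print("candidates:", candidates)
print("selected:", pitches, "residual:", cost)
```

In this toy setup the correlation stage also proposes pitch 10, whose Tone-Model shares partials with pitch 5. The refinement stage can then drop it, because including it raises the residual, which mirrors the abstract's remark that some suggested Tone-Models may be excluded from the final aggregation.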


Keywords

Non-negative matrix factorization