Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
495239 | Applied Soft Computing | 2015 | 10 Pages |
•Each Tone-Model is constructed from monophonic training examples.
•Plausible-sounding pitches can be guessed from polyphonic input using Tone-Models.
•PSO finds the combination of Tone-Models that best describes the audio input.
•The approach gives better transcription accuracy than the competing NMF technique.
•The approach performs well when the Tone-Models are stable (the synthesized-audio case).
In this article, we describe a novel polyphonic analysis method that employs a hybrid of Tone-Model (TM) and Particle Swarm Optimization (PSO) techniques. This hybrid approach exploits the strengths of both model-based and heuristic-search approaches. The correlations between each monophonic Tone-Model and the polyphonic input are used to predict relevant pitches, such that aggregations of those pitches’ Tone-Models can describe the harmonic content of the polyphonic input. These aggregations are then refined using PSO, which heuristically searches for a locally optimal aggregation; some Tone-Models suggested earlier may be excluded from the final best aggregation. We present and discuss the design of our approach. The experimental results from the proposed hybrid approach are compared with those of the non-negative matrix factorization (NMF) technique, and performance on synthesized guitar sound is compared with performance on acoustic guitar sound. The experimental results confirm the potential of TM–PSO for the polyphonic transcription task.
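The two stages named in the abstract (correlation-based pitch proposal, then PSO refinement of the Tone-Model aggregation) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the Tone-Model representation, the fitness function, or the PSO encoding, so everything below is an assumption made for illustration. The Tone-Models are synthetic unit-norm harmonic spectra rather than models trained on monophonic recordings, the fitness is a plain squared reconstruction error, and the refinement uses a standard binary PSO (sigmoid of velocity as the bit probability). All function names, the threshold, and the PSO coefficients are hypothetical.

```python
import numpy as np

def make_tone_models(n_pitches=12, n_bins=400, n_harmonics=5):
    """Hypothetical stand-in for trained Tone-Models: unit-norm harmonic spectra."""
    bins = np.arange(n_bins)
    models = np.zeros((n_pitches, n_bins))
    for p in range(n_pitches):
        f0 = 40.0 * 2.0 ** (p / 12.0)  # fundamental position, in bin units
        for h in range(1, n_harmonics + 1):
            # Gaussian bump at each harmonic, amplitude decaying as 1/h
            models[p] += np.exp(-0.5 * ((bins - h * f0) / 1.5) ** 2) / h
        models[p] /= np.linalg.norm(models[p])
    return models

def candidate_pitches(models, frame, threshold=0.25):
    """Stage 1: keep pitches whose Tone-Model correlates with the input frame."""
    corr = models @ frame / np.linalg.norm(frame)  # models are already unit-norm
    return np.flatnonzero(corr >= threshold)

def aggregation_error(models, frame, pitches):
    """Squared error between the frame and the sum of the selected Tone-Models."""
    return float(np.sum((frame - models[pitches].sum(axis=0)) ** 2))

def pso_refine(models, frame, cand, n_particles=20, n_iters=40, seed=0):
    """Stage 2 (assumed encoding): binary PSO over which candidates to keep."""
    rng = np.random.default_rng(seed)
    k = len(cand)
    pos = (rng.random((n_particles, k)) < 0.5).astype(float)
    pos[0] = 1.0  # one particle starts from the full candidate set
    vel = np.zeros((n_particles, k))

    def cost(bits):
        return aggregation_error(models, frame, cand[bits > 0.5])

    pbest = pos.copy()
    pbest_cost = np.array([cost(b) for b in pos])
    g = int(np.argmin(pbest_cost))
    gbest, gbest_cost = pbest[g].copy(), pbest_cost[g]

    for _ in range(n_iters):
        r1, r2 = rng.random((2, n_particles, k))
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
        vel = np.clip(vel, -4.0, 4.0)
        prob = 1.0 / (1.0 + np.exp(-vel))  # sigmoid maps velocity to bit probability
        pos = (rng.random((n_particles, k)) < prob).astype(float)
        costs = np.array([cost(b) for b in pos])
        improved = costs < pbest_cost
        pbest[improved] = pos[improved]
        pbest_cost[improved] = costs[improved]
        g = int(np.argmin(pbest_cost))
        if pbest_cost[g] < gbest_cost:
            gbest, gbest_cost = pbest[g].copy(), pbest_cost[g]
    return cand[gbest > 0.5], float(gbest_cost)

# Illustrative input: a noiseless mix of three synthetic Tone-Models.
models = make_tone_models()
true_pitches = np.array([0, 4, 7])
frame = models[true_pitches].sum(axis=0)
cand = candidate_pitches(models, frame)
chosen, err = pso_refine(models, frame, cand)
print("candidates:", cand, "refined:", chosen, "error:", round(err, 6))
```

Seeding one particle with the full candidate set means the refined aggregation can never score worse than the stage-1 proposal, which mirrors the abstract's remark that PSO may only exclude Tone-Models suggested earlier. With real recordings the frame would not be an exact sum of Tone-Models, so the error would not reach zero and the threshold would need tuning.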