Speaker diarization using one-class support vector machines

Journal: Speech Communication
Published: 2008
Pages: 11
Article ID: 566029

Abstract

This paper addresses speaker diarization, which consists of two steps: speaker turn detection and speaker clustering. Both steps require a metric for comparing speech segments. Here, we employ a novel metric based on one-class support vector machines (SVMs), recently introduced by one of the authors. This paper presents our primary speaker diarization system based on one-class SVMs, which is easy to build and configure. We show through several experiments on the NIST RT’03S and ESTER data sets that our approach competes with most standard approaches, such as those based on Generalized Likelihood Ratios or Gaussian Mixture Models, and may be complementary to them. Moreover, our technique permits the use of heterogeneous acoustic feature vectors of arbitrary dimension while keeping the computational cost reasonable.
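To make the idea of a one-class SVM segment metric concrete, here is a minimal sketch, not the authors' implementation. It assumes a Gaussian kernel with bandwidth `sigma`, a small hand-rolled SMO solver for the one-class SVM dual, and a dissimilarity defined as the angle between the two SVM centers on the kernel unit sphere divided by the sum of the two angular spreads of the estimated supports. The function names (`fit_one_class_svm`, `ocsvm_dissimilarity`, `detect_turns`) and the values of `nu`, `win`, `hop`, and `threshold` are hypothetical illustrations; the metric and settings used in the paper may differ. Turn detection is sketched as comparing two adjacent sliding windows of feature frames and keeping local maxima of the dissimilarity.

```python
import numpy as np


def gaussian_kernel(X, Y, sigma):
    """Gaussian kernel matrix. Since k(x, x) = 1, every mapped frame phi(x) lies on the unit sphere."""
    sq = (X ** 2).sum(1)[:, None] + (Y ** 2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))


def fit_one_class_svm(X, nu=0.2, sigma=1.0, tol=1e-6, max_iter=10_000):
    """Solve the one-class SVM dual with a basic SMO loop.

    min_a 0.5 * a^T K a  subject to  0 <= a_i <= 1 / (nu * n)  and  sum_i a_i = 1.
    Returns (a, rho, K); frames with sum_i a_i k(x_i, x) >= rho are inside the support.
    """
    if not 0.0 < nu < 1.0:
        raise ValueError("nu must lie in (0, 1)")
    n = len(X)
    C = 1.0 / (nu * n)
    K = gaussian_kernel(X, X, sigma)
    a = np.full(n, 1.0 / n)  # feasible start, since 1/n < C
    g = K @ a                # gradient; g_i = <w, phi(x_i)>
    for _ in range(max_iter):
        down = np.flatnonzero(a > 1e-12)    # coefficients that can decrease
        up = np.flatnonzero(a < C - 1e-12)  # coefficients that can increase
        i = down[np.argmax(g[down])]
        j = up[np.argmin(g[up])]
        gap = g[i] - g[j]
        if gap < tol:                        # KKT conditions hold up to tol
            break
        eta = K[i, i] + K[j, j] - 2.0 * K[i, j]
        t_max = min(a[i], C - a[j])
        t = t_max if eta <= 1e-12 else min(gap / eta, t_max)
        a[i] -= t                            # move mass t from i to j
        a[j] += t
        g += t * (K[:, j] - K[:, i])
    # KKT: rho lies between max g over a_i > 0 and min g over a_i < C.
    rho = 0.5 * (g[a > 1e-12].max() + g[a < C - 1e-12].min())
    return a, rho, K


def ocsvm_dissimilarity(X1, X2, nu=0.2, sigma=1.0):
    """Angle between the two SVM centers, divided by the sum of the two angular spreads."""
    a1, rho1, K1 = fit_one_class_svm(X1, nu, sigma)
    a2, rho2, K2 = fit_one_class_svm(X2, nu, sigma)
    norm1 = np.sqrt(a1 @ K1 @ a1)  # ||w1||
    norm2 = np.sqrt(a2 @ K2 @ a2)  # ||w2||
    cos_c = a1 @ gaussian_kernel(X1, X2, sigma) @ a2 / (norm1 * norm2)
    arc_centers = np.arccos(np.clip(cos_c, -1.0, 1.0))
    # The support is the spherical cap <w/||w||, phi(x)> >= rho/||w||; this is its angular radius.
    spread1 = np.arccos(np.clip(rho1 / norm1, -1.0, 1.0))
    spread2 = np.arccos(np.clip(rho2 / norm2, -1.0, 1.0))
    return float(arc_centers / max(spread1 + spread2, 1e-12))


def detect_turns(frames, win=50, hop=10, threshold=0.5, **kw):
    """Compare adjacent windows [t-win, t) and [t, t+win); keep local maxima above threshold."""
    centers = list(range(win, len(frames) - win + 1, hop))
    scores = [ocsvm_dissimilarity(frames[t - win:t], frames[t:t + win], **kw) for t in centers]
    turns = [t for k, t in enumerate(centers)
             if scores[k] > threshold
             and (k == 0 or scores[k] >= scores[k - 1])
             and (k == len(scores) - 1 or scores[k] >= scores[k + 1])]
    return turns, scores
```

Nothing in the sketch depends on the feature dimension except the kernel evaluation, which is one way to read the abstract's point about arbitrary-dimension feature vectors. For the clustering step, the same `ocsvm_dissimilarity` could in principle serve as the distance in an agglomerative merge loop over the detected segments, though how the paper does this is not specified here.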

Keywords

Speaker diarization; One-class support vector machine