Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
566029 | Speech Communication | 2008 | 11 Pages |
This paper addresses speaker diarization, which consists of two steps: speaker turn detection and speaker clustering. Both steps require a metric for comparing speech segments. Here, we employ a novel metric, based on one-class support vector machines (SVMs), that was recently introduced by one of the authors. We present our primary speaker diarization system, which is based on one-class SVMs and is easy to build and configure. We show through several experiments on the NIST RT’03S and ESTER data sets that our approach is competitive with most standard approaches based on, e.g., Generalized Likelihood Ratios or Gaussian Mixture Models, and may be complementary to them. Moreover, our technique permits the use of heterogeneous acoustic feature vectors of any dimension while keeping the computational cost reasonable.
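
The two-step structure described above, turn detection followed by clustering, both driven by one segment-comparison metric, can be illustrated with a minimal sketch. This is not the system from the paper. The metric here is a deliberately simple placeholder (Euclidean distance between mean feature vectors) standing in for the one-class SVM metric, which the abstract does not specify. The function names, window size, and thresholds are all hypothetical choices made for illustration.

```python
from math import dist
from statistics import fmean


def segment_distance(a, b):
    """Placeholder metric (an assumption, NOT the paper's one-class SVM metric):
    Euclidean distance between the mean feature vectors of two segments."""
    mean_a = [fmean(col) for col in zip(*a)]
    mean_b = [fmean(col) for col in zip(*b)]
    return dist(mean_a, mean_b)


def detect_turns(frames, win=4, threshold=1.0, metric=segment_distance):
    """Step 1: compare the windows on either side of each frame index and
    keep indices whose score is a local maximum above the threshold."""
    scores = {
        t: metric(frames[t - win:t], frames[t:t + win])
        for t in range(win, len(frames) - win + 1)
    }
    return [
        t for t, s in scores.items()
        if s > threshold and s >= scores.get(t - 1, 0) and s > scores.get(t + 1, 0)
    ]


def cluster_segments(segments, threshold=1.0, metric=segment_distance):
    """Step 2: greedy agglomerative clustering. Merge the closest pair of
    clusters until the smallest distance exceeds the threshold.
    Returns one cluster label per segment."""
    clusters = [[i] for i in range(len(segments))]  # lists of segment indices

    def frames_of(cluster):
        return [f for i in cluster for f in segments[i]]

    while len(clusters) > 1:
        d, i, j = min(
            (metric(frames_of(a), frames_of(b)), i, j)
            for i, a in enumerate(clusters)
            for j, b in enumerate(clusters)
            if i < j
        )
        if d > threshold:
            break
        merged = clusters.pop(j)  # j > i, so index i is unaffected
        clusters[i].extend(merged)

    labels = [0] * len(segments)
    for label, cluster in enumerate(clusters):
        for i in cluster:
            labels[i] = label
    return labels


def diarize(frames, win=4, threshold=1.0, metric=segment_distance):
    """Run both steps and return (start, end, speaker_label) triples."""
    bounds = [0] + detect_turns(frames, win, threshold, metric) + [len(frames)]
    segments = [frames[s:e] for s, e in zip(bounds, bounds[1:])]
    labels = cluster_segments(segments, threshold, metric)
    return list(zip(bounds, bounds[1:], labels))


# Toy input: 8 frames of "speaker A", 8 of "speaker B", then 8 of A again.
frames = [[0.0, 0.0]] * 8 + [[5.0, 5.0]] * 8 + [[0.0, 0.0]] * 8
result = diarize(frames)
```

Because both steps take the metric as a parameter, a different segment-comparison function, such as one built on one-class SVMs, could be substituted without changing the surrounding pipeline. The placeholder operates on feature vectors of any dimension, but it makes no claim to reproduce the behaviour or cost of the paper's metric.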