Article ID: 567580
Journal: Speech Communication
Published Year: 2011
Pages: 12
File Type: PDF
Abstract

Recent research has demonstrated the merit of combining Gaussian mixture models (GMMs) and support vector machines (SVMs) for text-independent speaker verification. However, one unaddressed issue in this GMM–SVM approach is the imbalance between the numbers of speaker-class and impostor-class utterances available for training a speaker-dependent SVM. This paper proposes a resampling technique, utterance partitioning with acoustic vector resampling (UP-AVR), to mitigate this data imbalance problem. Briefly, the order of the acoustic vectors in an enrollment utterance is first randomized, and the randomized sequence is then partitioned into a number of segments. Each segment is used to produce a GMM supervector via maximum a posteriori (MAP) adaptation and mean vector concatenation. The randomization and partitioning steps are repeated several times to produce enough speaker-class supervectors for training an SVM. Experimental evaluations based on the NIST 2002 and 2004 speaker recognition evaluations (SRE) suggest that UP-AVR can reduce the error rate of GMM–SVM systems.
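The resampling procedure described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes a diagonal-covariance universal background model (UBM), mean-only relevance MAP adaptation, and a relevance factor of 16. The function names (`map_adapt_means`, `up_avr_supervectors`) and the parameters `n_segments` and `n_repeats` are hypothetical names chosen for illustration.

```python
import numpy as np


def map_adapt_means(frames, weights, means, variances, relevance=16.0):
    """Mean-only relevance MAP adaptation of a diagonal-covariance GMM.

    frames: (T, D) acoustic vectors; weights: (K,); means, variances: (K, D).
    The relevance factor of 16 is an assumed default, not taken from the paper.
    """
    # Log-likelihood of every frame under every mixture component.
    diff = frames[:, None, :] - means[None, :, :]               # (T, K, D)
    log_prob = -0.5 * (
        np.sum(diff ** 2 / variances, axis=2)
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)
    )                                                           # (T, K)
    # Component posteriors, normalized in a numerically stable way.
    log_post = np.log(weights) + log_prob
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)
    # Zeroth- and first-order sufficient statistics.
    n_k = post.sum(axis=0)                                      # (K,)
    e_k = (post.T @ frames) / np.maximum(n_k, 1e-10)[:, None]   # (K, D)
    # Interpolate between the data means and the UBM means.
    alpha = (n_k / (n_k + relevance))[:, None]
    return alpha * e_k + (1.0 - alpha) * means


def up_avr_supervectors(frames, weights, means, variances,
                        n_segments=4, n_repeats=2, seed=0):
    """Return n_repeats * n_segments supervectors from one utterance."""
    rng = np.random.default_rng(seed)
    supervectors = []
    for _ in range(n_repeats):
        # Step 1: randomize the order of the acoustic vectors.
        shuffled = frames[rng.permutation(len(frames))]
        # Step 2: partition the randomized sequence into segments.
        for segment in np.array_split(shuffled, n_segments):
            # Step 3: MAP-adapt the means and concatenate them.
            adapted = map_adapt_means(segment, weights, means, variances)
            supervectors.append(adapted.ravel())
    return np.stack(supervectors)
```

With these assumed settings, one enrollment utterance yields `n_repeats * n_segments` speaker-class supervectors, each of dimension K × D. These could then be passed to any SVM trainer alongside the impostor-class supervectors.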
