Article ID Journal Published Year Pages File Type
6951453 Computer Speech & Language 2018 18 Pages PDF
Abstract
The duration of speech segments can significantly impact the performance of text-independent speaker verification systems. In real world applications which require high accuracy on short utterances, the performance of i-vector speaker verification framework degrades significantly considering that i-vectors extracted from short utterances are less reliable (i.e., uncertainty is higher) than those extracted from long utterances. Therefore, to handle duration variability properly, a more realistic approach seems to be required. This study is an extension to our recently proposed nearest neighbor probabilistic linear discriminant analysis (NN-PLDA) which estimates the parameters of PLDA in i-vector speaker verification framework using a nonparametric form rather than maximum likelihood estimation (MLE) obtained by an EM algorithm, and has been shown to provide superior performance. In NN-PLDA, the between-speaker covariance matrix that represents global information about the speaker variability is replaced with a local estimation computed on a nearest neighbor basis for each target speaker. Compared to their parametric counterparts, the nonparametric between- and within-speaker scatter matrices can better exploit the discriminant information in training data and are more adapted to sample distributions. In this paper, we provide further analysis on the proposed nonparametrically trained PLDA as well as introduce a duration variability modeling technique in the estimation of the within-speaker scatter matrix as to compensate for the effect of limited speech data. We evaluate our approach using core-10sec and 10sec-10sec telephone trial conditions of NIST 2010 SRE as well as on the truncated test utterances in extended core condition with duration less than 10 s. We also present the results obtained by the successful incorporation of NN-PLDA on the recent NIST 2016 speaker recognition evaluation. In all experiments, considerable performance improvement is obtained with the proposed technique compared to a generatively trained PLDA model.
Related Topics
Physical Sciences and Engineering Computer Science Signal Processing
Authors
, ,