Article ID Journal Published Year Pages File Type
569076 Speech Communication 2006 11 Pages PDF
Abstract

Most speaker identification systems train an individual model for each speaker, since individual models often yield better performance and permit easier adaptation and enrollment. When classifying a speech token, the token is scored against every model and the maximum a posteriori decision rule selects the classification label. Consequently, the cost of classifying each token grows linearly with the population size. Since the number of tokens to classify is also likely to grow linearly with the population, the total workload grows quadratically.

This paper presents a preclassifier that generates an N-best hypothesis set using a novel application of Gaussian selection, together with a transformation of the traditional tail test statistic that lets the implementer specify the tail region in terms of probability. The system is trained from the parameters of the individual speaker models and does not require the original feature vectors, even when enrolling new speakers or adapting existing ones. Because the correct class label need only appear in the N-best hypothesis set, more Gaussians can be pruned than in a traditional Gaussian selection application. The N-best hypothesis set is then evaluated using the individual speaker models, resulting in an overall reduction of workload.
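The two-stage idea in the abstract can be sketched as follows. This is a minimal illustration only, not the paper's method: it uses single-Gaussian speaker models instead of full GMMs, hard top-scoring selection in place of the probability-mass tail test, and hypothetical speaker names and data.

```python
import math

def log_gauss(x, mean, var):
    # Log density of a 1-D Gaussian with the given mean and variance.
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

# Hypothetical enrolled speakers, each reduced to one Gaussian (mean, var)
# for brevity; a real system would use a GMM per speaker.
speakers = {"spk_a": (0.0, 1.0), "spk_b": (5.0, 1.0), "spk_c": (5.5, 1.0)}

def preclassify(frames, n_best):
    # Preclassifier: score each frame against the pooled Gaussians and
    # tally, per speaker, how often its Gaussian scores best. The N
    # speakers with the highest tallies form the N-best hypothesis set.
    tally = {s: 0 for s in speakers}
    for x in frames:
        scores = {s: log_gauss(x, m, v) for s, (m, v) in speakers.items()}
        tally[max(scores, key=scores.get)] += 1
    return sorted(tally, key=tally.get, reverse=True)[:n_best]

def classify(frames, n_best=2):
    # Full per-speaker evaluation is restricted to the N-best shortlist,
    # so the expensive scoring no longer touches every enrolled model.
    shortlist = preclassify(frames, n_best)
    totals = {s: sum(log_gauss(x, *speakers[s]) for x in frames)
              for s in shortlist}
    return max(totals, key=totals.get)

print(classify([5.1, 4.9, 5.3, 5.0]))  # frames near 5.0 -> "spk_b"
```

The workload saving comes from the second stage scoring only `n_best` models rather than the whole population; the preclassifier itself evaluates a pooled Gaussian codebook whose size does not depend on how many tokens each speaker contributes.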

Related Topics
Physical Sciences and Engineering Computer Science Signal Processing