Domain compensation based on phonetically discriminative features for speaker verification

Article ID	Journal	Published Year	Pages	File Type
6951533	Computer Speech & Language	2017	19 Pages	PDF

Abstract

This paper presents a new domain compensation framework that uses phonetically discriminative features extracted from domain-dependent deep neural networks (DNNs). The domain compensation can be applied in either a supervised or an unsupervised manner, depending on whether the domain information of the development data is provided in advance. In the supervised case, the DNNs are trained separately on the development speech recordings of each given domain. In the unsupervised case, the development data are first automatically clustered into domains using the Gaussian mixture model (GMM) mean supervectors generated from each speech recording, and DNNs are then trained on the resulting clusters. Finally, domain variability is compensated during the target speaker modeling step using support vector machines (SVMs), which take as input statistical vectors derived from the discriminative features extracted from the domain-dependent DNNs. The main strength of the proposed framework is that it needs no speaker labels in the development dataset, a considerable advantage over state-of-the-art techniques that require speaker labels to train inter-speaker and/or intra-speaker variability models or channel compensation. Three speaker verification systems are investigated to examine the effectiveness of the framework. Experimental results on the NIST SRE 2010 task show that an initial implementation of the proposed framework performs competitively with state-of-the-art techniques.
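
The unsupervised step described in the abstract, grouping development recordings into domains from their GMM mean supervectors, can be illustrated with a minimal sketch. The abstract does not name the clustering algorithm, so plain k-means is used here purely as an assumption. The function names (`cluster_domains`, `init_centroids`), the deterministic farthest-point initialisation, and the two-dimensional toy "supervectors" are all hypothetical; real supervectors would have thousands of dimensions.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def init_centroids(vectors: List[Vector], k: int) -> List[List[float]]:
    """Deterministic farthest-point initialisation (an illustrative choice)."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        # Pick the vector whose nearest existing centroid is farthest away.
        farthest = max(
            vectors,
            key=lambda v: min(squared_distance(v, c) for c in centroids),
        )
        centroids.append(list(farthest))
    return centroids


def cluster_domains(supervectors: List[Vector], k: int, max_iter: int = 50) -> List[int]:
    """Assign each recording's supervector to one of k pseudo-domains.

    Assumption: k-means stands in for whatever clustering the paper uses.
    """
    centroids = init_centroids(supervectors, k)
    labels = [-1] * len(supervectors)
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every supervector.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(v, centroids[j]))
            for v in supervectors
        ]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: recompute each centroid from its members.
        for j in range(k):
            members = [v for v, lab in zip(supervectors, labels) if lab == j]
            if members:
                centroids[j] = mean_vector(members)
    return labels


# Toy example: four recordings forming two well-separated groups.
toy_supervectors = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]]
print(cluster_domains(toy_supervectors, k=2))
```

Under this sketch, each resulting cluster label would play the role of a domain label, so that one DNN could then be trained per cluster in the same way as in the supervised case. No speaker labels are involved at any point.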

Keywords

Speaker verification; Deep neural network; i-vector