Article ID: 6951533
Journal: Computer Speech & Language
Published Year: 2017
Pages: 19
File Type: PDF
Abstract
This paper presents a new domain compensation framework that uses phonetically discriminative features extracted from domain-dependent deep neural networks (DNNs). Domain compensation can be applied in either a supervised or an unsupervised manner, depending on whether the domain information of the development data is provided in advance. In the supervised case, the DNNs are trained separately on the development speech recordings of each given domain. In the unsupervised case, the development data are first clustered automatically into domains using Gaussian mixture model (GMM) mean supervectors generated from each speech recording, and DNNs are then trained on the resulting clusters. Finally, domain variability is compensated during the target speaker modeling step, in which support vector machines (SVMs) are fed statistical vectors derived from the discriminative features extracted from the domain-dependent DNNs. The main strength of the proposed framework is that it needs no speaker labels in the development dataset, a significant advantage over state-of-the-art techniques that require speaker labels to train inter-speaker and/or intra-speaker variability models or channel compensation. Three speaker verification systems are investigated to examine the effectiveness of the framework. In an initial implementation, experimental results on the NIST SRE 2010 task show performance competitive with state-of-the-art techniques.
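The unsupervised clustering step described in the abstract can be illustrated with a small sketch. The abstract does not say which clustering algorithm is used, so the k-means procedure with farthest-point initialization below is an assumption, as are the function name `cluster_domains` and the toy three-dimensional vectors that stand in for real GMM mean supervectors. The sketch covers only the grouping of recordings into domains. It does not cover supervector extraction, DNN training, or SVM speaker modeling.

```python
from math import dist  # Euclidean distance, Python 3.8+


def cluster_domains(supervectors, k, iters=20):
    """Assign each supervector to one of k clusters with plain k-means.

    Hypothetical helper: k-means is an assumed stand-in for whatever
    clustering method the paper actually uses.
    """
    # Farthest-point initialization: start from the first vector, then
    # repeatedly add the vector farthest from all chosen centroids.
    centroids = [list(supervectors[0])]
    while len(centroids) < k:
        farthest = max(
            supervectors,
            key=lambda v: min(dist(v, c) for c in centroids),
        )
        centroids.append(list(farthest))

    labels = [0] * len(supervectors)
    for _ in range(iters):
        # Assignment step: nearest centroid for every supervector.
        new_labels = [
            min(range(k), key=lambda c: dist(v, centroids[c]))
            for v in supervectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, l in zip(supervectors, new_labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return labels


# Toy stand-ins for supervectors from two recording conditions.
toy_supervectors = [
    [0.0, 0.1, 0.0], [0.1, 0.0, 0.1], [0.0, 0.0, 0.1],
    [5.0, 5.1, 5.0], [5.1, 5.0, 4.9], [4.9, 5.0, 5.0],
]
domain_labels = cluster_domains(toy_supervectors, k=2)

# Group recording indices by inferred domain; in the framework, one DNN
# would then be trained on the recordings of each group.
recordings_by_domain = {}
for index, label in enumerate(domain_labels):
    recordings_by_domain.setdefault(label, []).append(index)
```

Note that no speaker labels appear anywhere in this step: the grouping depends only on the per-recording vectors, which is consistent with the label-free property the abstract emphasizes.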
Related Topics
Physical Sciences and Engineering > Computer Science > Signal Processing