Article code: 558755
Journal code: 1451748
Publication year: 2014
English article: 9 pages, PDF
English title of the ISI article
A fast and scalable hybrid FA/PPCA-based framework for speaker recognition
Related topics
Engineering and Basic Sciences, Computer Engineering, Signal Processing
English abstract


• The hybrid FA/PPCA system is presented.
• Two approaches, termed AN and AC, are proposed to speed up i-vector estimation during testing in a speaker recognition system.
• Significant speed-ups are obtained for each proposed approach.
• The scalability of the hybrid system is demonstrated using two suitable features (MFCC and MFS).
• Fusion of systems that use the AC-type approximation performs similarly to the corresponding hybrid system baseline.

A text-independent speaker recognition system using a hybrid of Probabilistic Principal Component Analysis (PPCA) and the conventional i-vector modeling technique is proposed. In this framework, the total variability space (TVS) is estimated using PPCA, while the i-vectors of target speakers and test utterances are extracted using the conventional method. This leads to an appreciable decrease in development time, while the time required for training and testing remains unchanged. In this paper, an algorithmic optimization of PPCA's EM algorithm is developed, which is observed to provide a speed-up of 3.7×. To simplify the testing procedure, two different approximation procedures are proposed for use in this framework. The first approximation assumes a covariance matrix computed based on the PPCA framework. The second approximation proposes an optimization that avoids inverting the precision matrix of the i-vector. Comparing the time taken by these approximations with the baseline i-vector extraction procedure shows speed gains with some deterioration in performance in terms of the Equal Error Rate (EER). Among the proposed techniques, the best-case trade-off is a speed-up of 81.2× with a deterioration in performance of 0.7% in absolute terms. Speaker recognition performance is studied on the telephone conditions of the benchmark NIST SRE 2010 dataset with systems built on the Mel Frequency Cepstral Coefficient (MFCC) feature. A trade-off in performance is observed when the proposed approximations are used. The scalability of these trade-offs is tested on the Mel Filterbank Slope (MFS) feature. The trade-offs observed with the approximations are reduced when the two systems are fused.
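
As a rough illustration of the conventional extraction step that the abstract treats as its baseline, the sketch below computes the i-vector posterior mean from Baum-Welch statistics under the standard total variability model, w = L^{-1} T^T Σ^{-1} F with precision matrix L = I + Σ_c N_c T_c^T Σ_c^{-1} T_c. The function name, argument names, and array shapes are assumptions made for this example and do not come from the paper; the AN and AC approximations themselves are not reproduced here.

```python
import numpy as np


def extract_ivector(T, sigma_diag, N, F):
    """Posterior mean of the i-vector (hypothetical helper, standard formulation).

    T          : (C, D, R) per-component blocks of the total variability matrix
    sigma_diag : (C, D) diagonal covariances of the UBM components
    N          : (C,) zeroth-order Baum-Welch statistics
    F          : (C, D) centred first-order Baum-Welch statistics
    Returns the i-vector w of shape (R,) and the precision matrix L of shape (R, R).
    """
    R = T.shape[2]
    # Sigma^{-1} T, applied per component and per feature dimension.
    T_scaled = T / sigma_diag[:, :, None]
    # Precision matrix: L = I + sum_c N_c * T_c^T Sigma_c^{-1} T_c
    L = np.eye(R) + np.einsum("c,cdr,cds->rs", N, T_scaled, T)
    # Right-hand side: T^T Sigma^{-1} F
    b = np.einsum("cdr,cd->r", T_scaled, F)
    # Solve L w = b directly instead of forming L^{-1} explicitly.
    w = np.linalg.solve(L, b)
    return w, L


if __name__ == "__main__":
    # Toy dimensions chosen only for illustration.
    rng = np.random.default_rng(0)
    C, D, R = 8, 4, 5
    T = rng.standard_normal((C, D, R))
    sigma_diag = rng.uniform(0.5, 2.0, size=(C, D))
    N = rng.uniform(1.0, 10.0, size=C)
    F = rng.standard_normal((C, D))
    w, L = extract_ivector(T, sigma_diag, N, F)
    print(w.shape, L.shape)
```

In this formulation the precision matrix L depends on the zeroth-order statistics of each utterance, so it has to be rebuilt and solved for every test utterance. That per-utterance cost is the kind of work the testing-time approximations described in the abstract aim to reduce.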

Publisher
Database: Elsevier - ScienceDirect
Journal: Digital Signal Processing - Volume 32, September 2014, Pages 137–145