Article code: 567510
Journal code: 876095
Publication year: 2012
English article: 13 pages (PDF)
English title of the ISI article
Multistream speaker diarization of meetings recordings beyond MFCC and TDOA features
Related topics
Engineering and Basic Sciences, Computer Engineering, Signal Processing
English abstract

Many state-of-the-art diarization systems for meeting recordings are based on the HMM/GMM framework and on the combination of spectral (MFCC) and time delay of arrival (TDOA) features. This paper presents an extensive study of how multistream diarization can be improved beyond these two sets of features. While several other features have proven effective for speaker diarization, little effort has been devoted to integrating them into the state-of-the-art MFCC + TDOA baseline and, to the best of the authors' knowledge, no positive results have been reported so far. The first contribution of this paper is an analysis of the reasons for this: a set of oracle experiments investigates the robustness of HMM/GMM diarization when other features (the modulation spectrum features and the frequency domain linear prediction features) are also integrated. The second contribution is a non-parametric multistream diarization method based on the information bottleneck (IB) approach. In contrast to the HMM/GMM system, which combines log-likelihoods, it combines the feature streams in a normalized space of relevance variables. The same analysis is repeated for the IB system and reveals that the proposed approach is more robust and can actually benefit from sources of information beyond the conventional MFCC and TDOA features. Experiments on the rich transcription data (heterogeneous meeting data recorded in several different rooms) show that it achieves a very competitive error of only 6.3% when four feature streams are used, compared with 14.9% for the HMM/GMM system. These results are analyzed in terms of the sensitivity of the error to the stream weightings. To the best of the authors' knowledge, this is the first successful attempt to reduce the speaker error by combining other features with the MFCC and TDOA features, and the first study to show the shortcomings of the HMM/GMM system in going beyond this baseline.
As a last contribution, the paper also addresses issues related to the computational complexity of multistream approaches.
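
The difference between the two combination schemes mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy scores and the stream weights are all hypothetical, and the IB side assumes that each stream supplies a distribution over relevance variables that is merged by a weighted sum. The point it shows is that log-likelihoods from different streams can live on very different scales, whereas the distributions over relevance variables are each normalized before they are weighted.

```python
# Toy comparison of two multistream combination rules (illustrative only).
# All names, numbers and weights below are hypothetical assumptions.

def combine_log_likelihoods(stream_loglikes, weights):
    """HMM/GMM-style rule: weighted sum of per-stream log-likelihoods,
    computed independently for each candidate speaker cluster."""
    n_clusters = len(stream_loglikes[0])
    return [
        sum(w * loglike[c] for w, loglike in zip(weights, stream_loglikes))
        for c in range(n_clusters)
    ]

def combine_relevance_distributions(stream_posteriors, weights):
    """IB-style rule (assumed form): weighted sum of per-stream
    distributions over relevance variables. Each input sums to one, so
    the result also sums to one when the weights sum to one."""
    n_vars = len(stream_posteriors[0])
    return [
        sum(w * post[y] for w, post in zip(weights, stream_posteriors))
        for y in range(n_vars)
    ]

# Hypothetical stream weights (they sum to one).
weights = [0.9, 0.1]

# Hypothetical log-likelihoods of one segment under three clusters.
# The two streams have very different dynamic ranges.
mfcc_loglike = [-42.0, -45.0, -50.0]
tdoa_loglike = [-3.1, -2.0, -2.5]
combined_loglike = combine_log_likelihoods([mfcc_loglike, tdoa_loglike], weights)

# Hypothetical distributions of the same segment over three relevance
# variables; both streams are already on the same [0, 1] scale.
mfcc_post = [0.7, 0.2, 0.1]
tdoa_post = [0.2, 0.5, 0.3]
combined_post = combine_relevance_distributions([mfcc_post, tdoa_post], weights)

print(combined_loglike)
print(combined_post)
```

In the first rule the effective influence of a stream depends on both its weight and the scale of its log-likelihoods, which is one plausible reading of the weight sensitivity the abstract reports for the HMM/GMM system. In the second rule the weights act on quantities that are already comparable across streams.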


► Speaker diarization of meetings relies heavily on MFCC and TDOA features.
► Other features have proven effective but have never been integrated into such a baseline.
► IB diarization can benefit from feature streams beyond the MFCC/TDOA baseline on NIST rich transcription data.
► HMM/GMM diarization suffers from sensitivity issues when moving beyond the MFCC/TDOA baseline.
► Even with four feature streams, IB diarization runs faster than real time.

Publisher
Database: Elsevier - ScienceDirect
Journal: Speech Communication - Volume 54, Issue 1, January 2012, Pages 55–67