Article ID: 558462
Journal: Computer Speech & Language
Published Year: 2006
Pages: 28
File Type: PDF
Abstract

This paper summarizes the collaboration of the LIA and CLIPS laboratories on speaker diarization of broadcast news during the spring NIST Rich Transcription 2003 evaluation campaign (NIST-RT’03S). The speaker diarization task consists of segmenting a conversation into homogeneous segments, which are then grouped into speaker classes.

Two approaches to speaker diarization are described and compared. The first relies on a classical two-step strategy: speaker turns are detected, then the resulting segments are clustered. The second uses an integrated strategy in which both the segment boundaries and the assignment of segments to speakers are extracted simultaneously and called into question throughout the process. These two methods are used to investigate various strategies for the fusion of diarization results.

Furthermore, segmentation into acoustic macro-classes is proposed and evaluated as an a priori step to speaker diarization. The objective is to take advantage of the a priori acoustic information in the diarization process. Along with enriching the resulting segmentation with information about speaker gender, channel quality or background sound, this approach brings gains in speaker diarization performance thanks to the diversity of acoustic conditions found in broadcast news.

The last part of this paper describes ongoing work carried out by the CLIPS and LIA laboratories and presents some results obtained since 2002 on speaker diarization for various corpora.
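
As a rough illustration of the classical two-step strategy mentioned in the abstract (turn detection followed by clustering), the sketch below works on a toy one-dimensional feature sequence. It is not the LIA or CLIPS system: the mean-difference change detector, the mean-based agglomerative clustering, the function names `detect_turns` and `cluster_segments`, and all thresholds are assumptions chosen only to keep the example small and self-contained.

```python
from statistics import mean


def detect_turns(features, window=3, threshold=2.0):
    """Step 1 (toy stand-in for speaker turn detection).

    A frame index is reported as a boundary when the difference between the
    means of the windows on either side of it exceeds `threshold` and is a
    strict local maximum within +/- `window` frames.
    """
    n = len(features)
    dist = {
        i: abs(mean(features[i:i + window]) - mean(features[i - window:i]))
        for i in range(window, n - window + 1)
    }
    boundaries = []
    for i, d in dist.items():
        neighbours = [
            dist[j]
            for j in range(i - window, i + window + 1)
            if j in dist and j != i
        ]
        if d > threshold and all(d > other for other in neighbours):
            boundaries.append(i)
    return boundaries


def cluster_segments(features, boundaries, merge_threshold=1.0):
    """Step 2 (toy stand-in for speaker clustering).

    Segments are merged bottom-up: the two clusters with the closest mean
    feature value are merged until the closest pair is farther apart than
    `merge_threshold`. Returns (start, end, speaker_label) triples.
    """
    edges = [0] + list(boundaries) + [len(features)]
    segments = list(zip(edges[:-1], edges[1:]))
    clusters = [[k] for k in range(len(segments))]  # one cluster per segment

    def cluster_mean(cluster):
        frames = [
            x
            for k in cluster
            for x in features[segments[k][0]:segments[k][1]]
        ]
        return mean(frames)

    while len(clusters) > 1:
        pairs = [
            (abs(cluster_mean(a) - cluster_mean(b)), ia, ib)
            for ia, a in enumerate(clusters)
            for ib, b in enumerate(clusters)
            if ia < ib
        ]
        d, ia, ib = min(pairs)
        if d > merge_threshold:
            break
        clusters[ia] = clusters[ia] + clusters[ib]
        del clusters[ib]

    labels = {k: spk for spk, cluster in enumerate(clusters) for k in cluster}
    return [(start, end, labels[k]) for k, (start, end) in enumerate(segments)]


# Toy input: "speaker A" (values near 0), then "speaker B" (near 5), then A again.
features = [0, 0, 0, 0, 0, 5, 5, 5, 5, 5, 0, 0, 0, 0, 0]
turns = detect_turns(features)
diarization = cluster_segments(features, turns)
print(turns)        # [5, 10]
print(diarization)  # [(0, 5, 0), (5, 10, 1), (10, 15, 0)]
```

In this toy run the first and third segments receive the same label, which is the grouping into speaker classes that the clustering step is meant to provide. An integrated strategy, by contrast, would revise the boundaries and the labels together rather than fixing the boundaries first.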

Related Topics
Physical Sciences and Engineering > Computer Science > Signal Processing