Article code | Journal code | Publication year | English article | Full-text version |
---|---|---|---|---|
567719 | 1452050 | 2007 | 11-page PDF | Free download |
Audio–visual speech source separation combines visual speech processing techniques (e.g., lip parameter tracking) with source separation methods to improve the extraction of a speech source of interest from a mixture of acoustic signals. In this paper, we present a new approach that combines visual information with separation methods based on the sparseness of speech: visual information serves as a voice activity detector (VAD), which is combined with a new geometric separation method. The proposed audio–visual method is shown to extract a real spontaneous speech utterance efficiently in the difficult case of convolutive mixtures, even when the competing sources are highly non-stationary. Typical gains of 18–20 dB in signal-to-interference ratio are obtained for a wide range of (2 × 2) and (3 × 3) mixtures. Moreover, the overall process is computationally considerably simpler than previously proposed audio–visual separation schemes.
Journal: Speech Communication - Volume 49, Issues 7–8, July–August 2007, Pages 667–677
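
As a rough illustration of two ideas mentioned in the abstract, the sketch below shows (a) how lip parameters could drive a voice activity decision and (b) how a signal-to-interference ratio gain in dB is computed. It is not the detector or the geometric separation method from the paper: the function names `visual_vad`, `sir_db`, and `sir_gain_db`, the lip width and height inputs, the motion measure, and the `threshold` and `window` values are all assumptions chosen for illustration only.

```python
import math


def visual_vad(lip_widths, lip_heights, threshold=0.5, window=3):
    """Hypothetical visual VAD: flag frames where smoothed lip motion is high.

    lip_widths / lip_heights are per-frame lip parameters (assumed inputs).
    Returns a list of booleans, True where the speaker is presumed active.
    """
    n = len(lip_widths)
    # Frame-to-frame motion: sum of absolute changes of both lip parameters.
    motion = [0.0] + [
        abs(lip_widths[t] - lip_widths[t - 1])
        + abs(lip_heights[t] - lip_heights[t - 1])
        for t in range(1, n)
    ]
    half = window // 2
    active = []
    for t in range(n):
        # Average the motion over a short centered window to avoid flicker.
        lo, hi = max(0, t - half), min(n, t + half + 1)
        smoothed = sum(motion[lo:hi]) / (hi - lo)
        active.append(smoothed > threshold)
    return active


def sir_db(target_power, interference_power):
    """Signal-to-interference ratio in decibels."""
    return 10.0 * math.log10(target_power / interference_power)


def sir_gain_db(sir_in_db, sir_out_db):
    """SIR improvement brought by separation, in decibels."""
    return sir_out_db - sir_in_db


if __name__ == "__main__":
    # Toy data: still lips for five frames, then moving lips.
    widths = [10, 10, 10, 10, 10, 12, 9, 13, 8, 12]
    heights = [5, 5, 5, 5, 5, 7, 4, 8, 3, 7]
    print(visual_vad(widths, heights))
    # Toy powers: equal power before separation, 100:1 after.
    print(sir_gain_db(sir_db(1.0, 1.0), sir_db(100.0, 1.0)))
```

In a sparseness-based scheme such as the one the abstract describes, frames flagged as silent for the target speaker are the useful ones, since only the competing sources are present there. How the paper exploits those frames is not specified in this abstract, so the sketch stops at the detection step.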