Article code: 565974
Journal code: 875886
Publication year: 2011
English article: 16 pages, PDF
Full-text version: free download
English title of the ISI article
Towards precise and robust automatic synchronization of live speech and its transcripts
Related topics
Engineering and Basic Sciences, Computer Engineering, Signal Processing
English abstract

This paper presents our efforts in automatically synchronizing spoken utterances with their transcripts (textual contents) (ASUT), where the speech is a live stream and its corresponding transcripts are known. The task is first reduced to the problem of detecting the end times of spoken utterances online, and a solution based on a novel frame-synchronous likelihood ratio test (FSLRT) procedure is then proposed. We detail the formulation and implementation of the proposed FSLRT procedure under the hidden Markov model (HMM) framework, and we study its properties and parameter settings empirically.

Because synchronization failures may occur in FSLRT-based ASUT systems, this paper also extends the FSLRT procedure to a multiple-instance version to increase the robustness of the system. The proposed multiple-instance FSLRT can detect synchronization failures and restart the system from an appropriate point, so a fully automatic FSLRT-based ASUT system can be constructed.

The FSLRT-based ASUT system is evaluated in a simultaneous broadcasting news subtitling task. Experimental results show that the proposed method achieves satisfactory performance and outperforms an automatic speech recognition-based method in both robustness and precision. Finally, the FSLRT-based news subtitling system correctly subtitles about 90% of the sentences with an average time deviation of about 100 ms, running at a speed of 0.37 real time (RT).
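
The abstract does not give the FSLRT formulation, so the following is only a generic illustration of a frame-by-frame likelihood ratio test for end-of-utterance detection, not the authors' procedure. It assumes that some upstream model already supplies, for each frame, a log-likelihood ratio between the hypotheses "the utterance has ended" and "the utterance is still ongoing". The function name, the threshold, the `min_consecutive` smoothing rule, and the 10 ms frame shift are all hypothetical choices made for this sketch.

```python
from typing import Iterable, Optional


def detect_end_frame(
    llr_per_frame: Iterable[float],
    threshold: float,
    min_consecutive: int = 3,
) -> Optional[int]:
    """Return the first frame of the first run of `min_consecutive` frames
    whose log-likelihood ratio exceeds `threshold`, or None if no such run
    occurs. Frames are consumed one at a time, as in a live stream."""
    run = 0  # length of the current run of frames above the threshold
    for t, llr in enumerate(llr_per_frame):
        run = run + 1 if llr > threshold else 0
        if run >= min_consecutive:
            # The run started min_consecutive - 1 frames before frame t.
            return t - min_consecutive + 1
    return None


def frame_to_ms(frame_index: int, frame_shift_ms: float = 10.0) -> float:
    """Convert a frame index to a time offset, assuming a fixed frame shift."""
    return frame_index * frame_shift_ms


# Hypothetical per-frame log-likelihood ratios (not data from the paper).
scores = [-2.0, -1.5, 0.5, -0.3, 1.2, 1.8, 2.5, 3.0]
end_frame = detect_end_frame(scores, threshold=1.0, min_consecutive=3)
if end_frame is not None:
    print(f"end detected at frame {end_frame} ({frame_to_ms(end_frame)} ms)")
```

Requiring several consecutive frames above the threshold is one simple way to trade detection latency for fewer false alarms; the paper's own decision rule and its multiple-instance extension may differ.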

Research highlights
► The problem of automatically synchronizing live speech with its transcripts is addressed.
► A novel frame-synchronous likelihood ratio test (FSLRT) procedure is proposed.
► The FSLRT is augmented with a multiple-instance strategy to increase its robustness.
► The proposed algorithm achieves satisfactory performance in a simultaneous broadcasting news subtitling task.

Publisher
Database: Elsevier - ScienceDirect
Journal: Speech Communication - Volume 53, Issue 4, April 2011, Pages 508–523