Article code | Journal code | Publication year | English article | Full-text version
557912 | 874813 | 2012 | 15-page PDF | Free download
English title of the ISI article
Enhancing lexical cohesion measure with confidence measures, semantic relations and language model interpolation for multimedia spoken content topic segmentation
Related topics
Engineering and Basic Sciences > Computer Engineering > Signal Processing
English abstract

Transcript-based topic segmentation of TV programs faces several difficulties arising from transcription errors, from the presence of potentially short segments and from the limited number of word repetitions to enforce lexical cohesion, i.e., lexical relations that exist within a text to provide a certain unity. To overcome these problems, we extend a probabilistic measure of lexical cohesion based on generalized probabilities with a unigram language model. On the one hand, confidence measures and semantic relations are considered as additional sources of information. On the other hand, language model interpolation techniques are investigated for better language model estimation. Experimental topic segmentation results are presented on two corpora with distinct characteristics, composed respectively of broadcast news and reports on current affairs. Significant improvements are obtained on both corpora, demonstrating the effectiveness of the extended lexical cohesion measure for spoken TV contents, as well as its genericity over different programs.


► Adaptation of lexical-cohesion-based topic segmentation to the specifics of TV programs.
► Confidence measures and semantic relations used as additional information.
► Language model interpolation techniques used for better language model estimation.
► Domain-independent technique applied to two corpora composed of TV news and reports.
► F1-measure improved by +4.9 and +3.7 on the two corpora.
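
The abstract's core idea, scoring how well a segment's words are predicted by a unigram language model and improving that model through interpolation, can be illustrated with a minimal sketch. This is not the authors' measure: the paper's generalized probabilities, confidence measures, and semantic relations are not reproduced here. The Laplace smoothing, the weight `lam`, the toy transcript, and all function names are assumptions made purely for illustration.

```python
import math
from collections import Counter


def unigram_lm(words, vocab, alpha=1.0):
    """Laplace-smoothed unigram model estimated from `words` over `vocab` (assumed smoothing)."""
    counts = Counter(words)
    total = len(words) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def interpolate(segment_lm, background_lm, lam=0.5):
    """Linear interpolation: lam * P_segment(w) + (1 - lam) * P_background(w)."""
    return {w: lam * segment_lm[w] + (1.0 - lam) * background_lm[w]
            for w in segment_lm}


def cohesion(segment, lm):
    """Log-probability of the segment's own words under `lm`; higher means more cohesive."""
    return sum(math.log(lm[w]) for w in segment)


# Toy transcript (hypothetical): two topics of four words each.
transcript = "election vote poll election storm rain wind storm".split()
vocab = set(transcript)
background = unigram_lm(transcript, vocab)


def score(segment, lam=0.5):
    """Cohesion of a segment under its own model interpolated with the background model."""
    lm = interpolate(unigram_lm(segment, vocab), background, lam)
    return cohesion(segment, lm)


# Compare a boundary placed at the topic change with one placed too early.
good_split = score(transcript[:4]) + score(transcript[4:])
bad_split = score(transcript[:2]) + score(transcript[2:])
print(good_split > bad_split)
```

In this toy setting the boundary at the true topic change yields the higher total cohesion. Interpolating with the background model keeps the estimate for very short segments from relying on only a handful of word counts, which is the kind of sparsity the abstract attributes to short segments and few word repetitions.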

Publisher
Database: Elsevier - ScienceDirect
Journal: Computer Speech & Language - Volume 26, Issue 2, April 2012, Pages 90–104