Enhancing lexical cohesion measure with confidence measures, semantic relations and language model interpolation for multimedia spoken content topic segmentation

Article code	Journal code	Publication year	English article	Full text
557912	874813	2012	15-page PDF	Free download

Keywords

Confidence measures; Lexical cohesion; Topic segmentation; Semantic relations

Related topics

Engineering and basic sciences, Computer engineering, Signal processing

Abstract

Transcript-based topic segmentation of TV programs faces several difficulties arising from transcription errors, from the presence of potentially short segments and from the limited number of word repetitions to enforce lexical cohesion, i.e., lexical relations that exist within a text to provide a certain unity. To overcome these problems, we extend a probabilistic measure of lexical cohesion based on generalized probabilities with a unigram language model. On the one hand, confidence measures and semantic relations are considered as additional sources of information. On the other hand, language model interpolation techniques are investigated for better language model estimation. Experimental topic segmentation results are presented on two corpora with distinct characteristics, composed respectively of broadcast news and reports on current affairs. Significant improvements are obtained on both corpora, demonstrating the effectiveness of the extended lexical cohesion measure for spoken TV contents, as well as its genericity over different programs.
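As a rough illustration of the ideas named in the abstract, the sketch below scores a transcript segment with a unigram language model estimated on the segment itself, interpolated with a background model, with each word occurrence weighted by its transcription confidence. It is a minimal sketch under stated assumptions, not the authors' formulation: the function names, the linear interpolation scheme, the weight `lam`, and the use of confidence values as fractional counts are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def weighted_counts(tokens):
    """Accumulate fractional counts from (word, confidence) pairs.

    Assumption: a confidence in [0, 1] is used directly as the count
    contribution of one word occurrence.
    """
    counts = Counter()
    for word, conf in tokens:
        counts[word] += conf
    return counts


def interpolated_unigram(segment_counts, background_counts, lam=0.7):
    """Return a function P(w) = lam * P_segment(w) + (1 - lam) * P_background(w)."""
    seg_total = sum(segment_counts.values())
    bg_total = sum(background_counts.values())

    def prob(word):
        p_seg = segment_counts.get(word, 0.0) / seg_total if seg_total else 0.0
        p_bg = background_counts.get(word, 0.0) / bg_total if bg_total else 0.0
        return lam * p_seg + (1.0 - lam) * p_bg

    return prob


def cohesion(tokens, background_counts, lam=0.7):
    """Confidence-weighted log-likelihood of a segment under its own model.

    Values closer to zero indicate a segment whose words repeat more,
    i.e. a more lexically cohesive segment in this simplified view.
    """
    prob = interpolated_unigram(weighted_counts(tokens), background_counts, lam)
    # Occurrences with zero confidence contribute nothing and are skipped,
    # which also avoids evaluating log(0).
    return sum(conf * math.log(prob(word)) for word, conf in tokens if conf > 0)


# Hypothetical toy data: a repetitive segment and a lexically diverse one.
background = Counter({"vote": 1, "goal": 1, "rain": 1, "bank": 1})
repetitive = [("vote", 1.0)] * 4
diverse = [("vote", 1.0), ("goal", 1.0), ("rain", 1.0), ("bank", 1.0)]
```

With these toy inputs, `cohesion(repetitive, background)` is closer to zero than `cohesion(diverse, background)`, which matches the intuition that word repetition signals topical unity. Lowering a word's confidence reduces its influence on both the model estimate and the score.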

► Adaptation of lexical-cohesion-based topic segmentation to the specifics of TV programs.
► Confidence measures and semantic relations used as additional information.
► Language model interpolation techniques used for better language model estimation.
► Domain-independent technique applied to two corpora composed of TV news and reports.
► F1-measure improved by +4.9 and +3.7 on the two corpora.
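For readers unfamiliar with how an F1-measure can be computed for topic segmentation, the following sketch compares hypothesized boundary positions with reference boundaries. It is an illustrative assumption only: the function name `boundary_f1`, the greedy one-to-one matching, and the `tolerance` parameter are hypothetical, and the evaluation protocol actually used in the paper is not described on this page.

```python
def boundary_f1(reference, hypothesis, tolerance=0):
    """F1 score between two lists of boundary positions.

    A hypothesized boundary counts as correct when it lies within
    `tolerance` positions of a reference boundary that has not
    already been matched (greedy one-to-one matching).
    """
    ref = sorted(reference)
    hyp = sorted(hypothesis)
    matched = set()  # indices of reference boundaries already used
    true_positives = 0
    for h in hyp:
        for i, r in enumerate(ref):
            if i not in matched and abs(h - r) <= tolerance:
                matched.add(i)
                true_positives += 1
                break
    precision = true_positives / len(hyp) if hyp else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, with reference boundaries `[10, 20, 30]` and hypothesized boundaries `[10, 21, 40]`, a tolerance of 1 matches two of the three boundaries, so precision and recall are both 2/3.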

Publisher

Database: Elsevier - ScienceDirect
Journal: Computer Speech & Language - Volume 26, Issue 2, April 2012, Pages 90–104

Authors

Camille Guinaudeau, Guillaume Gravier, Pascale Sébillot
