Generating web-based corpora for video transcripts categorization

Article code	Journal code	Publication year	English article	Full-text version
384220	660842	2013	8-page PDF	Free download


Keywords

Automatic speech recognition (ASR)

Related topics

Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence


Abstract

This paper proposes the use of the Internet as a rich source of information for generating learning corpora for video transcript categorization systems. Our main goal in this work has been to study the behavior of different learning corpora generated from the Internet and to analyze some of their features. Specifically, Wikipedia, Google and the blogosphere have been employed to generate these learning corpora, using the VideoCLEF 2008 track as the evaluation framework for the different experiments carried out. Based on this evaluation framework, we conclude that the proposed approach is a promising strategy for the video classification task using the transcripts of the videos. The different sizes of the generated corpora could lead one to believe that better results are achieved when the corpus is larger, but we demonstrate that size may not always be a reliable indicator of the behavior of the learning corpus. The obtained results show that integrating knowledge from the blogosphere or Google yields more reliable corpora for this task than those based on Wikipedia.

► Integration of knowledge from web resources like Google, Wikipedia and blogs is a promising strategy to generate learning corpora for the video categorization task.
► We demonstrate that the size of the generated corpora may not always be a reliable indicator of the behavior of the learning corpus.
► The integration of knowledge from the blogosphere or Google allows generating more reliable corpora for this task than those based on Wikipedia.
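
The general idea of training a transcript classifier on web-derived text can be illustrated with a small sketch. The abstract does not say which classifier or features the authors used, so everything below is an assumption: a multinomial Naive Bayes model with add-one smoothing and uniform priors, hypothetical category names, and toy sentences standing in for documents that would be retrieved from Wikipedia, Google or blogs.

```python
import math
import re
from collections import Counter

# Toy stand-ins for web-derived learning corpora (hypothetical data,
# not taken from the paper): category label -> list of documents.
TOY_CORPORA = {
    "music": [
        "The orchestra performed a symphony at the concert hall.",
        "A guitar and piano concert with live music.",
    ],
    "history": [
        "The war ended with a treaty between the two empires.",
        "Ancient empires and medieval kings shaped the history of the war.",
    ],
}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def train(corpora):
    """Count term frequencies per category and collect the vocabulary."""
    counts = {
        category: Counter(tok for doc in docs for tok in tokenize(doc))
        for category, docs in corpora.items()
    }
    vocab = set().union(*counts.values())
    return counts, vocab


def classify(transcript, counts, vocab):
    """Return the category with the highest smoothed log-likelihood.

    Assumes uniform class priors; tokens outside the vocabulary are skipped.
    """
    tokens = [tok for tok in tokenize(transcript) if tok in vocab]
    best_category, best_score = None, -math.inf
    for category, term_counts in counts.items():
        total = sum(term_counts.values())
        score = sum(
            math.log((term_counts[tok] + 1) / (total + len(vocab)))
            for tok in tokens
        )
        if score > best_score:
            best_category, best_score = category, score
    return best_category


counts, vocab = train(TOY_CORPORA)
print(classify("the orchestra played a symphony concert", counts, vocab))
```

In a setup like the one the abstract describes, the toy lists would be replaced by text gathered from each web source, which would allow corpora built from different sources (and of different sizes) to be compared on the same set of transcripts.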

Publisher

Database: Elsevier - ScienceDirect
Journal: Expert Systems with Applications - Volume 40, Issue 1, January 2013, Pages 337–344

Authors

José M. Perea-Ortega, Arturo Montejo-Ráez, M. Teresa Martín-Valdivia, L. Alfonso Ureña-López
