Article code: 533940 | Journal code: 870192 | Publication year: 2016 | English article: 8 pages | Full text: PDF
English title (ISI article)
Assisted keyword indexing for lecture videos using unsupervised keyword spotting
Related subjects
Engineering and Basic Sciences; Computer Engineering; Computer Vision and Pattern Recognition
Abstract


• Created a completely unsupervised within-speaker keyword spotting system to build an accessible index.
• Average Precision at 10 of 71.5% and 79.5% for laptop-recorded and in-lecture queries, respectively, on RIT lectures.
• Whitening is used to reduce variance in the MFCC feature vectors (a 58% performance increase).
• In Table 1 and the accompanying text we explicitly state our criteria for 'valid' search hits.
• MIT lectures recorded with a lapel microphone achieve an average Precision at 10 of 89.5%.
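The highlights credit whitening of the MFCC vectors with a 58% gain but do not say which whitening transform is used. Purely as an illustration, the sketch below applies Cholesky whitening, which is an assumption on our part (PCA or ZCA whitening are common alternatives): subtract the mean frame, then multiply by the inverse of the Cholesky factor of the covariance, so the output has approximately identity covariance. The function names and the `ridge` safeguard are hypothetical, and features are plain lists of floats so the example needs only the standard library.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def mean_vector(frames: Sequence[Sequence[float]]) -> List[float]:
    """Per-dimension mean of a list of MFCC frames."""
    n = len(frames)
    return [sum(f[d] for f in frames) / n for d in range(len(frames[0]))]


def covariance(frames, mu, ridge: float = 1e-8) -> Matrix:
    """Population covariance; `ridge` is added to the diagonal (hypothetical
    safeguard so the Cholesky factorization below does not fail)."""
    dim = len(mu)
    cov = [[0.0] * dim for _ in range(dim)]
    for f in frames:
        c = [f[d] - mu[d] for d in range(dim)]
        for a in range(dim):
            for b in range(dim):
                cov[a][b] += c[a] * c[b]
    for a in range(dim):
        for b in range(dim):
            cov[a][b] /= len(frames)
        cov[a][a] += ridge
    return cov


def cholesky(m: Matrix) -> Matrix:
    """Lower-triangular L with L @ L.T == m (m must be positive definite)."""
    dim = len(m)
    L = [[0.0] * dim for _ in range(dim)]
    for i in range(dim):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(m[i][i] - s)
            else:
                L[i][j] = (m[i][j] - s) / L[j][j]
    return L


def forward_substitute(L: Matrix, v: Sequence[float]) -> List[float]:
    """Solve L y = v for y, i.e. compute inv(L) @ v without forming inv(L)."""
    y = [0.0] * len(v)
    for i in range(len(v)):
        y[i] = (v[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    return y


def whiten(frames, ridge: float = 1e-8) -> List[List[float]]:
    """Cholesky whitening: y = inv(L) (x - mu), so Cov(y) is close to I."""
    mu = mean_vector(frames)
    L = cholesky(covariance(frames, mu, ridge))
    return [forward_substitute(L, [f[d] - mu[d] for d in range(len(mu))])
            for f in frames]
```

In a real pipeline the mean and Cholesky factor would plausibly be estimated once (for example per lecture or per speaker) and then applied to both lecture and query features so they share one whitened space; that split is likewise an assumption, not something the summary specifies.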

Many students use videos to supplement learning outside the classroom. This is particularly important for students with visual impairments, for whom seeing the board during lecture is difficult. For these students, we believe that recording the lectures they attend and providing effective video indexing and search tools will make it easier to learn course material at their own pace. As a first step in this direction, we seek to help instructors create an index for their lecture videos using audio keyword search, with queries recorded by the instructor on a laptop and/or extracted from video excerpts. To this end, we have created an unsupervised within-speaker keyword spotting system. We represent audio data using de-noised, whitened and scale-normalized Mel Frequency Cepstral Coefficient (MFCC) features, and locate queries using Segmental Dynamic Time Warping (SDTW) of feature sequences. We evaluate the system on introductory Linear Algebra lectures from instructors with different accents at two U.S. universities. For lectures recorded with a video camera at RIT, laptop-recorded queries obtain an average Precision at 10 of 71.5%, while within-lecture queries obtain 79.5%. For lectures recorded with a lapel microphone at MIT, using a similar keyword set, we obtain a much higher average Precision at 10 of 89.5%. Our results suggest that the system is robust to changes in environment, speaker and recording setup.
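The abstract names Segmental Dynamic Time Warping and Precision at 10 but gives no parameters. The sketch below is a simplified, hypothetical version of both, not the authors' implementation. For each candidate start frame in the lecture it aligns the whole query under a band constraint (`band`, the allowed drift from the diagonal), lets the match end on any frame inside the band, and scores the path by its average Euclidean frame distance. It then ranks start frames and computes Precision at k from a caller-supplied validity test. The band width, `step`, distance measure and `is_valid` callback are all assumptions.

```python
import math
from typing import Callable, List, Sequence, Tuple

Frame = Sequence[float]


def euclidean(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature frames (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def banded_dtw(query: Sequence[Frame], target: Sequence[Frame],
               start: int, band: int) -> float:
    """Align all of `query` to `target` beginning at frame `start`.

    Cells more than `band` frames off the diagonal are skipped; the match may
    end on any frame inside the band. Returns the path-length-normalized cost.
    """
    m, n = len(query), len(target) - start
    if m == 0:
        raise ValueError("query must contain at least one frame")
    if n <= 0:
        return math.inf
    cost = [[math.inf] * n for _ in range(m)]
    steps = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(max(0, i - band), min(n, i + band + 1)):
            d = euclidean(query[i], target[start + j])
            if i == 0 and j == 0:
                cost[i][j], steps[i][j] = d, 1
                continue
            # Cheapest predecessor among vertical, horizontal and diagonal moves.
            best, best_steps = math.inf, 0
            for pi, pj in ((i - 1, j), (i, j - 1), (i - 1, j - 1)):
                if pi >= 0 and pj >= 0 and cost[pi][pj] < best:
                    best, best_steps = cost[pi][pj], steps[pi][pj]
            if best < math.inf:
                cost[i][j], steps[i][j] = best + d, best_steps + 1
    last = m - 1
    ends = [cost[last][j] / steps[last][j]
            for j in range(max(0, last - band), min(n, last + band + 1))
            if cost[last][j] < math.inf]
    return min(ends) if ends else math.inf


def spot(query, target, band: int = 2, step: int = 1) -> List[Tuple[int, float]]:
    """Score every `step`-th start frame; return (start, cost) best-first."""
    hits = []
    for s in range(0, len(target), step):
        score = banded_dtw(query, target, s, band)
        if score < math.inf:
            hits.append((s, score))
    hits.sort(key=lambda h: h[1])
    return hits


def precision_at_k(hits, is_valid: Callable[[int], bool], k: int = 10) -> float:
    """Fraction of the top-k hits judged valid; missing ranks count as misses."""
    return sum(1 for start, _ in hits[:k] if is_valid(start)) / k
```

Neighbouring start frames tend to produce near-duplicate hits, so a practical system would likely merge overlapping hits before computing Precision at 10; the actual criteria for a valid hit are those the highlights attribute to Table 1 of the paper, which this sketch does not reproduce.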

Publisher
Database: Elsevier - ScienceDirect
Journal: Pattern Recognition Letters - Volume 71, 1 February 2016, Pages 8–15