Article code | Journal code | Publication year | English article | Full-text version |
---|---|---|---|---|
6938274 | 1449924 | 2018 | 19-page PDF | Free download |
English title of the ISI article
Video you only look once: Overall temporal convolutions for action recognition
Persian translation of the title
ویدیویی که فقط یکبار نگاه میکنید: پیچشهای عمودی موقتی برای تشخیص عمل
Keywords
Video understanding, video classification, action recognition, convolutional neural network
Related topics
Engineering and Basic Sciences
Computer Engineering
Computer Vision and Pattern Recognition
English abstract
In this paper, we propose an efficient and straightforward approach, video you only look once (VideoYOLO), to capture the overall temporal dynamics of an entire video in a single process for action recognition. How to deal with the temporal dimension of videos remains an open question in action recognition. Existing methods subdivide a whole video into either individual frames or short clips and consequently have to process these fragments multiple times. A post-processing step is then used to aggregate the partial dynamic cues and implicitly infer the whole temporal information. In contrast, in VideoYOLO, we first generate a proxy video by selecting a subset of frames that roughly preserves the overall temporal dynamics present in the original video. A 3D convolutional neural network (3D-CNN) is then employed to learn the overall temporal characteristics from the proxy video and predict the action category in a single process. Our proposed method is extremely fast: VideoYOLO-32 can process 36 videos per second, which is 10 times and 7 times faster than prior 2D-CNN (Two-stream (Simonyan and Zisserman, 2014)) and 3D-CNN (C3D (Tran et al., 2015)) based models, respectively, while still achieving superior or comparable classification accuracy on the benchmark datasets UCF101 and HMDB51.
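The proxy-video step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the subset of frames is chosen, so evenly spaced sampling is assumed here purely for illustration. The function names `proxy_frame_indices` and `make_proxy_video` are hypothetical, and the default length of 32 only echoes the "VideoYOLO-32" name; that the suffix denotes the proxy length is itself an assumption.

```python
from typing import List, Sequence


def proxy_frame_indices(num_frames: int, proxy_length: int = 32) -> List[int]:
    """Return proxy_length frame indices spread evenly over num_frames frames.

    Assumption: uniform sampling. The abstract only says a subset of frames
    is selected to roughly preserve the overall temporal dynamics.
    """
    if num_frames <= 0 or proxy_length <= 0:
        raise ValueError("num_frames and proxy_length must be positive")
    # Split the video into proxy_length equal segments and take the frame
    # at the centre of each segment. Videos shorter than proxy_length
    # simply repeat some frames.
    return [
        min(num_frames - 1, int((i + 0.5) * num_frames / proxy_length))
        for i in range(proxy_length)
    ]


def make_proxy_video(frames: Sequence, proxy_length: int = 32) -> list:
    """Build the proxy video as a list of the selected frames."""
    indices = proxy_frame_indices(len(frames), proxy_length)
    return [frames[i] for i in indices]
```

Under these assumptions, the resulting fixed-length frame list would be stacked into a single tensor and passed once through a 3D-CNN, in contrast to the per-frame or per-clip processing that the abstract attributes to earlier methods.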
Publisher
Database: Elsevier - ScienceDirect
Journal: Journal of Visual Communication and Image Representation - Volume 52, April 2018, Pages 58-65
Authors
Longlong Jing, Xiaodong Yang, Yingli Tian