Article code: 4968998
Journal code: 1449849
Publication year: 2017
English article: 19 pages, PDF
Full-text version: Free download
English title of the ISI article
Exploiting scene maps and spatial relationships in quasi-static scenes for video face clustering*
Keywords
Video face annotation; Face clustering; Re-identification
Related topics
Engineering and Basic Sciences > Computer Engineering > Computer Vision and Pattern Recognition
English abstract


- We propose a video face clustering algorithm for quasi-static scene (QSS) videos.
- QSS videos include, for example, talk shows, TV debates, TV games and symphonic concerts.
- A map of the scene and the spatial relationships between people are inferred.
- By using them, we match faces in crowded shots that lack visual detail.
- We show that spatial information is the missing piece for effective face clustering.

Video face clustering is a fundamental step in automatically annotating a video in terms of when and where (i.e., in which video shot and where in a video frame) a given person is visible. State-of-the-art face clustering solutions typically rely on the information derived from visual appearances of the face images. This is challenging because of a high degree of variation in these visual appearances due to factors like scale, viewpoint, head pose and facial expression. As a result, either the generated face clusters are not sufficiently pure, or their number is much higher than that of people appearing in the video. A possible way towards improved clustering performance is to analyze visual appearances of faces in specific contexts and take the contextual information into account when designing the clustering algorithm. In this paper, we focus on the context of quasi-static scenes, in which we can assume that the people's positions in a scene are (quasi-)stationary. We present a novel video clustering algorithm that exploits this property to match faces and efficiently propagate face labels across the scope of viewpoints, scale and level of zoom characterizing different frames and shots of a video. We also present a novel publicly available dataset of manually annotated quasi-static scene videos. Experimental assessment on the latter indicates that exploiting information derived from the scene and the spatial relationships between people can substantially improve the clustering performance compared to the state-of-the-art in the field.
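The abstract's core idea — that (quasi-)stationary positions in a scene map can link face detections whose visual appearance differs across shots — can be illustrated with a minimal sketch. Everything below (function names, thresholds, the greedy union-find merging scheme) is an illustrative assumption, not the paper's actual algorithm:

```python
import numpy as np

def cluster_faces(appearance, positions, app_thresh=0.5, pos_thresh=0.1):
    """Toy clustering of face detections under a quasi-static-scene assumption.

    appearance: (n, d) array of unit-normalized appearance feature vectors.
    positions:  (n, 2) array of each face's (x, y) location in an inferred
                scene map, normalized to [0, 1].

    Two detections are merged when their appearance features are similar
    OR, failing that, when their scene-map positions nearly coincide --
    i.e., people are assumed not to move within the scene, so position
    can match faces that appearance alone cannot (e.g., in crowded shots).
    """
    n = len(appearance)
    parent = list(range(n))  # union-find: each detection starts alone

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            app_sim = float(appearance[i] @ appearance[j])        # cosine sim.
            pos_dist = float(np.linalg.norm(positions[i] - positions[j]))
            if app_sim > app_thresh or pos_dist < pos_thresh:
                parent[find(i)] = find(j)                          # union

    return [find(i) for i in range(n)]  # cluster label per detection
```

In this sketch, two detections of the same person shot at different zoom levels (dissimilar appearance features) still end up in one cluster because their scene-map positions coincide, which is the role spatial information plays in the paper's argument.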

Publisher
Database: Elsevier - ScienceDirect
Journal: Image and Vision Computing - Volume 57, January 2017, Pages 25-43
Authors