Article ID: 4970117
Journal: Pattern Recognition Letters
Published Year: 2017
Pages: 13
File Type: PDF
Abstract
With the increasing number of machine learning methods used for segmenting images and analyzing videos, there has been a growing need for large datasets with pixel-accurate ground truth. In this letter, we propose a highly accurate semi-automatic method for segmenting foreground moving objects pictured in surveillance videos. Given a limited number of user interventions, the goal of the method is to provide results sufficiently accurate to be used as ground truth. We show that by manually outlining a small number of moving objects, we can get our model to learn the appearance of the background and the foreground moving objects. Since the background and foreground moving objects are highly redundant from one image to another (the videos come from surveillance cameras), the model does not need a large number of examples to accurately fit the data. Our end-to-end model is based on a multi-resolution convolutional neural network (CNN) with a cascaded architecture. Tests performed on the largest publicly available video dataset with pixel-accurate ground truth (changedetection.net) reveal that on videos from 11 categories, our approach has an average F-measure of 0.95, which is within the error margin of a human being. With our model, the amount of manual work needed to ground-truth a video is reduced by a factor of up to 40. Code is made publicly available at: https://github.com/zhimingluo/MovingObjectSegmentation
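The abstract does not detail the layer layout, so the following is only a minimal PyTorch sketch of the general idea it describes: the frame is processed at several resolutions, the per-scale foreground maps are fused, and a second, cascaded stage refines the result. The scale factors, channel widths, and fusion-by-averaging here are assumptions for illustration, not the published architecture.

```python
# Illustrative sketch (assumption): multi-resolution CNN with a cascaded refinement
# stage for per-pixel foreground segmentation. Not the authors' exact model.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ScaleBranch(nn.Module):
    """Small fully convolutional branch applied to one input resolution."""

    def __init__(self, in_channels=3, features=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, features, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(features, features, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(features, 1, kernel_size=1),  # per-pixel foreground logit
        )

    def forward(self, x):
        return self.net(x)


class MultiResCascadeCNN(nn.Module):
    """Runs the frame at several scales, fuses the maps, then refines them."""

    def __init__(self, scales=(1.0, 0.5, 0.25)):
        super().__init__()
        self.scales = scales
        self.branches = nn.ModuleList(ScaleBranch() for _ in scales)
        # Cascaded stage: takes the RGB frame plus the fused coarse map.
        self.refine = nn.Sequential(
            nn.Conv2d(3 + 1, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, kernel_size=1),
        )

    def forward(self, frame):
        h, w = frame.shape[-2:]
        maps = []
        for scale, branch in zip(self.scales, self.branches):
            x = frame if scale == 1.0 else F.interpolate(
                frame, scale_factor=scale, mode="bilinear", align_corners=False)
            m = branch(x)
            maps.append(F.interpolate(m, size=(h, w), mode="bilinear",
                                      align_corners=False))
        coarse = torch.stack(maps, dim=0).mean(dim=0)        # fuse scale predictions
        refined = self.refine(torch.cat([frame, coarse], dim=1))
        return torch.sigmoid(refined)                         # foreground probability map


if __name__ == "__main__":
    model = MultiResCascadeCNN()
    frame = torch.rand(1, 3, 240, 320)   # one RGB surveillance frame
    mask = model(frame)
    print(mask.shape)                    # torch.Size([1, 1, 240, 320])
```

In a semi-automatic workflow such as the one described, a model of this kind would be fine-tuned on the few manually outlined frames of a given camera and then applied to the remaining frames of that video.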
Related Topics
Physical Sciences and Engineering > Computer Science > Computer Vision and Pattern Recognition