Article ID Journal Published Year Pages File Type
6864874 Neurocomputing 2018 13 Pages PDF
Abstract
Discovering and describing topics from User-Generated Content (UGC) data is challenging, yet it is essential for quickly grasping what interesting topics are unfolding on the web. Describing a topic by probable keywords and prototype images is an efficient form of human-machine interaction that helps people grasp the topic quickly. However, beyond the challenges of web topic detection itself, mining such multi-media descriptions is a difficult task that conventional approaches can barely handle, owing to: (1) noise from non-informative short texts or images caused by less-constrained UGC; and (2) even for informative images, the gap between visual concepts and social ones. This paper addresses these challenges from the perspective of background similarity removal and proposes a two-step approach to mining multi-media descriptions from noisy data. First, we employ a deconvolution model to strip the similarities contributed by non-informative words/images during web topic detection. Second, the background-removed similarities are reconstructed to identify the probable keywords and prototype images during topic description. By removing background similarities, we can generate a coherent and informative multi-media description for a topic. Experiments on two public datasets show that the proposed method produces high-quality descriptions.
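To make the idea of background similarity removal concrete, below is a minimal, hypothetical sketch in Python. It is not the authors' deconvolution model: the rank-one eigen-decomposition used to approximate the shared "background" component, and all function names, are assumptions introduced purely for illustration.

```python
# Illustrative sketch (not the paper's model): strip a rank-one "background"
# component from a similarity matrix, then rank items by the residual.
import numpy as np

def remove_background_similarity(S):
    """Subtract a rank-one background approximation from similarity matrix S.

    S is assumed to be a symmetric (n x n) matrix of pairwise similarities
    between items (words or images). The dominant eigenvector is treated as
    the shared "background" that non-informative items contribute to.
    """
    # The dominant eigenpair captures similarity shared by nearly all items.
    eigvals, eigvecs = np.linalg.eigh(S)          # ascending eigenvalues
    v = eigvecs[:, -1]                            # dominant eigenvector
    background = eigvals[-1] * np.outer(v, v)
    residual = S - background                     # background-removed similarities
    np.fill_diagonal(residual, 0.0)
    return np.clip(residual, 0.0, None)

def rank_items(residual, top_k=5):
    """Score each item by its residual similarity to the rest of the topic."""
    scores = residual.sum(axis=1)
    return np.argsort(scores)[::-1][:top_k]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 20
    # Synthetic similarities: a uniform background plus one coherent cluster.
    S = 0.3 + 0.05 * rng.random((n, n))
    S[:5, :5] += 0.5                              # five informative items
    S = (S + S.T) / 2.0
    residual = remove_background_similarity(S)
    print("Top items after background removal:", rank_items(residual))
```

Under these assumptions, the items that survive background removal with the highest residual similarity would play the role of the probable keywords or prototype images described in the abstract.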
Related Topics
Physical Sciences and Engineering > Computer Science > Artificial Intelligence