Article ID: 515386
Journal: Information Processing & Management
Published Year: 2015
Pages: 14 Pages
File Type: PDF
Abstract

• We propose a novel multi-document summarization system, FoDoSu, that employs tag clusters from Flickr.
• FoDoSu detects meaningful words by exploiting tag clusters when summarizing multiple documents.
• Experiments on TAC 2008 and TAC 2009 demonstrate the superiority of FoDoSu.

Multi-document summarization techniques aim to reduce a set of documents to a small set of words or paragraphs that conveys the main meaning of the originals. Many approaches to multi-document summarization have used probability-based methods and machine learning techniques to simultaneously summarize multiple documents sharing a common topic. However, these techniques fail to semantically analyze proper nouns and newly coined words because most depend on an out-of-date dictionary or thesaurus. To overcome these drawbacks, we propose a novel multi-document summarization system called FoDoSu (Folksonomy-based Multi-Document Summarization), which employs the tag clusters used by Flickr, a folksonomy system, to detect key sentences in multiple documents. We first create a word frequency table for analyzing the semantics and contributions of words using the HITS algorithm. Then, by exploiting tag clusters, we analyze the semantic relationships between words in the word frequency table. Finally, we create a summary of the documents by analyzing the importance of each word and its semantic relatedness to the others. Experimental results on the TAC 2008 and 2009 data sets show that the proposed framework improves on existing summarization systems.
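
The abstract does not say how the word frequency table and the HITS algorithm are combined, so the following is only a minimal sketch under stated assumptions, not the FoDoSu implementation. It assumes a bipartite graph in which sentences act as hubs and the words they contain act as authorities, with whitespace tokenization. The function names `word_frequency_table` and `hits_scores` are hypothetical, and the tag-cluster step is omitted entirely.

```python
import math
from collections import Counter


def word_frequency_table(sentences):
    """Count how often each lowercase token occurs across all sentences."""
    return Counter(w for s in sentences for w in s.lower().split())


def hits_scores(sentences, iterations=50):
    """Run HITS on an assumed bipartite graph: sentences are hubs, words are authorities."""
    tokens = [set(s.lower().split()) for s in sentences]
    vocab = set().union(*tokens) if tokens else set()
    hub = [1.0] * len(sentences)
    auth = {w: 1.0 for w in vocab}
    for _ in range(iterations):
        # Authority update: a word scores high when high-scoring sentences contain it.
        auth = {w: sum(hub[i] for i, t in enumerate(tokens) if w in t) for w in vocab}
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {w: v / norm for w, v in auth.items()}
        # Hub update: a sentence scores high when it contains high-scoring words.
        hub = [sum(auth[w] for w in t) for t in tokens]
        norm = math.sqrt(sum(v * v for v in hub)) or 1.0
        hub = [v / norm for v in hub]
    return hub, auth


if __name__ == "__main__":
    docs = ["flickr tag cluster", "tag cluster summary", "unrelated weather report"]
    print(word_frequency_table(docs).most_common(2))
    hub, auth = hits_scores(docs)
    # Sentences with the largest hub scores are candidate key sentences.
    print(sorted(zip(hub, docs), reverse=True))
```

In this toy input, the two sentences that share vocabulary reinforce each other and the disconnected sentence's hub score decays toward zero. A fuller pipeline would presumably also weight words by their relatedness within a tag cluster, which this sketch does not attempt.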
