Article ID: 382888
Journal: Expert Systems with Applications
Published Year: 2014
Pages: 8 Pages
File Type: PDF
Abstract

• The paper proposes a multi-document summarization system that uses statistical similarities and linguistic treatment.
• We create a new sentence clustering algorithm to deal with the redundancy and information diversity problems.
• We introduce a new graph model to explore sentence syntactic, semantic, co-reference and discourse relations.

The quantity of data available on the Internet today has grown so large that it is no longer feasible for a person to sift useful information from it efficiently. Text summarization techniques offer one solution to this problem. Text summarization, the process of automatically creating a shorter version of one or more text documents, is an important way of finding relevant information in large text libraries or on the Internet. This paper presents a multi-document summarization system that concisely extracts the main aspects of a set of documents while trying to avoid the typical problems of this type of summarization: information redundancy and diversity. This is achieved through a new sentence clustering algorithm based on a graph model that makes use of statistical similarities and linguistic treatment. The DUC 2002 dataset was used to assess the performance of the proposed system, which surpassed DUC competitors by a margin of 50% in F-measure in the best case.
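The abstract does not give the details of the clustering algorithm, so the following is only a minimal sketch of the general idea of graph-based sentence clustering for redundancy reduction, not the authors' method. It assumes a purely statistical similarity (cosine over bag-of-words counts), an arbitrary threshold of 0.5, and clusters defined as connected components of the resulting graph; the linguistic relations mentioned in the highlights (syntactic, semantic, co-reference, discourse) are not modelled. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_sentences(sentences, threshold=0.5):
    """Group sentence indices into connected components of a similarity graph.

    Assumption: an edge links two sentences whose cosine similarity is at
    least `threshold`. The threshold value is illustrative only.
    """
    vectors = [Counter(tokenize(s)) for s in sentences]
    n = len(sentences)
    adjacency = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if cosine(vectors[i], vectors[j]) >= threshold:
                adjacency[i].add(j)
                adjacency[j].add(i)

    # Depth-first search collects each connected component as one cluster.
    seen = set()
    clusters = []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        clusters.append(sorted(component))
    return clusters


def summarize(sentences, threshold=0.5):
    """Keep the earliest sentence of each cluster, in original order."""
    clusters = cluster_sentences(sentences, threshold)
    return [sentences[cluster[0]] for cluster in sorted(clusters)]
```

Calling `summarize` on a list of sentences pooled from several documents returns one representative per cluster, so near-duplicate sentences collapse into a single entry while unrelated sentences are kept. Picking the earliest sentence as the representative is a simplification; a real system would rank the sentences within each cluster.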

Related Topics
Physical Sciences and Engineering > Computer Science > Artificial Intelligence