Article ID: 453496
Journal: Computer Standards & Interfaces
Published Year: 2013
Pages: 12 Pages
File Type: PDF
Abstract

One of the main challenges in text summarization is the detection of redundant information. This paper presents a detailed analysis of three methods for achieving this goal. The proposed methods rely on different levels of language analysis: lexical, syntactic and semantic. They are also analyzed for detecting relevance in texts. The results show that semantic-based methods are able to detect up to 90% of redundancy, compared to only 19% for lexical-based ones. This is also reflected in the quality of the generated summaries: better summaries are obtained when syntactic- or semantic-based approaches are used to remove redundancy.

► The problem of redundancy in text summarization is analyzed from three perspectives.
► The best way of exploiting redundancy in a text summarization system is analyzed.
► Semantic-based methods perform best, detecting up to 90% of redundant data.
► Lexical-based approaches detect only 19% of redundant information.
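
To make the idea of lexical-level redundancy detection concrete, the following is a minimal sketch. It is not the method evaluated in the paper, whose details the abstract does not give. It assumes a simple word-overlap (Jaccard) measure between sentences and a hypothetical threshold of 0.5; the function names `tokens`, `jaccard` and `remove_redundant` are illustrative only.

```python
import re


def tokens(sentence):
    """Return the set of lowercase word tokens in a sentence (a purely lexical view)."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def jaccard(a, b):
    """Word-overlap ratio between two token sets, in the range [0, 1]."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def remove_redundant(sentences, threshold=0.5):
    """Keep a sentence only if its overlap with every kept sentence is below the threshold.

    The threshold is an assumed value chosen for illustration.
    """
    kept = []
    for sentence in sentences:
        current = tokens(sentence)
        if all(jaccard(current, tokens(k)) < threshold for k in kept):
            kept.append(sentence)
    return kept


if __name__ == "__main__":
    example = [
        "The cat sat on the mat.",
        "The cat sat on the mat today.",   # near-duplicate wording: filtered out
        "A feline rested on the rug.",     # paraphrase with different words: kept
        "Stocks fell sharply on Monday.",  # unrelated content: kept
    ]
    for sentence in remove_redundant(example):
        print(sentence)
```

In this sketch the near-duplicate sentence is removed, but the paraphrase survives because it shares almost no words with the first sentence. That is the kind of case a purely lexical measure cannot catch, and where syntactic or semantic analysis would be expected to help.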

Related Topics
Physical Sciences and Engineering > Computer Science > Computer Networks and Communications