Tackling redundancy in text summarization through different levels of language analysis

Article ID	Journal	Published Year	Pages	File Type
453496	Computer Standards & Interfaces	2013	12 Pages	PDF

Abstract

One of the main challenges to be addressed in text summarization concerns the detection of redundant information. This paper presents a detailed analysis of three methods for achieving this goal. The proposed methods rely on different levels of language analysis: lexical, syntactic and semantic. The methods are also analyzed for their ability to detect relevant information in texts. The results show that semantic-based methods are able to detect up to 90% of redundancy, compared with only 19% for lexical-based ones. This difference is also reflected in the quality of the generated summaries: better summaries are obtained when syntactic- or semantic-based approaches are employed to remove redundancy.
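To make the idea of lexical-level redundancy detection concrete, the following is a minimal sketch, assuming a simple bag-of-words cosine similarity between sentences and a hypothetical similarity threshold of 0.5. It is not the lexical method evaluated in the paper; the functions `bag_of_words`, `cosine` and `remove_redundant` are illustrative names introduced here. The example sentences show one plausible way a purely lexical measure can miss redundancy: a reordering of the same words is caught, but a paraphrase that uses different words is not.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence: str) -> Counter:
    """Lowercase the sentence and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def remove_redundant(sentences: list[str], threshold: float = 0.5) -> list[str]:
    """Greedily keep each sentence only if its lexical similarity to every
    already-kept sentence is below the (hypothetical) threshold."""
    kept: list[str] = []
    kept_bows: list[Counter] = []
    for sentence in sentences:
        bow = bag_of_words(sentence)
        if all(cosine(bow, other) < threshold for other in kept_bows):
            kept.append(sentence)
            kept_bows.append(bow)
    return kept


if __name__ == "__main__":
    sentences = [
        "The company reported record profits this year.",
        "This year the company reported record profits.",  # same words, reordered
        "The firm announced its highest earnings ever.",     # paraphrase, different words
    ]
    print(remove_redundant(sentences))
    # ['The company reported record profits this year.', 'The firm announced its highest earnings ever.']
```

The reordered sentence shares every word with the first one and is discarded, while the paraphrase shares only "the" (similarity 1/7) and survives. Catching that kind of redundancy requires knowing that "firm" and "company" or "earnings" and "profits" are related, which is information a purely lexical comparison does not have.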

► The problem of redundancy in text summarization is analyzed from three perspectives.
► The best way of exploiting redundancy in a text summarization system is analyzed.
► Semantic-based methods perform best, detecting up to 90% of redundant data.
► Lexical-based approaches detect only 19% of redundant information.

Keywords

Redundancy detection; Text Summarization; Information access; Natural Language Processing