Article code: 382888
Journal code: 660796
Publication year: 2014
English article: 8-page PDF
Full-text version: Free download
English title of the ISI article
A multi-document summarization system based on statistics and linguistic treatment
Persian translation of the title
سیستم خلاصه چند سند بر اساس آمار و درمان زبانی
Keywords
Multi-document summarization, extractive summarization, sentence clustering
Related topics
Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence
English abstract


• The paper proposes a multi-document summarization system that combines statistical similarity with linguistic treatment.
• We create a new sentence clustering algorithm to deal with the redundancy and information diversity problems.
• We introduce a new graph model to explore syntactic, semantic, co-reference and discourse relations between sentences.

The quantity of data available on the Internet today has grown so large that it is no longer humanly feasible to sift useful information from it efficiently. Text summarization techniques offer one solution to this problem. Text summarization, the process of automatically creating a shorter version of one or more text documents, is an important way of finding relevant information in large text libraries or on the Internet. This paper presents a multi-document summarization system that concisely extracts the main aspects of a set of documents while trying to avoid the typical problems of this type of summarization: information redundancy and diversity. This is achieved through a new sentence clustering algorithm based on a graph model that makes use of statistical similarities and linguistic treatment. The proposed system was evaluated on the DUC 2002 dataset, where it surpassed the DUC competitors by a 50% margin in F-measure in the best case.
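
The abstract does not spell out the clustering algorithm, so the following is only a minimal sketch of the general idea of graph-based sentence clustering for extractive summarization, not the authors' method. It assumes a plain bag-of-words cosine similarity as the only edge signal (the paper's graph also encodes linguistic relations, which are omitted here), a hypothetical similarity threshold of 0.5, and connected components as clusters. All function names are illustrative.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence):
    """Lowercased word counts; a stand-in for real linguistic preprocessing."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_sentences(sentences, threshold=0.5):
    """Group sentence indices into connected components of the similarity graph."""
    vectors = [bag_of_words(s) for s in sentences]
    n = len(sentences)
    # An edge (i, j) exists when similarity reaches the assumed threshold.
    neighbours = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if cosine(vectors[i], vectors[j]) >= threshold:
                neighbours[i].add(j)
                neighbours[j].add(i)
    clusters, seen = [], set()
    for start in range(n):
        if start in seen:
            continue
        # Depth-first traversal collects one connected component.
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        clusters.append(sorted(component))
    return clusters


def summarize(sentences, threshold=0.5):
    """Keep the earliest sentence of each cluster to limit redundancy."""
    return [sentences[c[0]] for c in cluster_sentences(sentences, threshold)]
```

Keeping one sentence per cluster is the simplest way to trade redundancy against diversity: near-duplicate sentences collapse into a single representative, while unrelated sentences survive as their own clusters. A real system would rank sentences within each cluster rather than take the first one.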

Publisher
Database: Elsevier - ScienceDirect
Journal: Expert Systems with Applications - Volume 41, Issue 13, 1 October 2014, Pages 5780–5787