Article code  Journal code  Publication year  English article  Full-text version
4942986  1437616  2017  69-page PDF  Free download
English title of the ISI article
How far we can go with extractive text summarization? Heuristic methods to obtain near upper bounds
Persian translation of the title (rendered in English)
How far can we go in extracting a summary from a text? Heuristic methods for reaching upper bounds
Keywords
Extractive text summarization, upper bound construction, ideal extract construction, summary evaluation
Related topics
Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence
English abstract
Extractive text summarization is an effective way to automatically reduce a text to a summary by selecting a subset of the text. The performance of a summarization system is usually evaluated by comparing its output with human-constructed extractive summaries provided in annotated text datasets. However, for datasets where an abstract is written for readers, performance is evaluated against an abstract that a human has written in their own words. This makes it difficult to determine how far state-of-the-art extractive methods are from the upper bound that an ideal extractive method might achieve. In addition, the performance of an extractive method differs from domain to domain, which makes it difficult to benchmark. Previous studies construct an ideal sentence-based extract of a document, one that gives the best score on a given metric, by exhaustively searching all possible sentence combinations of a given length. They then use the performance of this extract as the sentence-based upper bound. However, this only applies to short texts. For long texts and multiple documents, previous studies rely on manual effort, which is expensive and time-consuming. In this paper, we propose nine fast heuristic methods to generate near-ideal sentence-based extracts for long texts and multiple documents. Furthermore, we propose an n-gram construction method to construct the word-based upper bound. A percentage ranking method is used to benchmark different extractive methods across different corpora. In the experiments, five different corpora are used. The results show that the near upper bounds constructed by the proposed methods are close to those obtained by exhaustive search, but the proposed methods are much faster. Six general extractive summarization methods were also assessed to demonstrate the difference between the performance of the methods and the near upper bounds.
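The contrast the abstract draws between exhaustive search and a fast heuristic can be illustrated with a minimal Python sketch. It is not the paper's method: the abstract does not describe the nine heuristics, so the greedy selection below is only one plausible example of such a heuristic. The scoring function `unigram_recall` is a simplified stand-in for a ROUGE-1-style recall metric, and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def unigram_recall(candidate, reference):
    """Clipped unigram recall: a simplified stand-in for ROUGE-1 recall (assumption)."""
    ref = Counter(reference.lower().split())
    cand = Counter(" ".join(candidate).lower().split())
    overlap = sum(min(count, cand[word]) for word, count in ref.items())
    return overlap / max(sum(ref.values()), 1)


def exhaustive_extract(sentences, reference, k):
    """Score every k-sentence combination and return the best one.

    The number of combinations grows combinatorially with document length,
    which is why this approach is only practical for short texts.
    """
    return max(combinations(sentences, k),
               key=lambda combo: unigram_recall(combo, reference))


def greedy_extract(sentences, reference, k):
    """Illustrative heuristic (not from the paper): up to k times, add the
    sentence that gives the highest score together with those already chosen."""
    chosen, remaining = [], list(sentences)
    for _ in range(min(k, len(remaining))):
        best = max(remaining,
                   key=lambda s: unigram_recall(chosen + [s], reference))
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

The greedy variant evaluates on the order of k times n candidate extracts instead of every combination, so it scales to long inputs. Its score can never exceed the exhaustive optimum for the same k, which is why it yields a near upper bound and not an exact one.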
Publisher
Database: Elsevier - ScienceDirect
Journal: Expert Systems with Applications - Volume 90, 30 December 2017, Pages 439-463