کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
393751 665684 2011 15 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
Topic identification based on document coherence and spectral analysis
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر هوش مصنوعی
پیش نمایش صفحه اول مقاله
Topic identification based on document coherence and spectral analysis
چکیده انگلیسی

In a world with vast information overload, well-optimized retrieval of relevant information has become increasingly important. Dividing large, multiple topic spanning documents into sets of coherent subdocuments facilitates the information retrieval process. This paper presents a novel technique to automatically subdivide a textual document into consistent components based on a coherence quantification function. This function is based on stem or term chains linking document entities, such as sentences or paragraphs, based on the reoccurrences of stems or terms. Applying this function on a document results in a coherence graph of the document linking its entities. Spectral graph partitioning techniques are used to divide this coherence graph into a number of subdocuments. A novel technique is introduced to obtain the most suitable number of subdocuments. These subdocuments are an aggregation of (not necessarily adjacent) entities. Performance tests are conducted in test environments based on standardized datasets to prove the algorithm’s capabilities. The relevance of these techniques for information retrieval and text mining is discussed.

ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Information Sciences - Volume 181, Issue 18, 15 September 2011, Pages 3783–3797
نویسندگان
, , , , ,