کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
442235 692079 2007 11 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
Normalized compression distance for visual analysis of document collections
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر گرافیک کامپیوتری و طراحی به کمک کامپیوتر
پیش نمایش صفحه اول مقاله
Normalized compression distance for visual analysis of document collections
چکیده انگلیسی

In a world flooded by text of various sources, it is of strategic importance to find ways to map information present in written documents in a form that helps users locate and associate important information within a particular text data set. Content-based maps can support extremely useful explorations of text data sets. This paper proposes and evaluates the use of Kolmogorov complexity approximations as a means to detect similarity between general textual documents, in order to support mapping and visualization techniques for corpora exploration. The calculation of this similarity measure requires no intermediate representation of a corpus (such as vector representation) and therefore no pre-processing or parametrization steps. That makes it very attractive for a wider range of exploratory applications compared to conventional measures that need vector-based text representations. The visual layout used here is based on fast distance multi-dimensional projections. It is shown that the similarity measure and the resulting maps present very good precision and that the approach can be used successfully for visual analysis of automatically generated text maps.

ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Computers & Graphics - Volume 31, Issue 3, June 2007, Pages 327–337
نویسندگان
, , ,