Article code: 396171
Journal code: 666301
Publication year: 2007
English article: 13 pages (PDF)
English title of the ISI article
A novel document similarity measure based on earth mover’s distance
Related subjects
Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence
Abstract

In this paper we propose a novel measure based on the earth mover’s distance (EMD) that evaluates document similarity by allowing many-to-many matching between subtopics. Each document is first decomposed into a set of subtopics; the EMD between the two documents' subtopic sets is then computed by solving the transportation problem. The proposed measure improves on the previous optimal matching (OM) based measure, which allows only one-to-one matching between subtopics. Experiments on the TDT3 dataset comparing existing similarity measures show that the EMD-based measure outperforms the OM-based measure and all other measures. In addition to the TextTiling algorithm, a sentence clustering algorithm is adopted for document decomposition, and the experimental results show that the proposed EMD-based measure does not rely on the document decomposition algorithm and is thus more robust than the OM-based measure.
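
The pipeline described in the abstract (decompose each document into subtopics, then compare the two subtopic sets with EMD via the transportation problem) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: subtopics are given as token lists, each subtopic is weighted by its share of the document's tokens, the ground distance is one minus cosine similarity of term counts, and the transportation problem is solved with a small successive-shortest-path routine. The names `cosine_distance`, `emd`, and `document_distance` are hypothetical.

```python
import math
from collections import Counter


def cosine_distance(a, b):
    """Ground distance (assumed): 1 - cosine similarity of two term Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return max(0.0, 1.0 - dot / (norm_a * norm_b))  # clamp float noise


def emd(supply, demand, dist, eps=1e-9):
    """Earth mover's distance between two weighted sets.

    Solves the transportation problem by successive shortest paths on the
    residual graph. Nodes 0..m-1 are suppliers, m..m+n-1 are consumers.
    Weights are assumed positive. Returns (distance, flow matrix).
    """
    m, n = len(supply), len(demand)
    supply, demand = list(supply), list(demand)
    flow = [[0.0] * n for _ in range(m)]
    moved, cost = 0.0, 0.0
    while any(s > eps for s in supply) and any(d > eps for d in demand):
        # Bellman-Ford from every supplier that still has mass left.
        best = [0.0 if s > eps else math.inf for s in supply] + [math.inf] * n
        prev = [None] * (m + n)
        for _ in range(m + n):
            changed = False
            for i in range(m):
                for j in range(n):
                    # Forward edge i -> j: ship more mass along (i, j).
                    if best[i] + dist[i][j] < best[m + j] - eps:
                        best[m + j], prev[m + j] = best[i] + dist[i][j], i
                        changed = True
                    # Backward edge j -> i: undo mass already shipped.
                    if flow[i][j] > eps and best[m + j] - dist[i][j] < best[i] - eps:
                        best[i], prev[i] = best[m + j] - dist[i][j], m + j
                        changed = True
            if not changed:
                break
        # Cheapest consumer that still has unmet demand.
        sink = min((m + j for j in range(n) if demand[j] > eps),
                   key=lambda v: best[v])
        path = [sink]
        while prev[path[-1]] is not None:
            path.append(prev[path[-1]])
        path.reverse()
        source = path[0]
        edges = list(zip(path, path[1:]))
        amount = min(supply[source], demand[sink - m])
        for u, v in edges:
            if u >= m:  # backward edge is limited by existing flow
                amount = min(amount, flow[v][u - m])
        for u, v in edges:
            if u < m:
                flow[u][v - m] += amount
                cost += amount * dist[u][v - m]
            else:
                flow[v][u - m] -= amount
                cost -= amount * dist[v][u - m]
        supply[source] -= amount
        demand[sink - m] -= amount
        moved += amount
    return cost / moved, flow


def document_distance(subtopics_a, subtopics_b):
    """EMD between two documents given as lists of subtopic token lists."""
    bags_a = [Counter(s) for s in subtopics_a]
    bags_b = [Counter(s) for s in subtopics_b]
    # Assumed weighting: each subtopic's share of the document's tokens.
    total_a = sum(len(s) for s in subtopics_a)
    total_b = sum(len(s) for s in subtopics_b)
    w_a = [len(s) / total_a for s in subtopics_a]
    w_b = [len(s) / total_b for s in subtopics_b]
    dist = [[cosine_distance(x, y) for y in bags_b] for x in bags_a]
    return emd(w_a, w_b, dist)[0]


doc_a = [["earth", "mover", "distance"], ["topic", "matching"]]
doc_b = [["earth", "mover"], ["distance", "topic"], ["matching", "flow"]]
print(document_distance(doc_a, doc_b))
```

Because the flow matrix may split one subtopic's weight across several subtopics of the other document, the sketch shows the many-to-many matching that distinguishes the EMD-based measure from a one-to-one assignment. A similarity score could then be taken as one minus the returned distance, which is again an assumption rather than the paper's stated formula.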

Publisher
Database: Elsevier - ScienceDirect
Journal: Information Sciences - Volume 177, Issue 18, 15 September 2007, Pages 3718–3730