Article code: 392676
Journal code: 665147
Publication year: 2016
English article: 20 pages, PDF
English title of the ISI article
SimCC: A novel method to consider both content and citations for computing similarity of scientific papers
Related subjects
Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence
English abstract

To compute the similarity of scientific papers, text-based similarity measures, link-based similarity measures, and hybrid methods can be applied. The text-based and link-based similarity measures take into account only a single aspect of scientific papers, content or citations, respectively. The hybrid methods consider both content and citations; however, they do not carefully consider the relation between the content of a pair of papers involved in a citation relationship. In this paper, we propose a novel method, SimCC (similarity based on content and citations), that considers both aspects, content and citations, to compute the similarity of scientific papers. Unlike previous methods, SimCC effectively reflects both content and authority of scientific papers simultaneously in similarity computation by applying a new RA (relevance and authority) weighting scheme. Also, we propose an RA+R weighting scheme to consider the recency of papers and an RA+E weighting scheme to take into account the author expertise of papers in similarity computation. The effectiveness of our proposed method is demonstrated by extensive experiments on a real-world dataset of scientific papers. The results show that our method achieves more than 100% improvement in accuracy in comparison with previous methods.

Publisher
Database: Elsevier - ScienceDirect
Journal: Information Sciences - Volumes 334–335, 20 March 2016, Pages 273–292