Article code: 4966511
Journal code: 1365125
Publication year: 2017
English article: 18 pages, PDF
Full-text version: free download
English title of the ISI article
Wikipedia-based information content and semantic similarity computation
Persian translation of the title
محتوا اطلاعاتی مبتنی بر ویکی پدیا و محاسبات شباهت معنایی
Keywords
Information content, semantic similarity, concept similarity, Wikipedia, category structure
Related topics
Engineering and Basic Sciences; Computer Engineering; Computer Science Software
English abstract
The Information Content (IC) of a concept is a fundamental dimension in computational linguistics that enables a better understanding of the concept's semantics. Several approaches to computing the IC of a concept have been proposed in the past. However, existing methods have limitations: they rely on the availability of corpora, manual tagging, or predefined ontologies, and they are suited only to non-dynamic domains. Wikipedia provides a very large, domain-independent encyclopedic repository and semantic network for computing the IC of concepts, with more coverage than typical ontologies. In this paper, we propose several novel methods for computing the IC of a concept that address the shortcomings of existing approaches. The presented methods focus on computing the IC of a concept (i.e., a Wikipedia category) drawn from the Wikipedia category structure. We also propose several new IC-based measures for computing the semantic similarity between concepts. The evaluation, based on several widely used benchmarks and a benchmark we developed ourselves, supports these intuitions with respect to human judgments. Overall, some of the methods proposed in this paper correlate well with human judgments and constitute effective ways of determining IC values for concepts and the semantic similarity between concepts.
Publisher
Database: Elsevier - ScienceDirect
Journal: Information Processing & Management - Volume 53, Issue 1, January 2017, Pages 248-265