Article code: 495300; Journal code: 862822; Publication year: 2015; English article: 10 pages; Full-text version: PDF
English title (ISI article)
Boosting paraphrase detection through textual similarity metrics with abductive networks
Keywords
Plagiarism; Text reuse detection; Paraphrase detection; Textual similarity metrics; Score fusion; Abductive networks
Related topics
Engineering and Basic Sciences; Computer Engineering; Computer Science Software
Abstract


• Analyze a set of weak text-reuse similarity metrics for paraphrase detection.
• Boost the performance of individual metrics using the abductive learning paradigm.
• Use decision-level fusion to build a committee of models of individual metrics.
• Use feature-level fusion to obtain a paraphrase detector from an optimal set of metrics.
• Validate the merits of the approach over individual metrics and other learning methods.
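The highlights refer to "weak text reuse similarity metrics", but this page does not name the five metrics the paper analyzes. The sketch below is an illustration only. It computes three common word-overlap measures (Jaccard, Dice, and cosine over word counts) for one sentence pair and packs them into a feature vector that a learned model could consume. The choice of these three metrics and the tokenizer are assumptions, and the sketch does not claim to reproduce the paper's metric set.

```python
import math
import re
from collections import Counter


def tokens(text: str) -> list[str]:
    # Assumed tokenizer: lowercase, keep alphanumeric runs only.
    return re.findall(r"[a-z0-9]+", text.lower())


def jaccard(a: str, b: str) -> float:
    # |A intersect B| / |A union B| over the sets of word types.
    sa, sb = set(tokens(a)), set(tokens(b))
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def dice(a: str, b: str) -> float:
    # 2 |A intersect B| / (|A| + |B|) over the sets of word types.
    sa, sb = set(tokens(a)), set(tokens(b))
    if not sa and not sb:
        return 1.0
    return 2 * len(sa & sb) / (len(sa) + len(sb))


def cosine(a: str, b: str) -> float:
    # Cosine of the angle between bag-of-words count vectors.
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def metric_vector(a: str, b: str) -> list[float]:
    # One feature vector per sentence pair (hypothetical helper).
    return [jaccard(a, b), dice(a, b), cosine(a, b)]


if __name__ == "__main__":
    s1 = "The cat sat on the mat"
    s2 = "A cat was sitting on the mat"
    print(metric_vector(s1, s2))
```

On their own, scores like these are what the abstract calls weak lexical evidence. Word overlap misses synonyms and rewordings, which is one reason a learned combination of several metrics can outperform any single one.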

A number of metrics have been proposed in the literature to measure text reuse between pairs of sentences or short passages. Used individually, these metrics fail to reliably detect paraphrasing or semantic equivalence between sentences, because the task is subjective and complex even for human beings. This paper analyzes a set of five simple but weak lexical metrics for measuring textual similarity and presents a novel paraphrase detector with improved accuracy based on abductive machine learning. The objective is twofold. First, the performance of each individual metric is boosted through the abductive learning paradigm. Second, we investigate decision-level and feature-level information fusion via abductive networks to obtain a more reliable composite metric and further enhance performance. Several experiments were conducted on two benchmark corpora, and the optimal abductive models were compared with other approaches. The results demonstrate that abductive learning significantly improved on the individual metrics, and fusion yielded further improvement. Moreover, simple models built from polynomial functional elements, which identify and integrate the smallest subset of relevant metrics, outperformed support vector machine classifiers trained on the same datasets and metrics. The results were also comparable to the best result reported in the literature, even though that result relied on a larger number of more powerful features and/or more computationally intensive techniques.
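The abstract says the models are built from polynomial functional elements and contrasts decision-level with feature-level fusion, but it gives no element equations. The sketch below is a minimal illustration under stated assumptions. It uses GMDH-style elements fitted by ordinary least squares with NumPy: a cubic "single" element of one metric score and a full-quadratic "double" element of two scores. Decision-level fusion averages the outputs of two per-metric models, while feature-level fusion feeds both scores into one element. The basis functions, the 0.5 threshold, the averaging rule and the toy data are all hypothetical. This is not the paper's training procedure, which also searches over network structure.

```python
import numpy as np


def single_basis(x):
    # Hypothetical one-input element: cubic polynomial in one metric score.
    return np.column_stack([np.ones_like(x), x, x**2, x**3])


def double_basis(x1, x2):
    # Hypothetical two-input element: full quadratic in two metric scores.
    return np.column_stack([np.ones_like(x1), x1, x2, x1 * x2, x1**2, x2**2])


def fit(basis, y):
    # Ordinary least squares for the element coefficients.
    coef, *_ = np.linalg.lstsq(basis, y, rcond=None)
    return coef


def to_label(score, threshold=0.5):
    # Map a continuous element output to paraphrase (1) / not paraphrase (0).
    return (score >= threshold).astype(int)


# Toy data invented for illustration: two metric scores per pair, 0/1 labels.
m1 = np.array([0.10, 0.20, 0.35, 0.40, 0.60, 0.70, 0.80, 0.90])
m2 = np.array([0.15, 0.30, 0.20, 0.55, 0.50, 0.75, 0.65, 0.95])
y = np.array([0, 0, 0, 1, 0, 1, 1, 1], dtype=float)

# Decision-level fusion: one model per metric, then combine their outputs.
out1 = single_basis(m1) @ fit(single_basis(m1), y)
out2 = single_basis(m2) @ fit(single_basis(m2), y)
decision_fused = to_label((out1 + out2) / 2)

# Feature-level fusion: a single element takes both metrics as inputs.
feature_fused = to_label(double_basis(m1, m2) @ fit(double_basis(m1, m2), y))

print("decision-level:", decision_fused)
print("feature-level: ", feature_fused)
```

The plain average is used here only to keep the sketch short. Replacing it with another polynomial element fed by `out1` and `out2` would look more like a layered network, at the cost of more coefficients to estimate from the same data.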


Publisher
Database: Elsevier - ScienceDirect
Journal: Applied Soft Computing - Volume 26, January 2015, Pages 444–453
Authors
, , , ,