Article code: 4951352
Journal code: 1441243
Publication year: 2016
English article: 8-page PDF
Full text: free download
English title of the ISI article
CAS-based information retrieval in semi-structured documents: CASISS model
Related topics
Engineering and Basic Sciences, Computer Engineering, Computational Theory and Mathematics
English abstract


- The precision of the proposed approach is 100%.
- Word semantics are incorporated into the similarity calculation by considering neighboring words.
- The interference wave yields good performance.
- The execution time is acceptable.

This paper addresses the assessment of similarity between documents or parts of documents. For this purpose, we have developed the CASISS (CAlculation of SImilarity of Semi-Structured documents) method to quantify how similar two given texts are. The method can be employed in a wide range of applications, including content reuse detection, which is a hot and challenging topic. It can also be used to increase the accuracy of the information retrieval process by taking into account not only the presence of query terms in a given document (Content Only search, CO) but also the topology (position continuity) of these terms (based on Content And Structure search, CAS). Tracking the origin of information in social media, copyright management, plagiarism detection, social media mining and monitoring, and digital forensics are among the other applications that require tools such as CASISS to measure, with high accuracy, the content overlap between two documents. CASISS identifies elements of semi-structured documents using element descriptors. Each semi-structured document is pre-processed before the extraction of a set of element descriptors, which characterize the content of the elements.
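
The distinction the abstract draws between content-only matching and matching that also considers position continuity can be illustrated with a small sketch. This is not the CASISS algorithm, which the abstract does not specify. The function names, the whitespace tokenization, and the use of the longest contiguous run of shared tokens as a continuity measure are all assumptions made for illustration.

```python
from typing import Sequence


def content_only_score(query: Sequence[str], doc: Sequence[str]) -> float:
    """Fraction of distinct query terms that occur anywhere in the document.

    Hypothetical stand-in for a Content Only (CO) style measure.
    """
    terms = set(query)
    if not terms:
        return 0.0
    return len(terms & set(doc)) / len(terms)


def longest_common_run(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest contiguous token run shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                # Extend the run that ended at the previous token of both.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def continuity_score(query: Sequence[str], doc: Sequence[str]) -> float:
    """Longest shared contiguous run, normalized by the query length.

    Assumed illustration of "position continuity", not the paper's formula.
    """
    if not query:
        return 0.0
    return longest_common_run(query, doc) / len(query)


query = "semi structured documents".split()
doc_a = "retrieval in semi structured documents".split()
doc_b = "documents are structured or semi formal".split()

for name, doc in (("doc_a", doc_a), ("doc_b", doc_b)):
    print(name, content_only_score(query, doc), continuity_score(query, doc))
```

Both example documents contain every query term, so the content-only score cannot tell them apart. Only the first keeps the terms adjacent and in order, so the continuity score separates them (1.0 versus 1/3).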

Publisher
Database: Elsevier - ScienceDirect
Journal: Journal of Innovation in Digital Ecosystems - Volume 3, Issue 2, December 2016, Pages 155-162