Article code: 425880
Journal code: 685948
Publication year: 2014
English article: 20 pages, PDF
Full-text version: free download
English title of the ISI article
Semantic-based Structural and Content indexing for the efficient retrieval of queries over large XML data repositories
Related subjects
Engineering and Basic Sciences, Computer Engineering, Computational Theory and Mathematics
English abstract


• We exploit the semantics of XML Schema and data in building our index.
• We trim search space using an object-based intersection technique of large data.
• We eliminate irrelevant portions of data by discarding irrelevant objects.
• We demonstrate efficiency by measuring CPU cost and scalability.
• We show the high precision and recall of query results.

The growing adoption of XML as a semi-structured data representation across multi-disciplinary domains has highlighted the need to optimize complex data retrieval. In a Big Data environment, the need to speed up data retrieval has grown further. In this paper, we adopt an optimization approach that takes the semantics of the dataset into consideration in order to deal with the complexity of multi-disciplinary domains in Big Data, in particular when the data is represented as XML documents. Our method specifically addresses twig XML queries (branched path queries), which are among the most costly query tasks because of the join operations between multiple paths. Our work focuses on optimizing the structural and content parts of XML queries by presenting a method for indexing and processing XML data based on the concept of objects, which are formed from the semantic connectivity between XML data nodes. Our method performs object-based data partitioning, which leverages the notion of frequently accessed data subsets and places these subsets together in adjacent partitions. It then evaluates branched queries through two essential components: (i) structural and content indexing, which uses object-based connections to construct the indices, i.e., the Schema Index, Data Index and Value Index; and (ii) query processing, which produces the final results in optimal time. Finally, we present experimental results for the proposed approach on a range of real and synthetic XML data, together with a comparative study against related work in the area, to demonstrate the effectiveness of our method in terms of CPU cost, matching and merging cost, scalability (size and number of branches) and total number of scanned elements.
Our evaluation demonstrates the benefit of the proposed index in terms of performance as well as scalability, which is critical in a large data repository.
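
The idea of answering a twig query by intersecting object sets can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes, purely for illustration, that every child of the document root is one "object", and the names `build_indices`, `walk` and `evaluate_twig`, along with the sample `library` document, are hypothetical. The two dictionaries only loosely mirror the Data Index and Value Index named in the abstract; the Schema Index and the partitioning step are omitted.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample data; each <book> is treated as one "object" (an assumption).
XML = """
<library>
  <book><title>XML Indexing</title><author>Lee</author><year>2014</year></book>
  <book><title>Big Data</title><author>Kim</author><year>2014</year></book>
  <book><title>Twig Joins</title><author>Lee</author><year>2012</year></book>
</library>
"""

def walk(elem, path):
    """Yield (path, text) for an element and all of its descendants."""
    yield path, (elem.text or "").strip()
    for child in elem:
        yield from walk(child, f"{path}/{child.tag}")

def build_indices(root):
    """Map each path, and each (path, value) pair, to the ids of the objects containing it."""
    data_index = defaultdict(set)    # path -> object ids
    value_index = defaultdict(set)   # (path, value) -> object ids
    for obj_id, obj in enumerate(root):
        for path, text in walk(obj, obj.tag):
            data_index[path].add(obj_id)
            if text:
                value_index[(path, text)].add(obj_id)
    return data_index, value_index

def evaluate_twig(branches, data_index, value_index):
    """Intersect the object-id sets of every branch; value=None means a structural-only branch."""
    candidates = None
    for path, value in branches:
        if value is None:
            ids = data_index.get(path, set())
        else:
            ids = value_index.get((path, value), set())
        candidates = set(ids) if candidates is None else candidates & ids
        if not candidates:
            break  # no object can satisfy the remaining branches
    return sorted(candidates or [])

root = ET.fromstring(XML)
data_index, value_index = build_indices(root)

# Twig query roughly equivalent to book[author='Lee'][year='2014']
print(evaluate_twig([("book/author", "Lee"), ("book/year", "2014")],
                    data_index, value_index))
```

In this sketch, objects that fail any branch are dropped before the remaining branches are inspected, which is the intuition behind trimming the search space by object-based intersection; a real index would also need to handle nested objects and repeated elements.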

Publisher
Database: Elsevier - ScienceDirect
Journal: Future Generation Computer Systems - Volume 37, July 2014, Pages 212–231