کد مقاله | کد نشریه | سال انتشار | مقاله انگلیسی | نسخه تمام متن |
---|---|---|---|---|
386246 | 660881 | 2014 | 12 صفحه PDF | دانلود رایگان |
• We propose a graph-structural index called the RG-index for efficient SPARQL query processing.
• We propose an efficient building algorithm for the RG-index adapted from the gSpan algorithm.
• We present the RFLT operator, which conducts triple filtering efficiently.
• We develop the cost model and the cardinality estimation method for the RFLT operator.
• We conducted comprehensive experiments with very large-scale real-life and synthetic RDF datasets.
As the size of Resource Description Framework (RDF) graphs has grown rapidly, SPARQL query processing on the large-scale RDF graph has become a more challenging problem. For efficient SPARQL query processing, the handling of the intermediate results is the most crucial element because it generally involves many join operators. Recently, a triple filtering method, called the RP-filter, which uses a path-based index, was proposed. It can reduce the intermediate results effectively by filtering out irrelevant triples in advance. However, its filtering power is limited, because it uses only the path information of the RDF graph. In this paper, we extend the triple filtering method to exploit the graph-structural information, and propose the RDF graph index (RG-index). We address the problem of the RG-index, which is caused by the indexing of the graph patterns, by indexing only effective graph patterns for the triple filtering. In addition, we propose an efficient method for building the RG-index in which a frequent graph pattern mining algorithm is adapted. We conducted comprehensive experiments on large-scale RDF datasets and demonstrated that the RG-index can reduce redundant intermediate results more effectively than can the RP-filter.
Journal: Expert Systems with Applications - Volume 41, Issue 10, August 2014, Pages 4596–4607