Article code: 393308
Journal code: 665633
Publication year: 2014
English article: 18-page PDF
Full-text version: free download
English title of the ISI article
A low redundancy strategy for keyword search in structured and semi-structured data
Persian translation of the title
استراتژی انبساطی کم برای جستجوی کلمات کلیدی در داده های ساختار یافته و نیمه ساختار یافته
Related topics
Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence
English abstract


• A new keyword-search technique is described, which solves the problem of duplicate data in a Virtual Document approach.
• A complete keyword-based search engine architecture is presented.
• A reduction in indexing time and index size is reported when the Virtual Document approach is applied to large datasets.

Keyword search has been recognised as a viable alternative for information search in semi-structured and structured data sources. Current state-of-the-art keyword-search techniques over relational databases do not take advantage of the correlative meta-information included in structured and semi-structured data sources, and so leave relevant answers out. These techniques are also limited by scalability, performance and precision issues that become evident when they are applied to large datasets. Based on an in-depth analysis of issues related to indexing and ranking semi-structured and structured information, we propose a new keyword-search algorithm that takes into account the semantic information extracted from the schemas of the structured and semi-structured data sources and combines it with the textual relevance obtained by a common text retrieval approach. The algorithm is implemented in a keyword-based search engine called KESOSASD (Keyword Search Over Semi-structured and Structured Data), improving its precision and response time. Our approach models the semi-structured and structured information as graphs and makes use of a Virtual Document Structure Aware Inverted Index (VDSAII). This index is created from a set of logical structures called Virtual Documents, which capture and exploit the implicit structural relationships (semantics) depicted in the schemas of the structured and semi-structured data sources. Extensive experiments were conducted to demonstrate that KESOSASD outperforms existing approaches in terms of search efficiency and accuracy. Moreover, KESOSASD is prepared to scale out and manage large databases without degrading its effectiveness.
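
The abstract does not give the internals of VDSAII, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that a Virtual Document is a root tuple plus the tuples it references through schema links, and that redundancy is avoided by indexing each tuple's text once and resolving tuples to Virtual Documents at query time. The data, the `paper:`/`author:` identifiers and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data: tuples (rows) keyed by id, each with free text.
TUPLES = {
    "author:1": "Ada Lovelace",
    "paper:10": "keyword search over relational data",
    "paper:11": "graph indexing for XML",
}

# Hypothetical schema links (foreign keys) from papers to their author.
EDGES = [("paper:10", "author:1"), ("paper:11", "author:1")]


def build_virtual_documents(tuples, edges):
    """Group each root tuple with the tuples it references (one hop)."""
    docs = {tid: {tid} for tid in tuples if tid.startswith("paper:")}
    for src, dst in edges:
        if src in docs:
            docs[src].add(dst)
    return docs


def build_index(tuples, docs):
    """Index each tuple's text once, then map tuples to virtual documents."""
    term_to_tuples = defaultdict(set)
    for tid, text in tuples.items():
        for term in text.lower().split():
            term_to_tuples[term].add(tid)
    tuple_to_docs = defaultdict(set)
    for doc_id, members in docs.items():
        for tid in members:
            tuple_to_docs[tid].add(doc_id)
    return term_to_tuples, tuple_to_docs


def search(query, term_to_tuples, tuple_to_docs):
    """Return virtual documents containing every query term (AND semantics)."""
    result = None
    for term in query.lower().split():
        hits = set()
        for tid in term_to_tuples.get(term, ()):
            hits |= tuple_to_docs.get(tid, set())
        result = hits if result is None else result & hits
    return sorted(result or [])


docs = build_virtual_documents(TUPLES, EDGES)
term_to_tuples, tuple_to_docs = build_index(TUPLES, docs)
print(search("lovelace keyword", term_to_tuples, tuple_to_docs))
```

In this sketch the shared author tuple is stored in the term index only once, even though it belongs to two Virtual Documents. A naive approach would instead copy its text into every Virtual Document that reaches it. Ranking, which the abstract describes as a combination of structural and textual relevance, is left out here.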

Publisher
Database: Elsevier - ScienceDirect
Journal: Information Sciences - Volume 288, 20 December 2014, Pages 135–152