Article code: 392583
Journal code: 664991
Publication year: 2016
English article: 17 pages, PDF
Full text: free download
English title of the ISI article
A probabilistic ranking framework for web-based relational data imputation
Keywords
Web-based relational data imputation, missing attribute values, probabilistic ranking
Related topics
Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence
Abstract

Given the richness of information on the web, there is increasing interest in searching the web for missing attribute values in relational data. Web-based relational data imputation first extracts multiple candidate values from the web and then ranks them by their matching probabilities. However, effective candidate ranking remains challenging because web documents are unstructured and popular search engines return relevant but not necessarily semantically matching information. In this paper, we propose a novel probabilistic approach for ranking the web-retrieved candidate values. It integrates various influence factors, e.g., snippet rank order, occurrence frequency, occurrence pattern, and keyword proximity, in a single framework through semantic reasoning. The proposed framework consists of a snippet influence model and a semantic matching model. The snippet influence model measures the influence of a snippet, and the semantic matching model measures the semantic similarity between a candidate value in a snippet and a missing relational value in a tuple. We also present effective probabilistic estimation solutions for both models. Finally, we empirically evaluate the performance of the proposed framework on real datasets. Our extensive experiments demonstrate that it outperforms state-of-the-art techniques by considerable margins in imputation accuracy.
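
The abstract does not give the models' formulas, so the following is only a minimal sketch of the general idea: each candidate's score is accumulated over snippets as (snippet influence) x (semantic match) and then normalized into a ranking probability. Everything concrete here is an assumption made for illustration, not the paper's method: the function names `snippet_influence`, `semantic_match`, and `rank_candidates` are hypothetical, influence is approximated by a logarithmic decay in snippet rank order, and matching is approximated by keyword proximity alone. Candidates are assumed to be single tokens.

```python
import math
import re
from collections import defaultdict


def snippet_influence(rank):
    """Assumed stand-in: influence decays with search rank (1 = top result)."""
    return 1.0 / math.log2(rank + 1)


def semantic_match(snippet, candidate, keywords):
    """Assumed stand-in: keyword proximity, 1 / (1 + mean token distance)."""
    tokens = re.findall(r"\w+", snippet.lower())
    cand_pos = [i for i, t in enumerate(tokens) if t == candidate.lower()]
    if not cand_pos:
        return 0.0  # candidate does not occur in this snippet
    distances = []
    for kw in keywords:
        kw_pos = [i for i, t in enumerate(tokens) if t == kw.lower()]
        if kw_pos:
            # Closest pairing of this keyword with any candidate occurrence.
            distances.append(min(abs(c - k) for c in cand_pos for k in kw_pos))
    if not distances:
        return 0.0  # no query keyword occurs in this snippet
    return 1.0 / (1.0 + sum(distances) / len(distances))


def rank_candidates(snippets, candidates, keywords):
    """Return (candidate, probability) pairs, best first.

    `snippets` must be in search-engine rank order. Summing over snippets
    lets occurrence frequency raise a candidate's score.
    """
    scores = defaultdict(float)
    for rank, snippet in enumerate(snippets, start=1):
        for cand in candidates:
            scores[cand] += snippet_influence(rank) * semantic_match(
                snippet, cand, keywords
            )
    total = sum(scores.values())
    ranked = [(c, scores[c] / total if total > 0 else 0.0) for c in candidates]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)
```

For example, to impute a missing birthplace one might call `rank_candidates(snippets, ["london", "cambridge"], ["turing", "born"])` with the snippets returned for a query built from the tuple's known attribute values. A candidate that appears close to the query keywords in several highly ranked snippets ends up with the largest normalized score.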

Publisher
Database: Elsevier - ScienceDirect
Journal: Information Sciences - Volumes 355–356, 10 August 2016, Pages 152–168