Article code: 424915
Journal code: 685654
Publication year: 2016
English article: 11-page PDF
English title of the ISI article
Effective and efficient similarity search in scientific workflow repositories
Related topics
Engineering and Basic Sciences, Computer Engineering, Computational Theory and Mathematics
English abstract


• We present a complete system for efficient scientific workflow similarity search.
• Workflow indexing integrates with repositories’ existing search—no graphs needed.
• Layer Decomposition reranking of candidate workflows ensures high result quality.
• We evaluate on a large corpus of workflows with similarity ratings by human experts.
• Our system greatly improves previous results in speed and retains superior quality.

Scientific workflows have become a valuable tool for large-scale data processing and analysis. This has led to the creation of specialized online repositories to facilitate workflow sharing and reuse. Over time, these repositories have grown to sizes that call for advanced methods to support workflow discovery, in particular for similarity search. Effective similarity search requires both high-quality algorithms for the comparison of scientific workflows and efficient strategies for indexing, searching, and ranking of search results. Yet, the graph structure of scientific workflows poses severe challenges to each of these steps. Here, we present a complete system for effective and efficient similarity search in scientific workflow repositories, based on the Layer Decomposition approach to scientific workflow comparison. Layer Decomposition specifically accounts for the directed dataflow underlying scientific workflows and, compared to other state-of-the-art methods, delivers the best results for similarity search at comparably low runtimes. Stacking Layer Decomposition with even faster, structure-agnostic approaches allows us to use proven, off-the-shelf tools for workflow indexing to further reduce runtimes and scale similarity search to the sizes of current repositories.
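
The two-stage idea in the abstract (a fast, structure-agnostic index lookup followed by structure-aware reranking) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `(modules, edges)` workflow representation, the function names, the in-memory inverted index standing in for an off-the-shelf indexing tool, and the layer alignment, which is a simplified stand-in for Layer Decomposition rather than the authors' published algorithm.

```python
from collections import defaultdict

# Hypothetical workflow representation: (modules, edges), where
# modules maps a module id to its label and edges lists (src, dst) pairs.


def layers(modules, edges):
    """Peel a DAG into layers along the dataflow; each layer is a set of labels."""
    indegree = {m: 0 for m in modules}
    successors = defaultdict(list)
    for src, dst in edges:
        indegree[dst] += 1
        successors[src].append(dst)
    current = [m for m, d in indegree.items() if d == 0]
    result = []
    while current:
        result.append(frozenset(modules[m] for m in current))
        following = []
        for m in current:
            for s in successors[m]:
                indegree[s] -= 1
                if indegree[s] == 0:
                    following.append(s)
        current = following
    return result


def jaccard(a, b):
    """Set overlap in [0, 1]; two empty sets score 0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def layer_similarity(wf_a, wf_b):
    """Order-preserving alignment of layer sequences (weighted LCS-style DP)."""
    la, lb = layers(*wf_a), layers(*wf_b)
    if not la or not lb:
        return 0.0
    dp = [[0.0] * (len(lb) + 1) for _ in range(len(la) + 1)]
    for i in range(1, len(la) + 1):
        for j in range(1, len(lb) + 1):
            match = dp[i - 1][j - 1] + jaccard(la[i - 1], lb[j - 1])
            dp[i][j] = max(dp[i - 1][j], dp[i][j - 1], match)
    return dp[-1][-1] / max(len(la), len(lb))


def build_index(repo):
    """Inverted index from module label to workflow ids (no graph structure used)."""
    index = defaultdict(set)
    for wf_id, (modules, _edges) in repo.items():
        for label in modules.values():
            index[label].add(wf_id)
    return index


def search(query, repo, index, k=10):
    """Stage 1: label-based candidate retrieval. Stage 2: layer-based reranking."""
    q_labels = set(query[0].values())
    candidates = set()
    for label in q_labels:
        candidates |= index.get(label, set())

    def label_score(wf_id):
        return jaccard(q_labels, set(repo[wf_id][0].values()))

    shortlist = sorted(candidates, key=lambda w: (-label_score(w), w))[:k]
    return sorted(shortlist, key=lambda w: (-layer_similarity(query, repo[w]), w))


# Toy repository: "a" and "b" share labels but differ in dataflow order.
repo = {
    "a": ({1: "fetch", 2: "align", 3: "plot"}, [(1, 2), (2, 3)]),
    "b": ({1: "plot", 2: "align", 3: "fetch"}, [(1, 2), (2, 3)]),
    "c": ({1: "parse", 2: "store"}, [(1, 2)]),
}
index = build_index(repo)
query = ({1: "fetch", 2: "align", 3: "plot"}, [(1, 2), (2, 3)])
ranking = search(query, repo, index)
```

In this toy setup the label-only stage cannot separate "a" from "b", since both contain the same module labels; only the reranking stage, which respects the direction of the dataflow, places "a" first. Restricting the expensive comparison to a shortlist of `k` candidates is what keeps the overall search fast.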

Publisher
Database: Elsevier - ScienceDirect
Journal: Future Generation Computer Systems - Volume 56, March 2016, Pages 584–594