Article ID: 10327326
Journal: Big Data Research
Published Year: 2015
Pages: 18
File Type: PDF
Abstract
During periods of high volume, big data stream applications may not have enough resources to process all incoming tuples. To maximize the production of the most critical results under such resource shortages, a recent solution, PR (short for Preferential Result), uses both static criteria (defined at compile-time) and dynamic criteria (identified online at run-time) to prioritize the processing of tuples throughout the query pipeline. Unfortunately, locating the optimal criteria placement (i.e., where in the query pipeline to evaluate each prioritization criterion) is extremely compute-intensive and runs in exponential time. This makes PR impractical for complex big data stream systems. Our proposed criteria selection and placement approach, PR-Prune (short for Preferential Result-Pruning), remains practical for such systems. PR-Prune prunes ineffective dynamic criteria and combines multiple criteria along the same pipeline. To achieve this, PR-Prune seeks to lengthen the stretch of the query pipeline over which tuples identified as critical are pulled forward. Our experiments use a real data stream of S&P 500 stocks, synthetic data streams, and a diverse set of queries. The results show that PR-Prune increases the production of the most critical results compared to state-of-the-art approaches. In addition, PR-Prune significantly lowers the optimization search time compared to PR.
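The general idea of prioritizing tuples under a resource shortage can be illustrated with a minimal sketch. This is not the PR or PR-Prune algorithm, whose details the abstract does not give. It assumes a hypothetical scoring rule (a tuple's priority is the number of criteria it satisfies), a hypothetical fixed processing capacity, and made-up stock ticks. The function name `process_with_priority` and both example criteria are illustrative only.

```python
import heapq
from itertools import count


def process_with_priority(tuples, criteria, capacity):
    """Process at most `capacity` tuples, most critical first.

    Assumption: a tuple's priority is the number of criteria it
    satisfies. This scoring rule is hypothetical, chosen for brevity.
    """
    order = count()  # tie-breaker: keeps arrival order among equal scores
    heap = []
    for t in tuples:
        score = sum(1 for criterion in criteria if criterion(t))
        # heapq is a min-heap, so negate the score to pop the highest first
        heapq.heappush(heap, (-score, next(order), t))

    processed = []
    while heap and len(processed) < capacity:
        _, _, t = heapq.heappop(heap)
        processed.append(t)
    return processed


# Hypothetical stock ticks: (symbol, price change in percent)
ticks = [("AAA", 0.1), ("BBB", 4.2), ("CCC", -0.3), ("DDD", 5.0), ("BBB", 0.2)]


def static_criterion(t):
    # Stands in for a criterion fixed at compile-time
    return t[0] == "BBB"


def dynamic_criterion(t):
    # Stands in for a criterion identified online at run-time
    return abs(t[1]) > 3.0


# Only 3 of the 5 tuples can be processed in this round
result = process_with_priority(ticks, [static_criterion, dynamic_criterion], capacity=3)
print(result)
```

With a capacity of three, the tuples that satisfy the most criteria are processed and the remaining two are left unprocessed. Deciding where in a multi-operator pipeline each criterion should be evaluated, which is the placement problem the abstract describes, is outside the scope of this sketch.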
Related Topics
Physical Sciences and Engineering > Computer Science > Computational Theory and Mathematics