Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
6861353 | Knowledge-Based Systems | 2018 | 20 Pages |
Abstract
In big data environments, iterative computing is widely used in applications such as data mining, machine learning, and graph analysis. Many iterative computing models have been proposed to execute iterative algorithms on big data efficiently. However, it is inefficient to re-iterate over the entire dataset when only part of it changes, for example when some data is added or removed. This paper presents Rim, a Reusable Iterative computing Model that calculates the new iterative results from the updated dataset and the original iterative results, avoiding re-iteration on the entire dataset. We propose the application conditions of Rim, mathematically prove its accuracy and performance advantages, and describe its application to three typical iterative algorithms: PageRank, K-means and Descendant-query. Finally, we implement Rim in Spark and evaluate its performance on different test cases and iterative algorithms. For PageRank, K-means and Descendant-query, experiments show that our approach is on average 1.34×, 2.51× and 3.17× faster, respectively, than re-iteration on massive datasets.
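The idea of reusing earlier results can be illustrated with a small descendant query. The sketch below is not the Rim algorithm itself, whose details the abstract does not give; it is a minimal single-machine illustration in plain Python rather than Spark, and the function names `full_descendants` and `add_edge_incremental` are hypothetical. It assumes the only change to the dataset is the insertion of one edge, and it updates the stored descendant sets instead of recomputing them for every node.

```python
from collections import deque


def full_descendants(edges):
    """Recompute every node's descendant set from scratch (one BFS per node)."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, set()).add(dst)
        graph.setdefault(dst, set())
    result = {}
    for start in graph:
        seen = set()
        queue = deque(graph[start])
        while queue:
            node = queue.popleft()
            if node not in seen:
                seen.add(node)
                queue.extend(graph[node])
        result[start] = seen
    return result


def add_edge_incremental(descendants, src, dst):
    """Return updated descendant sets after inserting the edge src -> dst.

    Reuses the old results: only src and the nodes that already reached src
    gain new descendants, namely dst plus everything dst already reached.
    """
    updated = {node: set(reach) for node, reach in descendants.items()}
    updated.setdefault(src, set())
    updated.setdefault(dst, set())
    gained = {dst} | updated[dst]  # a fresh set, safe to merge into any entry
    for node, reach in updated.items():
        if node == src or src in reach:
            reach |= gained
    return updated


# Hypothetical example data: two separate chains joined by a new edge c -> d.
edges = [("a", "b"), ("b", "c"), ("d", "e")]
old_result = full_descendants(edges)
new_result = add_edge_incremental(old_result, "c", "d")
```

Here the incremental update touches only the nodes that can reach `c`, and its output matches a full recomputation on the enlarged edge list. Edge deletions are harder to handle this way and are not covered by this sketch.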
Related Topics
Physical Sciences and Engineering
Computer Science
Artificial Intelligence
Authors
Jie SONG, Zhongyi MA, Yichuan ZHANG, Tiantian LI, Ge YU