Article code | Journal code | Publication year | English article | Full-text version |
---|---|---|---|---|
454035 | 695092 | 2016 | 14-page PDF | Free download |
• We study the quality of deltas produced by diff algorithms, regardless of their speed.
• We present a formal way to compare the deltas and their properties.
• The approach includes a uniform delta model and a set of quantitative metrics.
• This makes it possible to compare algorithms with different aims and output formats.
• We show how the metrics capture peculiarities of some widespread XML diff algorithms.
The automatic detection of differences between documents is a very common task in several domains. This paper introduces a formal way to compare diff algorithms and to analyze the deltas they produce. There is no one-size-fits-all definition of the quality of a delta, because quality depends strongly on the application domain and on the final use of the detected changes. Researchers have historically focused on minimality: reducing the size of the produced edit scripts and/or taming the computational complexity of the algorithms. More recently, they have started giving more weight to the human interpretability of deltas, designing tools that produce more readable, usable and domain-oriented results. We propose a universal delta model and a set of metrics to characterize and effectively compare deltas produced by different algorithms, in order to highlight which are most suitable for a given task and domain.
Journal: Computer Standards & Interfaces - Volume 46, May 2016, Pages 52–65