|Article code||Journal code||Publication year||English article|
|381963||660712||2016||14 pages (PDF)|
• First application of the Ant Colony metaheuristic to verifying information extracted by web wrappers.
• New multilevel verification system that improves on the results achieved by current techniques.
• Enumeration of the weaknesses of current techniques.
• Reformulation of the wrapper verification problem as a combinatorial optimization problem.
• Application of non-parametric tests to ascertain the statistical significance of the results.
Wrappers are pieces of software used to extract data from websites and structure them for further processing by applications. Unfortunately, websites evolve continuously and structural changes happen without warning, which usually causes wrappers to work incorrectly. Wrapper maintenance is therefore necessary to detect whether a wrapper is extracting erroneous data. The usual solution is to build verification models that check whether the wrapper's current output is statistically similar to the output it produced when it was successfully invoked in the past. Current proposals have weaknesses: they assume either that the data used to build these models are homogeneous, or that the features of the data set can be mapped to an n-dimensional space of independent dimensions, even though the features are correlated. In this paper, a new verification system based on the Best-Worst Ant System (BWAS) is presented to overcome these weaknesses. The experimental results show an accuracy improvement of 7.5% over current solutions.
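To make the idea of a verification model concrete, the following is a minimal sketch of the baseline approach the abstract describes, not the BWAS-based system proposed in the paper. It assumes three hypothetical numeric features per extracted value (character count, digit count, token count) and a hypothetical z-score threshold `z_max`; the function names and sample data are illustrative only. It checks each feature separately, which is the independence assumption the abstract identifies as a weakness.

```python
from statistics import mean, stdev


def features(value):
    """Hypothetical numeric features of one extracted field value."""
    return [
        float(len(value)),                         # character count
        float(sum(ch.isdigit() for ch in value)),  # digit count
        float(len(value.split())),                 # whitespace-separated tokens
    ]


def build_model(past_values):
    """Per-feature (mean, standard deviation) over known-good wrapper output."""
    columns = list(zip(*(features(v) for v in past_values)))
    return [(mean(col), stdev(col)) for col in columns]


def looks_valid(model, new_values, z_max=3.0):
    """Return True if every feature mean of the new output lies within
    z_max standard deviations of the known-good mean.

    Each feature is tested on its own, so correlations are ignored.
    """
    columns = list(zip(*(features(v) for v in new_values)))
    for (mu, sigma), col in zip(model, columns):
        spread = sigma if sigma > 0 else 1.0  # guard against zero variance
        if abs(mean(col) - mu) / spread > z_max:
            return False
    return True


# Illustrative data: prices extracted while the wrapper was known to work.
past = ["$12.99", "$8.50", "$103.00", "$7.25", "$45.10"]
model = build_model(past)

still_ok = looks_valid(model, ["$19.99", "$5.75"])
broken = looks_valid(model, ["Add to cart now", "Sign in to see price"])
print(still_ok, broken)
```

With this sample data, output that still looks like prices passes the check, while text captured after a layout change is flagged because its length differs sharply from the known-good values.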
Journal: Expert Systems with Applications - Volume 57, 15 September 2016, Pages 62–75