Article ID: 396828
Journal: Information Systems
Published Year: 2015
Pages: 16
File Type: PDF
Abstract

• A formal geometric framework for data fusion in information retrieval is proposed.
• Score-based data fusion methods such as CombSum and the linear combination method can be better understood and better used.
• We demonstrate that the conclusions are applicable to both the Euclidean distance and ranking-based metrics.

Data fusion in information retrieval has been investigated by many researchers, and a number of data fusion methods have been proposed. However, questions such as why data fusion can increase effectiveness, and under what conditions data fusion methods work well, remain poorly resolved at best. In this paper, we formally describe data fusion under a geometric framework, in which each component result returned by an information retrieval system for a given query is represented as a point in a multi-dimensional space. The Euclidean distance is the measure by which both the effectiveness and the similarity of search results are judged. This allows us to explain all component results and fused results using geometric principles. In such a framework, score-based data fusion becomes a deterministic problem. Several interesting features of the centroid-based data fusion method and the linear combination method are discussed. In retrieval evaluation, however, ranking-based measures are the most popular. This paper therefore investigates the relationship and correlation between the Euclidean distance and several typical ranking-based measures. We find that a very strong correlation exists between them, which means that the theorems and observations obtained using the Euclidean distance remain valid when ranking-based measures are used. The proposed framework gives a better understanding of score-based data fusion and allows score-based data fusion methods to be used more precisely and effectively in various ways.
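
To make the geometric picture concrete, the following is a minimal sketch rather than the paper's own implementation. It assumes each component result is a vector of normalized scores over the same documents, and that the ideal result is the vector of relevance judgments. The data, the function names, and the choice of a four-document collection are all hypothetical. The sketch fuses three results by their centroid (the score average, which ranks documents in the same order as a CombSum-style sum) and compares Euclidean distances to the ideal point. By convexity of the Euclidean norm, the centroid can never be farther from the ideal point than the average component result.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two score vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def centroid(points):
    """Centroid-based fusion: average the scores document by document."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def linear_combination(points, weights):
    """Weighted fusion; equal weights that sum to 1 reduce to the centroid."""
    return [sum(w * x for w, x in zip(weights, col)) for col in zip(*points)]


# Hypothetical data: four documents, relevance judged as 1 (relevant) or 0.
ideal = [1.0, 0.0, 1.0, 0.0]

# Hypothetical normalized scores from three retrieval systems.
results = [
    [0.9, 0.4, 0.3, 0.1],
    [0.5, 0.1, 0.8, 0.6],
    [0.7, 0.5, 0.6, 0.0],
]

component_distances = [euclidean(r, ideal) for r in results]
mean_component_distance = sum(component_distances) / len(component_distances)

fused = centroid(results)
fused_distance = euclidean(fused, ideal)

print("component distances:", component_distances)
print("mean component distance:", mean_component_distance)
print("centroid distance:", fused_distance)
```

A smaller distance to the ideal point corresponds to a more effective result in this framework. The `linear_combination` helper shows where per-system weights would enter; how to choose those weights is outside the scope of this sketch.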
