Article ID Journal Published Year Pages File Type
1869542 Physics Procedia 2012 10 Pages PDF
Abstract

Similarity measurement of XML documents is crucial to meet various needs of approximate searches and document classifications in XML-oriented applications. Some methods have been proposed for this purpose. Nevertheless, few methods can be elegantly exploited to depict structure and semantic information and hence to effectively measure the similarity of XML documents. In this paper, we present a new method of computing the structure and semantic similarity of XML documents based on extended adjacency matrix(EAM). Different from a general adjacency matrix, in an EAM, the structure information of not only the adjacent layers but also the ancestor-descendant layers can be stored. For measuring the similarity of two XML documents, the proposed method firstly stores the structure and semantic information in two extended adjacency matrices(M1, M2). Then it computes similarity of the two documents through cos(M1, M2) Experimental results on bench-mark data show that the method holds high efficiency and accuracy.

Related Topics
Physical Sciences and Engineering Physics and Astronomy Physics and Astronomy (General)