XML schema clustering with semantic and hierarchical similarity measures

Article ID	Journal	Published Year	Pages	File Type
403169	Knowledge-Based Systems	2007	14 Pages	PDF

Abstract

With the growing popularity of XML as a data representation language, the number of XML data collections has exploded. Methods are required to manage these collections and to discover useful information from them for improved document handling. We present a schema clustering process that organises heterogeneous XML schemas into groups. The methodology considers not only the linguistic and contextual similarity of the elements but also their hierarchical structural similarity. We support our findings with experiments and analysis.
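
To make the idea of combining element-level and hierarchical similarity concrete, the following is a minimal sketch and not the method evaluated in the article. It assumes that a schema can be flattened into (element name, depth) pairs, uses a plain string ratio from Python's `difflib` as a stand-in for linguistic similarity, and compares depths as a crude proxy for hierarchical similarity. The weight `w`, the threshold, the single-link grouping, and all function names are hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher

# Assumption: a schema is flattened to (element_name, depth) pairs; depth 0 is the root.
Schema = list[tuple[str, int]]

def name_similarity(a: str, b: str) -> float:
    """String-based stand-in for linguistic similarity, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def level_similarity(d1: int, d2: int, max_depth: int) -> float:
    """1.0 when two elements sit at the same depth, lower as depths diverge."""
    if max_depth == 0:
        return 1.0
    return 1.0 - abs(d1 - d2) / max_depth

def element_similarity(e1, e2, max_depth: int, w: float = 0.7) -> float:
    """Weighted mix of name and depth similarity (w is an illustrative weight)."""
    return (w * name_similarity(e1[0], e2[0])
            + (1 - w) * level_similarity(e1[1], e2[1], max_depth))

def schema_similarity(s1: Schema, s2: Schema, w: float = 0.7) -> float:
    """Average best-match element similarity, symmetrised over both directions."""
    max_depth = max(d for _, d in s1 + s2)

    def directed(a: Schema, b: Schema) -> float:
        return sum(max(element_similarity(x, y, max_depth, w) for y in b)
                   for x in a) / len(a)

    return (directed(s1, s2) + directed(s2, s1)) / 2

def cluster(schemas: dict[str, Schema], threshold: float = 0.75) -> list[set[str]]:
    """Single-link grouping: a schema joins (and merges) every cluster that
    contains at least one schema whose similarity reaches the threshold."""
    clusters: list[set[str]] = []
    for name, schema in schemas.items():
        linked = [c for c in clusters
                  if any(schema_similarity(schema, schemas[m]) >= threshold for m in c)]
        merged = {name}.union(*linked)
        clusters = [c for c in clusters if c not in linked] + [merged]
    return clusters

# Toy schemas, invented for this example only.
schemas = {
    "book1": [("book", 0), ("title", 1), ("author", 1), ("publisher", 1)],
    "book2": [("book", 0), ("title", 1), ("authors", 1), ("price", 1)],
    "order": [("order", 0), ("customer", 1), ("item", 1), ("quantity", 2)],
}
print(cluster(schemas))
```

On the toy data, the two book schemas share most element names at the same depths, so they score higher with each other than either does with the order schema. A realistic implementation would replace the string ratio with a thesaurus-based measure and compare paths rather than depths alone.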

Keywords

XML; Schema matching; Clustering; Semi-structured data; Data mining; Structural similarity; Semantic similarity