Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
394005 | Information Sciences | 2010 | 21 Pages | |
Data integration and interoperability pose significant challenges in scientific hydrological and environmental studies, largely because of the inherent semantic and structural heterogeneities of massive datasets and of non-uniform, autonomous data sources. To address these challenges, we propose a unified data integration framework called the Hydrological Integrated Data Environment (HIDE). HIDE is based on a labeled-tree data integration model referred to as the DataNode tree. Using this framework, characteristics of datasets gathered from diverse data sources – with different logical and access organizations – are extracted, classified as Time–Space–Attribute (TSA) labels, and subsequently arranged in a DataNode tree. The uniqueness of our approach is that it combines the semantic aspects of the scientific domain with diverse datasets having different logical organizations to form a unified view. We also adopt a metadata-based approach for specifying the TSA-DataNode tree in order to achieve flexibility and extensibility. The search engine of the HIDE prototype system evaluates a simple user query systematically on the TSA-DataNode tree and presents the integrated results in a standardized format, which facilitates both effective and efficient data integration.
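The abstract does not give the DataNode tree's actual structure, so the following is only a minimal sketch of the general idea of a labeled tree queried by time, space, and attribute. Every name in it (`DataNode`, `register`, `query`, the sample labels and file names) is hypothetical, and the fixed Time, then Space, then Attribute level order is an assumption made for illustration, not the paper's design.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional


@dataclass
class DataNode:
    """One node of a labeled tree (hypothetical layout, not the paper's model)."""
    kind: str   # "root", "time", "space", or "attribute"
    label: str
    children: Dict[str, "DataNode"] = field(default_factory=dict)
    sources: List[str] = field(default_factory=list)  # dataset references held at leaves

    def child(self, kind: str, label: str) -> "DataNode":
        # Return the child carrying this label, creating it on first use.
        if label not in self.children:
            self.children[label] = DataNode(kind, label)
        return self.children[label]


def register(root: DataNode, time: str, space: str, attribute: str, source: str) -> None:
    """File a dataset reference under its time -> space -> attribute labels."""
    leaf = root.child("time", time).child("space", space).child("attribute", attribute)
    leaf.sources.append(source)


def query(root: DataNode, time: Optional[str] = None, space: Optional[str] = None,
          attribute: Optional[str] = None) -> List[dict]:
    """Return matching records in one uniform shape; None matches any label."""
    wanted = [time, space, attribute]

    def walk(node: DataNode, depth: int, path: List[str]) -> Iterator[dict]:
        if depth == 3:  # reached an attribute leaf
            for src in node.sources:
                yield {"time": path[0], "space": path[1],
                       "attribute": path[2], "source": src}
            return
        for label, child in node.children.items():
            if wanted[depth] is None or wanted[depth] == label:
                yield from walk(child, depth + 1, path + [label])

    return list(walk(root, 0, []))


# Hypothetical datasets from differently organized sources.
root = DataNode("root", "")
register(root, "2009", "basin_a", "streamflow", "agency_x.csv")
register(root, "2009", "basin_a", "streamflow", "agency_y.db")
register(root, "2009", "basin_b", "rainfall", "agency_y.db")
register(root, "2010", "basin_a", "rainfall", "agency_z.xml")

for record in query(root, attribute="rainfall"):
    print(record)
```

Because every record comes back in the same dictionary shape regardless of which source it was registered from, the caller sees a single view over the underlying files. A real system would presumably need richer labels (time ranges, spatial extents) and metadata-driven configuration, which this sketch leaves out.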