Article ID: 426126
Journal: Future Generation Computer Systems
Published Year: 2012
Pages: 7
File Type: PDF
Abstract

Discovering the correct dataset efficiently is critical for effective simulations in the atmospheric sciences. Unlike text-based web documents, many large scientific datasets contain binary-encoded data that is hard to discover with popular search engines. Public data hosting services in the atmospheric sciences have grown significantly, but the ability to index and search their holdings has been limited by the metadata that each data host provides. We have developed an infrastructure, the Atmospheric Data Discovery System (ADDS), that provides an efficient data discovery environment for observational datasets in the atmospheric sciences. To support complex queries, we automatically extract and index fine-grained metadata. Datasets are indexed through periodic crawling of popular sites and by indexing files that users request. A data customization feature lets users access subsets of a large dataset. This paper focuses on the overall architecture, the data subsetting scheme, and a performance evaluation of the system.

► Improving search and access to binary datasets published by multiple hosts.
► Community-driven scientific data search environment.
► Support for complex queries and rich metadata extraction from binary datasets.
► Efficient subsetting of large atmospheric observational datasets.
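
The abstract does not describe how ADDS represents metadata or performs subsetting, so the following is only a minimal sketch of the general idea: fine-grained metadata (variables, spatial extent, time span) is stored per file, a query filters on all of those fields at once, and a subset keeps only the observations inside a requested bounding box. Every name here (`DatasetMetadata`, `search`, `subset`, the `example.org` URLs, and the sample values) is a hypothetical chosen for illustration, and the linear scan stands in for whatever index structure the actual system uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetMetadata:
    """Hypothetical fine-grained metadata record for one data file."""
    url: str
    variables: frozenset   # e.g. {"temperature", "pressure"}
    lat_range: tuple       # (min_lat, max_lat) in degrees
    lon_range: tuple       # (min_lon, max_lon) in degrees
    year_range: tuple      # (first_year, last_year), inclusive


def overlaps(a, b):
    """Return True if the closed intervals a and b intersect."""
    return a[0] <= b[1] and b[0] <= a[1]


def search(index, variable, lat_range, lon_range, year_range):
    """Return records that hold `variable` and intersect all requested ranges.

    A linear scan is used for clarity; it is an assumption, not the
    indexing strategy of the system described in the abstract.
    """
    return [
        m for m in index
        if variable in m.variables
        and overlaps(m.lat_range, lat_range)
        and overlaps(m.lon_range, lon_range)
        and overlaps(m.year_range, year_range)
    ]


def subset(rows, lat_range, lon_range):
    """Keep only (lat, lon, value) observations inside the bounding box."""
    return [
        (lat, lon, value) for lat, lon, value in rows
        if lat_range[0] <= lat <= lat_range[1]
        and lon_range[0] <= lon <= lon_range[1]
    ]


# Hypothetical index entries; the URLs and extents are made up.
index = [
    DatasetMetadata("http://example.org/a.nc", frozenset({"temperature"}),
                    (30.0, 50.0), (-110.0, -90.0), (2005, 2008)),
    DatasetMetadata("http://example.org/b.nc", frozenset({"pressure"}),
                    (30.0, 50.0), (-110.0, -90.0), (2005, 2008)),
    DatasetMetadata("http://example.org/c.nc", frozenset({"temperature"}),
                    (-40.0, -10.0), (110.0, 150.0), (2009, 2011)),
]

# Complex query: one variable, a bounding box, and a time window.
hits = search(index, "temperature", (35.0, 45.0), (-105.0, -95.0), (2006, 2007))

# Subsetting: return only the observations inside the requested box.
observations = [(40.0, -100.0, 288.1), (48.0, -100.0, 280.4), (36.5, -96.0, 291.7)]
selected = subset(observations, (35.0, 45.0), (-105.0, -95.0))
```

In this sketch the query matches only the first record, because the second lacks the requested variable and the third lies outside the bounding box. The subsetting step is what would let a user download a small region instead of the whole file.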

Related Topics
Physical Sciences and Engineering > Computer Science > Computational Theory and Mathematics