Article ID: 425141
Journal: Future Generation Computer Systems
Published Year: 2011
Pages: 8
File Type: PDF
Abstract

Information retrieval algorithms need datasets to assess their effectiveness. However, such datasets are often difficult and expensive to obtain, since building them is time-consuming and costly. This work presents a collaborative approach to dataset creation that uses a data quality evaluation technique based on fuzzy theory to help users select suitable web documents for their datasets. These documents are captured automatically by a crawler and evaluated using information derived from their metadata.
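
To make the idea of fuzzy, metadata-based quality evaluation more concrete, the following is a minimal illustrative sketch, not the method of the paper. The abstract does not specify which metadata fields, membership functions, or aggregation rule are used. The fields `age_days` and `inbound_links`, the breakpoints, the use of the minimum as fuzzy AND, and the selection threshold are all assumptions chosen for illustration.

```python
# Illustrative sketch only: every field name, breakpoint, and threshold
# below is a hypothetical choice, not taken from the paper.

def ramp_up(x, lo, hi):
    """Membership degree rising linearly from 0 at lo to 1 at hi."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


def ramp_down(x, lo, hi):
    """Membership degree falling linearly from 1 at lo to 0 at hi."""
    return 1.0 - ramp_up(x, lo, hi)


def quality(doc):
    """Fuzzy quality score in [0, 1] computed from document metadata."""
    # Degree to which the document is "recent" (assumed: 30 to 365 days).
    recent = ramp_down(doc["age_days"], 30, 365)
    # Degree to which the document is "well linked" (assumed: 0 to 50 links).
    well_linked = ramp_up(doc["inbound_links"], 0, 50)
    # Fuzzy AND using the minimum t-norm.
    return min(recent, well_linked)


def select(docs, threshold=0.5):
    """Return the URLs of documents whose score meets the threshold."""
    return [d["url"] for d in docs if quality(d) >= threshold]


if __name__ == "__main__":
    crawled = [
        {"url": "http://example.org/a", "age_days": 10, "inbound_links": 80},
        {"url": "http://example.org/b", "age_days": 400, "inbound_links": 80},
    ]
    for d in crawled:
        print(d["url"], quality(d))
    print(select(crawled))
```

In a collaborative setting, scores like these could be shown to users as a suggestion while leaving the final choice of documents to them.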

Related Topics
Physical Sciences and Engineering > Computer Science > Computational Theory and Mathematics