Article ID | Journal | Published Year | Pages |
---|---|---|---|
425141 | Future Generation Computer Systems | 2011 | 8 Pages |
Abstract
Information retrieval algorithms need datasets to assess their effectiveness. However, such datasets are often difficult and expensive to obtain, since building them is a time-consuming and costly task. This work presents a collaborative approach to dataset creation that uses a data quality evaluation technique based on fuzzy theory to assist users in selecting suitable web documents for their datasets. These documents are automatically captured by a crawler and evaluated using information derived from their metadata.
Related Topics
Physical Sciences and Engineering
Computer Science
Computational Theory and Mathematics
Authors
Ricardo Barros, José A. Rodrigues Nt., Geraldo B. Xexéo, Jano M. de Souza