Article ID: 403109
Journal: Knowledge-Based Systems
Published Year: 2009
Pages: 6
File Type: PDF
Abstract

Persian is one of the dominant languages in the Middle East, so a significant number of Persian documents are available on the Web. Because Persian differs in nature from other languages such as English, the design of information retrieval systems for Persian requires special consideration. However, there are relatively few studies on the retrieval of Persian documents in the literature, and one of the main reasons is the lack of a standard test collection. In this paper, we introduce a standard Persian text collection, named Hamshahri, which is built from a large number of newspaper articles according to TREC specifications. Furthermore, statistical information about the documents, queries, and their relevance judgments is presented. We believe that this is the largest Persian text collection to date.

Related Topics
Physical Sciences and Engineering > Computer Science > Artificial Intelligence