dotIR Test Collection
dotIR is a standard Persian test collection suitable for evaluating web information retrieval algorithms on the Iranian web. Some characteristics of the collection are:
dotIR contains 1,000,000 web pages gathered by selectively crawling many websites in the .IR domain. In addition, 50 queries and their relevance judgments were created by 25 users using the UTIRE evaluation system. Different web retrieval algorithms were employed to build the judgment pool, and in total 18,424 documents were judged by the users (on average 369 documents per query).
To ease comparison of different ranking algorithms on the collection, 56 features have been calculated and added to it. These are standard features that are presented in the LETOR collection (provided by Microsoft Research Asia). The features can be used for training or tuning web information retrieval algorithms.
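The file layout of the feature data is not described on this page. As a minimal sketch, assuming the features are distributed in the usual LETOR-style text format (`<relevance> qid:<id> <index>:<value> ... # <comment>`), one line could be read in Python roughly as follows. The names `Example` and `parse_letor_line` are hypothetical helpers introduced for illustration, and the sample line is made up; check the downloaded files before relying on this layout.

```python
from typing import List, NamedTuple

NUM_FEATURES = 56  # number of features stated for the collection


class Example(NamedTuple):
    relevance: int         # relevance label assigned by the judges
    query_id: str          # identifier of the query this document was judged for
    features: List[float]  # dense feature vector, position i holds feature i+1
    comment: str           # free text after '#', often a document id


def parse_letor_line(line: str, num_features: int = NUM_FEATURES) -> Example:
    """Parse one line in the ASSUMED LETOR-style format."""
    # Everything after the first '#' is treated as a comment.
    body, _, comment = line.partition("#")
    tokens = body.split()
    if len(tokens) < 2 or not tokens[1].startswith("qid:"):
        raise ValueError("expected '<relevance> qid:<id> ...'")

    relevance = int(tokens[0])
    query_id = tokens[1].split(":", 1)[1]

    # Features missing from the line are left at 0.0.
    features = [0.0] * num_features
    for token in tokens[2:]:
        index, value = token.split(":", 1)
        features[int(index) - 1] = float(value)  # indices are 1-based

    return Example(relevance, query_id, features, comment.strip())


# Made-up sample line, for illustration only.
sample = "1 qid:7 1:0.25 2:0.5 56:1.0 # docid = D42"
example = parse_letor_line(sample)
print(example.relevance, example.query_id, len(example.features))
```

Grouping the parsed examples by `query_id` then gives one list of judged documents per query, which is the unit most learning-to-rank training and evaluation code works with.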
You can download the whole WebIR (dotIR) collection by clicking Here (1.4 GB). A Google Drive link is also provided.
The dotIR collection was created by crawling the Iranian web. All rights to the collection and its tools are reserved by the Database Research Group of the University of Tehran. If you use this collection, please use  to refer to the collection.
If you have any problems accessing the dataset, contact us at firstname.lastname@example.org