irBlogs


irBlogs is a standard collection of Persian weblogs, suitable for studying Persian social networks and for evaluating graph mining and blog retrieval algorithms. Its main characteristics are:

  • It contains many Persian weblogs, including their posts, links and metadata, stored in XML format, together with a graph of the relations between weblogs in standard CSV format.
  • It was prepared to be a good representative sample of Iranian weblogs.
  • It is a good test bed for evaluating weblog retrieval algorithms, with enough queries and relevance judgments for a reliable evaluation.
  • It is not very large, so it does not require substantial processing resources.
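The relations graph is distributed as CSV, so it can be processed with ordinary tools. The sketch below reads an edge list and counts incoming links per weblog. It assumes, hypothetically, that each row holds two weblog identifiers (source first, target second) with no header row; the actual column layout of the irBlogs file may differ, so check it before use. The inline sample stands in for the real file.

```python
import csv
import io
from collections import Counter


def load_edges(csv_file):
    """Read (source, target) pairs from a CSV edge list.

    Assumption: each row is "source_blog,target_blog" with no header.
    The real irBlogs layout may differ.
    """
    reader = csv.reader(csv_file)
    return [(row[0].strip(), row[1].strip()) for row in reader if len(row) >= 2]


def in_degrees(edges):
    """Count how many edges point at each weblog."""
    return Counter(target for _, target in edges)


if __name__ == "__main__":
    # Hypothetical sample used in place of the real relations file.
    sample = io.StringIO("blogA,blogB\nblogC,blogB\nblogB,blogA\n")
    edges = load_edges(sample)
    print(in_degrees(edges).most_common(1))  # [('blogB', 2)]
```

For the full collection, pass an open file handle (e.g. `open(path, newline="", encoding="utf-8")`) instead of the `StringIO` sample.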

irBlogs Collection

irBlogs contains more than 5,000,000 posts and a relations graph covering more than 600,000 Persian weblogs. It can be used in a range of applications, such as information retrieval, studying the Persian language in online social networks, and even graph theory algorithms. In addition, 45 queries and their relevance judgments were created by different users with the UTIRE evaluation system. Several weblog retrieval algorithms were used to build the judgment pool, and in total 24,339 weblogs were judged by the users (about 540 weblogs per query on average).
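The queries and relevance judgments make it possible to score a retrieval run. As an illustration, the sketch below computes precision at k (P@k), one common metric, averaged over queries. It assumes, hypothetically, that the judgments have already been loaded into a mapping from query id to the set of weblogs judged relevant (binary relevance); the on-disk format of the irBlogs judgments is not described here, and the query and weblog ids in the example are made up.

```python
from typing import Dict, List, Set


def precision_at_k(ranking: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k ranked weblogs that were judged relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for blog in ranking[:k] if blog in relevant)
    return hits / k


def mean_precision_at_k(
    runs: Dict[str, List[str]], qrels: Dict[str, Set[str]], k: int
) -> float:
    """Average P@k over every judged query; a missing run scores 0."""
    scores = [precision_at_k(runs.get(q, []), rel, k) for q, rel in qrels.items()]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Hypothetical judgments and a hypothetical ranked run per query.
    qrels = {"q1": {"b1", "b3"}, "q2": {"b2"}}
    runs = {"q1": ["b1", "b3", "b2"], "q2": ["b4", "b2", "b5"]}
    print(mean_precision_at_k(runs, qrels, k=2))  # 0.75
```

Here q1 scores 1.0 (both top-2 weblogs are relevant) and q2 scores 0.5, giving a mean of 0.75. Graded judgments, if present, would call for a metric such as nDCG instead.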

Download

You can download the whole irBlogs collection (2.9 GB; a Google Drive link is also provided). To obtain the password, contact dbrg@ut.ac.ir.

Copyright

irBlogs was created by crawling Iranian weblogs. All rights to the collection and its tools are reserved by the Database Research Group of the University of Tehran. If you use this collection, please cite [1].

Citation

[1] https://www.sciencedirect.com/science/article/abs/pii/S0747563215302533?via%3Dihub

If you have any problem accessing the dataset, contact us at dbrg@ut.ac.ir.