PARS TIME DOCUMENTS & QUERY DATASET

ParsTime is a rule-based system that extracts and normalizes Persian temporal expressions according to the TIMEX3 annotation standard.
To test our proposed tool, we developed two datasets:

 

  1. The first dataset contains 1000 queries, selected by 4 students from the Parsijoo query log. The dataset is available here. For each query we provide an Id, the query itself, its issue time, and the temporal expressions in the query. If a query contains more than one temporal expression, we separate them with the "$$" sign.
  2. The second dataset is a collection of news documents, organized in two folders. The Documents folder contains one XML file per news document. Each XML file has 3 tags:

    1. DocId, which shows the news document Id.
    2. Date, which shows the document publication date in the Jalali calendar.
    3. Text, which is the document itself.
    In the Temporal expressions folder, each file contains the temporal expressions extracted from its related news document, in TIMEX3 annotation. The dataset is available here.
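
As a rough illustration of how the two datasets might be read, here is a minimal Python sketch. It rests on several assumptions that the description above does not confirm: the root element name `Doc`, the date format in the sample, and the sample values are all hypothetical, and the function names `split_temporal_expressions` and `parse_news_document` are introduced here only for illustration. Only the "$$" separator and the three tag names (DocId, Date, Text) come from the dataset description.

```python
import xml.etree.ElementTree as ET


def split_temporal_expressions(field):
    """Split a query's temporal-expression field on the "$$" separator.

    Empty pieces and surrounding whitespace are dropped.
    """
    return [part.strip() for part in field.split("$$") if part.strip()]


def parse_news_document(xml_text):
    """Return the DocId, Date and Text values of one news document.

    The tags are looked up anywhere below the root, because the exact
    nesting of the real files is not specified (assumption).
    """
    root = ET.fromstring(xml_text)
    return {
        tag: (root.findtext(f".//{tag}") or "").strip()
        for tag in ("DocId", "Date", "Text")
    }


# Hypothetical sample: the root name <Doc>, the Id, the date format and
# the text are made up for this sketch; real files contain Persian text.
sample_xml = """<Doc>
  <DocId>42</DocId>
  <Date>1396/05/12</Date>
  <Text>sample news text</Text>
</Doc>"""

doc = parse_news_document(sample_xml)
expressions = split_temporal_expressions("tomorrow $$ next week")
```

The Date value is kept as a plain string, since converting Jalali dates to another calendar would need a dedicated library and the exact date format is not stated here.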

   Our source code is also available on GitHub. Our paper "ParsTime: Rule-Based Extraction and Normalization of Persian Temporal Expressions" was published at ECIR 2018. For further information, you may contact the authors.