Article ID: 514961
Journal: Information Processing & Management
Published Year: 2015
Pages: 18
File Type: PDF
Abstract

• Statistical approach for estimating the focus time of text documents.
• Classification framework for categorizing documents as temporal or atemporal.
• Bi-temporal document representation using document focus time and creation time.

Time is an important aspect of text documents. While some documents are atemporal, many have strong temporal characteristics and contain content related to time. Such documents can be mapped to their corresponding time periods. In this paper, we propose estimating the focus time of documents, defined as the time period to which a document’s content refers and considered a dimension complementary to the document’s creation time. We propose several estimators of focus time that utilize statistical knowledge from external resources such as news article collections. The advantage of our approach is that document focus time can be estimated even for documents that contain few or no temporal expressions. We evaluate the effectiveness of our methods on diverse datasets of documents about historical events related to 5 countries. Our approach achieves an average error of less than 21 years on collections of Wikipedia pages, extracts from history-related books, and web pages, over a total time frame of 113 years. We also demonstrate an example classification method to distinguish temporal from atemporal documents.
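
The idea of estimating focus time from an external dated collection can be illustrated with a small sketch. It is not the paper's estimator: it assumes a simple word-year association, P(year | word), computed from per-year relative frequencies in a hypothetical reference corpus of (year, text) pairs standing in for dated news articles. A document's year distribution is the average of these associations over its known words, and the focus year is the argmax. The entropy-based "temporal vs. atemporal" cue and the mean-absolute-error metric are likewise assumptions for illustration; all function names are hypothetical.

```python
"""Illustrative word-year association sketch for document focus time.

Assumptions (not from the paper): P(year | word) from per-year relative
frequencies, averaging over words, argmax as the focus year, and normalized
entropy as a rough temporal/atemporal cue.
"""
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase alphabetic tokens; a real system would also drop stopwords.
    return re.findall(r"[a-z]+", text.lower())


def build_word_year_model(corpus: list[tuple[int, str]]):
    """Return (P(year | word) table, sorted list of corpus years)."""
    word_year = defaultdict(Counter)  # word -> year -> count
    year_tokens = Counter()           # year -> total token count
    for year, text in corpus:
        tokens = tokenize(text)
        year_tokens[year] += len(tokens)
        for tok in tokens:
            word_year[tok][year] += 1
    model = {}
    for word, counts in word_year.items():
        # Relative frequency per year, so years with more text are not favoured.
        rel = {y: c / year_tokens[y] for y, c in counts.items()}
        total = sum(rel.values())
        model[word] = {y: v / total for y, v in rel.items()}
    return model, sorted(year_tokens)


def focus_distribution(text: str, model, years) -> dict[int, float]:
    """Average P(year | word) over the document's words found in the model."""
    scores = dict.fromkeys(years, 0.0)
    known = [t for t in tokenize(text) if t in model]
    for tok in known:
        for y, p in model[tok].items():
            scores[y] += p
    if known:
        scores = {y: s / len(known) for y, s in scores.items()}
    return scores


def estimate_focus_year(text: str, model, years) -> int:
    dist = focus_distribution(text, model, years)
    return max(dist, key=dist.get)


def normalized_entropy(dist: dict[int, float]) -> float:
    """0 = all mass on one year (temporal), 1 = uniform or no evidence."""
    probs = [p for p in dist.values() if p > 0]
    if len(dist) < 2 or not probs:
        return 1.0
    h = -sum(p * math.log(p) for p in probs)
    return h / math.log(len(dist))


def mean_absolute_error_years(predicted: list[int], actual: list[int]) -> float:
    # Average error in years, one possible reading of "average error".
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


if __name__ == "__main__":
    corpus = [
        (1914, "archduke assassinated in sarajevo war declared"),
        (1918, "armistice signed war ends"),
        (1969, "astronauts land on the moon apollo"),
    ]
    model, years = build_word_year_model(corpus)
    print(estimate_focus_year("apollo astronauts walked on the moon", model, years))  # 1969
    print(estimate_focus_year("war", model, years))  # 1918
```

Because the estimate relies on word-year statistics rather than explicit dates, a document with no temporal expressions (such as "apollo astronauts walked on the moon") still receives a focus year, which mirrors the advantage claimed above.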
