ACE Time Normalization (TERN) 2004 English Training Data v 1.0

Item Name: ACE Time Normalization (TERN) 2004 English Training Data v 1.0
Author(s): Lisa Ferro, Laurie Gerber, Janet Hitzeman, Elizabeth Lima, Beth Sundheim
LDC Catalog No.: LDC2005T07
ISBN: 1-58563-331-3
ISLRN: 357-991-519-054-6
Release Date: February 15, 2005
Member Year(s): 2005
DCMI Type(s): Text
Data Source(s): newswire
Project(s): TIDES, GALE, ACE
Application(s): summarization, question-answering, information extraction, temporal analysis, automatic content extraction
Language(s): English
Language ID(s): eng
License(s): LDC User Agreement for Non-Members
Online Documentation: LDC2005T07 Documents
Licensing Instructions: Subscription & Standard Members, and Non-Members
Citation: Ferro, Lisa, et al. ACE Time Normalization (TERN) 2004 English Training Data v 1.0 LDC2005T07. Web Download. Philadelphia: Linguistic Data Consortium, 2005.

Introduction

This file contains documentation on the ACE Time Normalization (TERN) 2004 English Training Data v 1.0, Linguistic Data Consortium (LDC) catalog number LDC2005T07 and ISBN 1-58563-331-3.

This release contains the English training data prepared for the 2004 Time Expression Recognition and Normalization (TERN) Evaluation, sponsored by the Automatic Content Extraction (ACE) program. The evaluation was held in August 2004, followed by a workshop in September 2004. Evaluation participants received this data for training purposes, and it is now being released for general use.

The annotation specifications for this corpus were developed under DARPA's Translingual Information Detection, Extraction and Summarization (TIDES) program, with continuing support from ACE.

The purpose of this corpus and the TERN evaluation is to advance the state of the art in the automatic recognition and normalization of natural language temporal expressions. In most language contexts such expressions are indexical. For example, with "Monday," "last week," or "three months starting October 1," one must know the narrative reference time in order to pinpoint the time interval being conveyed by the expression. In addition, for data exchange purposes, it is essential that the identified interval be rendered according to an established standard, i.e., normalized. Accurate identification and normalization of temporal expressions is in turn essential for the temporal reasoning being demanded by advanced NLP applications such as question answering, information extraction, and summarization.
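
As an illustration of why the reference time matters, the following minimal Python sketch resolves two of the indexical expressions mentioned above against a given reference date. It is not part of the corpus or of the TERN tooling. The function names are hypothetical, the ISO 8601 output strings are an assumed target format (the corpus's own annotation guidelines define the actual one), and reading "Monday" as the most recent Monday on or before the reference date is a simplifying assumption, since real text may also refer to a future Monday.

```python
from datetime import date, timedelta

# Weekday names in the order used by date.weekday() (Monday == 0).
WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def resolve_weekday(name: str, reference: date) -> str:
    """Return the ISO 8601 date of the most recent `name` on or before
    `reference` (assumption: the expression points to the past)."""
    target = WEEKDAYS.index(name.lower())
    days_back = (reference.weekday() - target) % 7
    return (reference - timedelta(days=days_back)).isoformat()


def resolve_last_week(reference: date) -> str:
    """Return the ISO 8601 week (YYYY-Www) immediately preceding the
    week that contains `reference`."""
    year, week, _ = (reference - timedelta(days=7)).isocalendar()
    return f"{year}-W{week:02d}"


# Hypothetical reference time: a document dated Thursday, August 12, 2004.
reference_time = date(2004, 8, 12)
print(resolve_weekday("Monday", reference_time))  # 2004-08-09
print(resolve_last_week(reference_time))          # 2004-W32
```

The same surface strings yield different normalized values under a different reference date, which is the sense in which such expressions are indexical.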

Samples

Please examine this sample to see an example of the corpus.

Updates

Additional information, updates, and bug fixes may be available in the LDC catalog entry for this corpus at LDC2005T07.

"The World" is a co-production of Public Radio International and the British Broadcasting Corporation and is produced at WGBH Boston.
