TAC KBP English Temporal Slot Filling Comprehensive Training and
Evaluation Data Sets 2011 and 2013

Authors: Joe Ellis, Jeremy Getman, Stephanie Strassel

1. Overview

This package contains training and evaluation data produced in support
of the TAC KBP English Temporal Slot Filling tasks in 2011 and 2013.

The Text Analysis Conference (TAC) is a series of workshops organized
by the National Institute of Standards and Technology (NIST). TAC was
developed to encourage research in natural language processing (NLP)
and related applications by providing a large test collection, common
evaluation procedures, and a forum for researchers to share their
results. Through its various evaluations, the Knowledge Base
Population (KBP) track of TAC encourages the development of systems
that can match entities mentioned in natural texts with those
appearing in a knowledge base and extract novel information about
entities from a document collection and add it to a new or existing
knowledge base.

The Temporal Slot Filling (TSF) task builds upon the technology
developed for regular Slot Filling (SF), which involves mining
information about entities from text. SF can be viewed as more
traditional Information Extraction or, alternatively, as a Question
Answering (QA) task in which the questions are static but the targets
change. In completing the SF task, participating systems and LDC
annotators searched a corpus for information on certain attributes
(slots) of person (PER) and organization (ORG) entities and returned
any valid responses (slot fillers) that were not redundant with those
in an existing knowledge base (KB). The purpose of the TSF task is to
identify and capture temporal information in text that indicates when
a given relation between an SF query entity and a filler held true.
For more information about Temporal Slot Filling, refer to the 2013
track home page (2013 was the last year in which the Temporal Slot
Filling evaluation was conducted as of the time this package was
created) at http://www.nist.gov/tac.

This package contains all evaluation and training data developed in
support of TAC KBP Temporal Slot Filling during 2011 and 2013, the two
years in which a TSF evaluation was run. This includes queries, the
manual runs produced by LDC annotators, and the final rounds of
assessment results for the Temporal Slot Filling evaluations held in
2011 and 2013. The corresponding source document collections for this
release are included in LDC2018T03: TAC KBP Comprehensive English
Source Corpora 2009-2014. The corresponding Knowledge Base (KB) for
much of the data - a 2008 snapshot of Wikipedia - can be obtained via
LDC2014T16: TAC KBP Reference Knowledge Base.
The data included in this package were originally released by LDC to
TAC KBP coordinators and performers under the following ecorpora
catalog IDs and titles:

  LDC2011E49: TAC 2011 KBP English Training Temporal Slot Filling
              Annotation V1.1
  LDC2011E85: TAC 2011 KBP English Evaluation Diagnostic Temporal Slot
              Filling Queries V1.1
  LDC2012E38: TAC 2011 KBP English Evaluation Temporal Slot Filling
              Annotation
  LDC2013E82: TAC 2013 KBP English Temporal Slot Filling Training
              Queries and Annotations
  LDC2013E86: TAC 2013 KBP English Temporal Slot Filling Evaluation
              Queries and Annotations V1.1
  LDC2013E99: TAC 2013 KBP English Temporal Slot Filling Evaluation
              Assessment Results V1.1
  LDC2015E50: TAC KBP English Temporal Slot Filling – Collected
              Training and Evaluation Data Sets 2011 and 2013

Summaries of data included in this package (for more details see
./data/{2011,2013}/contents.txt):

Query Data:
  +------+------------+-----+-----+-------+
  | year | set        | PER | ORG | total |
  +------+------------+-----+-----+-------+
  | 2011 | training   |  40 |  10 |    50 |
  | 2011 | evaluation |  80 |  20 |   100 |
  | 2013 | training   |   6 |   1 |     7 |
  | 2013 | evaluation | 232 |  39 |   271 |
  +------+------------+-----+-----+-------+

Manual Response Data:
  +------+------------+------------------+
  | year | set        | manual responses |
  +------+------------+------------------+
  | 2011 | training   |            1,258 |
  | 2011 | evaluation |            1,413 |
  | 2013 | training   |               16 |
  | 2013 | evaluation |            1,519 |
  +------+------------+------------------+

Assessment Data (2013):
  +----------+-----------+
  | assessed | assessed  |
  | files    | responses |
  +----------+-----------+
  |      273 |     2,035 |
  +----------+-----------+

2. Contents

./docs/README.txt
  This file.

./data/{2011,2013}/contents.txt
  The data in this package are organized by the year of original
  release in order to clarify dependencies, highlight occasional
  differences in formats from one year to another, and to increase
  readability in documentation. The contents.txt file within each
  year's root directory provides a list of the contents of all
  subdirectories as well as specific details about file formats and
  contents.

./docs/all_files.md5
  Paths (relative to the root of the corpus) and md5 checksums for all
  files included in the package (a verification sketch follows at the
  end of this section).

./docs/guidelines/{2011,2013}/*.pdf
  The guidelines used by annotators in developing the temporal slot
  filling queries, manual responses, and assessment data contained in
  this corpus.

./docs/task_descriptions/KBP2011_TaskDefinition.pdf
  Task description for 2011 covering all of the TAC KBP tracks,
  written by evaluation track coordinators. Note that this document
  also describes tasks not relevant to this specific package.

./docs/task_descriptions/KBP2013_TaskDefinition_EnglishSlotFilling_1.1.pdf
  Task description for both the 2013 English Regular and Temporal Slot
  Filling evaluation tracks, written by track coordinators.

./dtd/kbpslotfill_temp2011.dtd
  The dtd against which to validate these files (see the validation
  sketch at the end of this section):
    ./data/2011/training/queries.xml
    ./data/2011/eval/queries.xml

./dtd/kbpslotfill_tempnew2011.dtd
  The dtd against which to validate this file:
    ./data/2011/training/new_queries.xml

./tools/scorers/KBP2013_English_TSF_slot-list.txt
  Temporal SF slot list file to be used with the 2013 scorer.

./tools/scorers/SFScore2013.java
  Scorer for temporal slot filling files for 2013, as provided to LDC
  by evaluation track coordinators, with no further testing.

./tools/scorers/TSFScore2.java
  Scorer for temporal slot filling files for 2011, as provided to LDC
  by evaluation track coordinators, with no further testing.

./tools/validators/check_kbp_{2011,2013}_slot-filling.pl
  Validators for temporal slot filling files for the respective years,
  as provided to LDC by evaluation track coordinators, with no further
  testing.
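Both the checksum listing and the DTDs support a quick sanity check of
a local copy of the package. The following is a minimal Python sketch,
not a tool shipped with the package: it assumes the md5 listing uses
the conventional checksum-then-path layout, that the script is run
from the corpus root, and that the third-party lxml library is
available for DTD validation.

    import hashlib
    from lxml import etree  # third-party; used for DTD validation

    # Verify md5 checksums, assuming one "checksum  relative/path"
    # pair per line in ./docs/all_files.md5.
    with open("docs/all_files.md5", encoding="utf-8") as listing:
        for line in listing:
            expected, path = line.split(None, 1)
            path = path.strip()
            with open(path, "rb") as f:
                if hashlib.md5(f.read()).hexdigest() != expected:
                    print("MISMATCH:", path)

    # Validate the 2011 query files against their DTD.
    with open("dtd/kbpslotfill_temp2011.dtd", "rb") as f:
        dtd = etree.DTD(f)
    for xml_path in ("data/2011/training/queries.xml",
                     "data/2011/eval/queries.xml"):
        ok = dtd.validate(etree.parse(xml_path).getroot())
        print(xml_path, "valid" if ok else dtd.error_log)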
3. Annotation tasks - Query Development, Manual Run Development, Slot
Mapping, and Assessment

Temporal Slot Filling (TSF) builds upon annotations typically
developed for regular Slot Filling (SF) by adding temporal data. In
SF, the values of specified attributes (or slots) are extracted for a
given entity from large collections of natural language texts.
Examples of slots include age, birthplace, and spouse for persons, or
founder, top members, and website for organizations. The TSF task
grounded a subset of these extracted values temporally by finding
dates when the slot fillers were valid. The tasks conducted by LDC
annotators in support of the TSF track consisted of a combined query
development-manual run development subtask as well as assessment. Each
of these subtasks is explained below.

3.1 Query and Manual Run Development

TSF query development-manual run development consisted of identifying
and capturing temporal information that indicated the period of time
when a given relation between an SF query entity and slot filler held
true. TSF query entities were extracted, and manual run annotations
were made on temporalized slot fillers (relations that have some
temporal aspect to them), for 13 KBP slots across sets of KBP source
documents selected for richness of temporalized slot fillers.

In 2011, entity selection for TSF followed a different process from
that used for regular Slot Filling, due to the sharing of
temporally-labeled data between TAC KBP and the Machine Reading
program. Rather than first selecting identifiable entities and then
annotating slot fillers and temporal information for those fillers, a
reverse selection process was used in which annotation preceded entity
selection. The query development-manual run development process began
by performing keyword searches on the source data to identify
documents containing KBP (entity-slot) relations. The document set
that resulted from this keyword search was then reduced to a subset
using a high keyword-frequency threshold. Next, this document subset
was screened for the presence of temporalized KBP relations. The
resulting set of documents was then exhaustively annotated for KBP
relations and their associated temporal information. In a
post-annotation screening process, annotators selected identifiable
entities in the annotation pool that were part of at least one
temporalized KBP relation; these entities then served as training
queries for the TSF manual run.

In selecting the 2011 TSF evaluation queries, an additional screening
process was applied to ensure productivity of fillers and temporal
information. Evaluation query selection relied on the same
post-annotation screening process to select identifiable candidate
entities annotated in at least one temporalized KBP relation.
Annotators then performed a time-limited search in the KBP source data
for these candidate entities, to determine how frequently they
occurred in temporalized KBP relations. Entities were then selected
from the candidate evaluation query entity set, with preference given
to entities that occurred more frequently in temporalized KBP
relations and that occurred in a greater variety of temporalized KBP
relations.
This entity selection process produced a set of TSF evaluation queries
on which time-limited search and cross-document annotation of
temporalized slot fillers could be carried out during the evaluation
annotation task.

The 2011 evaluation manual run was developed by combining two sets of
annotations. The first was the set of within-document annotations
created in the initial temporal slot filling evaluation query
selection and annotation process. The second was a set of
time-limited, cross-document annotations, created by searching for
temporal slot filling information for the evaluation entities across
the entire KBP corpus within a two-hour time limit. This process
produced a larger set of temporal slot filling evaluation annotation
data than the intra-document annotation process that had been used to
produce the temporal slot filling training annotation data.

In contrast to the 2011 task, in which queries consisted of entities
alone, each 2013 input query was a binary relation between an entity
and one slot filler. This allowed systems to focus on the temporal
aspect of the task and ignore the slot filling extraction component.
Also, for 2013 queries, annotators were able to select for more
interesting temporal information, such as indicators of beginnings and
endings. For each 2013 TSF query, annotators were given up to two
hours to search the corpus and locate all valid fillers.

Following the initial round of query development and manual run
annotation, a quality control pass was conducted to flag any fillers
that did not have adequate justification in the source document, or
that might be at variance with the current guidelines. These flagged
fillers were then adjudicated by senior annotators.

3.2 Assessment

Assessment of TSF responses was divided into two tasks: assessment of
slot fillers and assessment of the temporal information connected to
those fillers. The procedure used for assessing temporal slot fillers
mirrored the process used for regular Slot Filling assessment. After
slot fillers were returned for the query set from both the human
manual run and from systems, annotators assessed and coreferenced the
responses.

Fillers were marked as correct if they were found to be both
compatible with the slot descriptions and supported in the provided
justification string(s) and/or its surrounding content. Fillers were
assessed as wrong if they did not meet both of the conditions for
correctness, or as inexact if insufficient or extraneous text had been
selected for an otherwise correct response. Justification was assessed
as correct if it succinctly and completely supported the relation,
wrong if it did not support the relation at all (or if the
corresponding filler was marked wrong), inexact-short if part but not
all of the information necessary to support the relation was provided,
or inexact-long if it contained all information necessary to support
the relation but also a great deal of extraneous text.

After the first passes of assessment were completed, quality control
was performed on the data by senior annotators. This quality control
pass ensured that the extent of each annotated filler and
justification was correct, checked that entities assessed as correct
were coreferenced in the appropriate equivalence class, and flagged
potentially problematic assessments for additional review.
After filler assessment was complete for the temporal data set, LDC
compared the resulting list of documents containing correct,
system-generated slot fillers with those annotated by humans during
TSF. The purpose of this comparison was to identify all documents
marked only by systems as containing temporal information for a given
entity/slot-filler combination. Once these documents were identified,
they were reviewed and annotated whenever temporal information
relating to the specific entity-filler combination was present.

4. Using the Data

As noted in the overview, the corresponding source document
collections for this release are included in LDC2018T03: TAC KBP
Comprehensive English Source Corpora 2009-2014. Also, the
corresponding Knowledge Base (KB) for much of the data - a 2008
snapshot of Wikipedia - can be obtained via LDC2014T16: TAC KBP
Reference Knowledge Base.

4.1 Text normalization and offset calculation

Text normalization of queries, consisting of a 1-for-1 substitution of
newline (0x0A) and tab (0x09) characters with space (0x20) characters,
was performed on the document text input to the response field.

The values of the beg and end XML elements in the later queries.xml
files indicate character offsets that identify text extents in the
source. Offset counting starts from the initial opening angle bracket
of the document's root element, which is usually the initial character
(character 0) of the source. Note as well that character counting
includes newlines and all markup characters - that is, the offsets are
based on treating the source document file as "raw text", with all of
its markup included.

Note that although strings included in the annotation files (queries
and gold standard mentions) generally match source documents, a few
characters are normalized in order to enhance readability: newlines
are converted to spaces, except where the preceding character was a
hyphen ("-"), in which case the newline was removed; and runs of
multiple spaces are collapsed to a single space.

4.2 Proper ingesting of XML queries

While the character offsets are calculated based on treating the
source document as "raw text", the "name" strings being referenced by
the queries sometimes contain XML metacharacters, and these had to be
"re-escaped" for proper inclusion in the queries.xml file. For
example, an actual name like "AT&T" may show up in a source document
file as "AT&amp;T" (because the source document was originally
formatted as XML data). But since the source doc is being treated here
as raw text, this name string is treated in queries.xml as having 8
characters (i.e., the character offsets, when provided, will point to
a string of length 8). However, the "name" element itself, as
presented in the queries.xml file, will be even longer -
"AT&amp;amp;T" - because the queries.xml file is intended to be
handled by an XML parser, which will return "AT&amp;T" when this
"name" element is extracted. Using the queries.xml data without XML
parsing would yield a mismatch between the "name" value and the
corresponding string in the source data.
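To make the offset and escaping conventions above concrete, here is a
minimal Python sketch. The file paths and the beg/end values are
hypothetical placeholders, the offsets are assumed to be inclusive on
both ends, and normalize() is an illustrative implementation of the
section 4.1 readability normalization, not a tool shipped with this
package.

    import re
    import xml.etree.ElementTree as ET

    def normalize(s):
        # Section 4.1 normalization: drop a newline that follows a
        # hyphen, turn remaining newlines into spaces, and collapse
        # runs of spaces to a single space.
        s = s.replace("-\n", "-")
        s = s.replace("\n", " ")
        return re.sub(" {2,}", " ", s)

    # Read the source document as raw text, markup and newlines
    # included, so that the beg/end offsets line up. (Hypothetical
    # path; substitute a file from LDC2018T03.)
    with open("source_doc.xml", encoding="utf-8") as f:
        raw = f.read()

    beg, end = 1042, 1049  # hypothetical offsets for "AT&amp;T"
    span = raw[beg:end + 1]  # beg/end assumed inclusive

    # An XML parser un-escapes entities once: a name element stored
    # as "AT&amp;amp;T" in queries.xml parses back to "AT&amp;T",
    # which is exactly what the raw text of the source contains.
    # Reading queries.xml without an XML parser would instead yield
    # the doubly escaped form and a mismatch against the source.
    tree = ET.parse("queries.xml")
    name = tree.find(".//name").text

    assert name == normalize(span)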
5. Acknowledgments

This material is based on research sponsored by Air Force Research
Laboratory and Defense Advanced Research Projects Agency under
agreement number FA8750-13-2-0045. The U.S. Government is authorized
to reproduce and distribute reprints for Governmental purposes
notwithstanding any copyright notation thereon.

The views and conclusions contained herein are those of the authors
and should not be interpreted as necessarily representing the official
policies or endorsements, either expressed or implied, of Air Force
Research Laboratory and Defense Advanced Research Projects Agency or
the U.S. Government.

The authors acknowledge the following contributors to this data set:
  Dave Graff (LDC)
  Robert Parker (LDC)
  Neil Kuster (LDC)
  Heng Ji (RPI)
  Ralph Grishman (NYU)
  Mihai Surdeanu (UA)
  Hoa Dang (NIST)
  Boyan Onyshkevych (DARPA)

6. References

Joe Ellis, Jeremy Getman, Justin Mott, Xuansong Li, Kira Griffitt,
Stephanie M. Strassel, Jonathan Wright. 2013. Linguistic Resources for
2013 Knowledge Base Population Evaluations. TAC KBP 2013 Workshop:
National Institute of Standards and Technology, Gaithersburg, MD,
November 18-19.
https://www.ldc.upenn.edu/sites/www.ldc.upenn.edu/files/tackbp-workshop2013-linguistic-resources-kbp-eval.pdf

Xuansong Li, Joe Ellis, Kira Griffitt, Stephanie Strassel, Robert
Parker, Jonathan Wright. 2011. Linguistic Resources for 2011 Knowledge
Base Population Evaluation. TAC 2011: Proceedings of the Fourth Text
Analysis Conference, Gaithersburg, Maryland, November 14-15.
https://www.ldc.upenn.edu/sites/www.ldc.upenn.edu/files/tac2011-linguistic-resources-kbp.pdf

7. Copyright Information

(c) 2016 Trustees of the University of Pennsylvania

8. Contact Information

For further information about this data release, or the TAC KBP
project, contact the following project staff at LDC:

  Jeremy Getman, Lead Annotator
  Stephanie Strassel, PI

-----------------------------------------------------------------------------
README created by Neil Kuster on March 31, 2016
       updated by Neil Kuster on April 5, 2016
       updated by Neil Kuster on September 19, 2016