TAC KBP English Regular Slot Filling Comprehensive Training and Evaluation Data 2009-2014

Authors: Joe Ellis, Jeremy Getman, Stephanie Strassel

1. Overview

This package contains training and evaluation data produced in support of the TAC KBP Slot Filling evaluation track conducted from 2009 to 2014.

The Text Analysis Conference (TAC) is a series of workshops organized by the National Institute of Standards and Technology (NIST). TAC was developed to encourage research in natural language processing (NLP) and related applications by providing a large test collection, common evaluation procedures, and a forum for researchers to share their results. Through its various evaluations, the Knowledge Base Population (KBP) track of TAC encourages the development of systems that can match entities mentioned in natural texts with those appearing in a knowledge base and extract novel information about entities from a document collection and add it to a new or existing knowledge base.

The regular English Slot Filling (SF) evaluation track involves mining information about entities from text. SF can be viewed as a more traditional Information Extraction (IE) task or, alternatively, as a Question Answering (QA) task in which the questions are static but the targets change. In completing the task, participating systems and LDC annotators searched a corpus for information on certain attributes (slots) of person (PER) and organization (ORG) entities and attempted to return all valid answers (slot fillers) in the source collection. For more information about English Slot Filling, please refer to the 2014 track home page (2014 was the last year in which the regular Slot Filling evaluation was conducted) at http://www.nist.gov/tac.

This package contains all evaluation and training data developed in support of TAC KBP Slot Filling during the six years in which the track was conducted, 2009-2014. This includes queries, the 'manual runs' (human-produced responses to the queries), and the final rounds of assessment results. The corresponding source document collections for this release are included in LDC2018T03: TAC KBP Comprehensive English Source Corpora 2009-2014. The corresponding Knowledge Base (KB) for much of the data - a 2008 snapshot of Wikipedia - can be obtained via LDC2014T16: TAC KBP Reference Knowledge Base.
The data included in this package were originally released by LDC to TAC KBP coordinators and performers under the following ecorpora catalog IDs and titles:

  LDC2009E56:  TAC KBP 2009 Evaluation Generic Infoboxes V2.0
  LDC2009E65:  TAC KBP 2009 Evaluation Slot Filling List
  LDC2009E90:  TAC KBP 2009 Assessment Results
  LDC2009E110: TAC KBP 2009 Evaluation NIL Link Assessment
  LDC2010E18:  TAC 2010 KBP Training Slot Filling Annotation V2.1
  LDC2010E24:  TAC 2010 KBP Generic Infoboxes
  LDC2010E61:  TAC 2010 KBP Assessment Results V1.2
  LDC2010E32:  TAC 2010 KBP Evaluation Slot Filling Annotation
  LDC2011E48:  TAC 2011 KBP English Training Regular Slot Filling Annotation
  LDC2011E88:  TAC 2011 KBP English Regular Slot Filling Assessment Results V1.2
  LDC2011E89:  TAC 2011 KBP English Evaluation Regular Slot Filling Annotation V1.2
  LDC2012E91:  TAC 2012 KBP English Regular Slot Filling Evaluation Annotations V1.1
  LDC2012E115: TAC 2012 KBP English Regular Slot Filling Assessment Results V1.2
  LDC2013E60:  TAC 2013 KBP English Regular Slot Filling per:title Training Data
  LDC2013E77:  TAC 2013 KBP English Regular Slot Filling Evaluation Queries and Annotations V1.1
  LDC2013E91:  TAC 2013 KBP English Regular Slot Filling Evaluation Assessment Results V1.1
  LDC2014E66:  TAC 2014 KBP English Regular Slot Filling Evaluation Queries and Annotations V1.1
  LDC2014E75:  TAC 2014 KBP English Regular Slot Filling Evaluation Assessment Results V2.0
  LDC2015E46:  TAC KBP English Regular Slot Filling - Comprehensive Training and Evaluation Data 2009-2014

Summaries of data included in this package (for more details see ./data/*/contents.txt):

Query Data:
+------+------------+-----+-----+-----+-------+
| year | set        | PER | ORG | GPE | total |
+------+------------+-----+-----+-----+-------+
| 2009 | evaluation |  17 |  31 |   5 |    53 |
| 2010 | training   |  42 |  56 | n/a |    98 |
| 2010 | evaluation |  50 |  50 | n/a |   100 |
| 2011 | training   |  92 | 106 | n/a |   198 |
| 2011 | evaluation |  50 |  50 | n/a |   100 |
| 2012 | evaluation |  40 |  40 | n/a |    80 |
| 2013 | evaluation |  50 |  50 | n/a |   100 |
| 2014 | evaluation |  50 |  50 | n/a |   100 |
+------+------------+-----+-----+-----+-------+

Manual Run Data:
+------+------------+------------------+
| year | set        | manual responses |
+------+------------+------------------+
| 2010 | training   |              336 |
| 2010 | evaluation |              799 |
| 2011 | training   |            1,627 |
| 2011 | evaluation |              796 |
| 2012 | evaluation |            1,553 |
| 2013 | evaluation |            2,383 |
| 2014 | evaluation |            2,216 |
+------+------------+------------------+

Assessment Data:
+------+------------+-----------+
| eval |            | assessed  |
| year | set        | responses |
+------+------------+-----------+
| 2009 | evaluation |    10,416 |
| 2010 | evaluation |    24,515 |
| 2011 | evaluation |    28,041 |
| 2012 | evaluation |    22,885 |
| 2013 | evaluation |    27,655 |
| 2013 | training   |     4,660 |
| 2014 | evaluation |    21,956 |
+------+------------+-----------+

2. Contents

./README.txt
  This file.

./data/20*/contents.txt
  The data in this package are organized by the year of original release in order to clarify dependencies, highlight occasional differences in formats from one year to another, and to increase readability in documentation. The contents.txt file within each year's root directory provides a list of the contents for all subdirectories as well as specific details about file formats and contents.
./dtd/sf_queries_2009-2010-2011.dtd
  The dtd against which to validate these files:
    ./data/2009/eval/tac_kbp_2009_regular_sf_evaluation_queries.xml
    ./data/2010/eval/tac_kbp_2010_regular_sf_evaluation_queries.xml
    ./data/2010/training/tac_kbp_2009_regular_sf_evaluation_queries.xml
    ./data/2010/training/tac_kbp_2010_regular_sf_training_queries.xml
    ./data/2011/eval/tac_kbp_2011_regular_sf_evaluation_queries.xml

./dtd/sf_queries_2012-2013.dtd
  The dtd against which to validate these files:
    ./data/2012/eval/tac_kbp_2012_regular_sf_evaluation_queries.xml
    ./data/2013/eval/tac_kbp_2013_regular_sf_evaluation_queries.xml

./dtd/sf_queries_2014.dtd
  The dtd against which to validate this file:
    ./data/2014/eval/tac_kbp_2014_regular_sf_evaluation_queries.xml
  (For one way to check the query files against their DTDs, see the sketch at the end of this Contents section.)

./docs/all_files.md5
  Paths (relative to the root of the corpus) and md5 checksums for all files included in the package.

./docs/guidelines/*/*.pdf
  The guidelines used by annotators in developing slot filling queries, manual run annotation, and assessment data contained in this corpus.

./docs/task_descriptions/KBP2009-TaskDefinition-0218.pdf
./docs/task_descriptions/KBP2010_TaskDefinition_Aug31.pdf
./docs/task_descriptions/KBP2011_TaskDefinition.pdf
./docs/task_descriptions/KBP2012_TaskDefinition_1.1.pdf
  Task descriptions for the respective years covering all of the TAC KBP tracks, written by evaluation track coordinators. Note that these documents also describe tasks not relevant to this specific package.

./docs/task_descriptions/KBP2013_TaskDefinition_EnglishSlotFilling_1.1.pdf
./docs/task_descriptions/KBP2014_TaskDefinition_EnglishSlotFilling_1.1.pdf
  Task descriptions for the 2013 and 2014 English Regular Slot Filling evaluation tracks, written by track coordinators.

./tools/scorers/KBP20*_English_SF_slot-list.txt
  Slot list files to be used with the 2013 and 2014 scorers, respectively.

./tools/scorers/SFScore20*.java
  Scorers for regular slot filling files for 2009-2014, respectively, as provided to LDC by evaluation track coordinators, with no further testing.

./tools/validators/check_kbp_20*_slot-filling.pl
  Validators for regular slot filling files for 2009-2014, respectively, as provided to LDC by evaluation track coordinators, with no further testing.
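As a convenience, the following is a minimal, unofficial sketch of how the queries.xml files listed above might be checked against the DTDs in ./dtd/. It is not part of the released tooling; it assumes Python 3 with the third-party lxml package installed and is run from the root of the corpus. The three pairings shown follow the listing above; the remaining query files pair with their DTDs in the same way.

  # validate_queries.py -- unofficial sketch: check query files against the
  # DTDs shipped in ./dtd/. Assumes Python 3 and the third-party lxml package
  # (pip install lxml), run from the root of the corpus.
  from lxml import etree

  PAIRS = [
      # (DTD, queries.xml) pairings taken from the Contents listing above
      ("./dtd/sf_queries_2009-2010-2011.dtd",
       "./data/2011/eval/tac_kbp_2011_regular_sf_evaluation_queries.xml"),
      ("./dtd/sf_queries_2012-2013.dtd",
       "./data/2013/eval/tac_kbp_2013_regular_sf_evaluation_queries.xml"),
      ("./dtd/sf_queries_2014.dtd",
       "./data/2014/eval/tac_kbp_2014_regular_sf_evaluation_queries.xml"),
  ]

  for dtd_path, xml_path in PAIRS:
      dtd = etree.DTD(dtd_path)        # parse the DTD
      doc = etree.parse(xml_path)      # parse the queries file
      if dtd.validate(doc):
          print("OK       " + xml_path)
      else:
          print("INVALID  " + xml_path)
          for error in dtd.error_log.filter_from_errors():
              print("  " + str(error))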
3. Annotation Tasks

The tasks conducted by LDC annotators in support of regular SF included entity selection/query development, manual run development, slot mapping, and assessment of system- and human-produced responses to queries. Each of these subtasks is explained below.

3.1 Query Development

Entities, which are the basis of SF queries, were selected based primarily on their level of non-confusability and productivity. A candidate query entity was considered non-confusable if there were one or more references to it in the source corpus that were "canonical", meaning that they were not an alias and, for persons, included more than just a first or last name. Productivity for candidate queries was determined by searching the source corpus to find whether it contained at least two slot fillers (i.e. answers) for the entity.

Entities with well-populated Knowledge Base (KB) entries (either in the official TAC KBP KB or in online resources such as Wikipedia) were also generally avoided as query entities. Such entities were dispreferred both to reduce the advantage gained by using online resources and because there was a restriction against returning fillers that were redundant with information already in the official KB.

Linking query entities to the KB was discontinued from SF in 2014, which removed the redundancy restriction on responses (though duplicate responses were still considered incorrect). However, query developers in 2014 were still required to check live Wikipedia when considering potential query entities so as to continue avoiding any for which the online resource already indicated numerous correct responses.

The final set of SF queries for each evaluation was also selected with the goal of an approximately balanced representation of entity types (person, organization, and - in 2009 only - geo-political entity) and of response types for slots (i.e., slots that take named entities as fillers, those that take values (dates and numbers) as fillers, and those that take strings as fillers).

Following initial query development, a quality control pass was conducted to flag any fillers that did not have adequate justification in the source document, or that might be at variance with the guidelines in any way. These flagged fillers were then adjudicated by senior annotators, who updated, removed, or replaced them as appropriate.

3.2 Manual Run Development

LDC developed "manual runs", or human-produced sets of annotated responses for each of the evaluation queries, for all SF evaluation cycles except 2009. For each query, annotators were given up to two hours to search the corpus and locate all valid fillers. Note that, unlike systems, annotators producing the manual runs were instructed to return duplicate fillers from separate source documents if time permitted, in order to provide more training data for systems in the future.

Justification - the minimum extents of provenance supporting the validity of a slot filler - was first added to responses in 2012 in order to pinpoint the sources of assertions and, thereby, reduce the effort required for assessment. Valid justification strings were required to clearly identify all three elements of a relation (i.e. the subject entity, the predicate slot, and the object filler) with minimal extraneous text. In 2013, justification was modified to allow for up to two discontiguous strings selected from as many separate documents, up from one string in 2012. In 2014, justification was again altered to allow for up to four justification strings. This facilitated a greater potential for inferred relations that would be difficult to justify with just a single document.

Following the initial round of annotation for manual runs, a quality control pass was conducted to flag any fillers that did not have adequate justification in the source document, or that might be at variance with the guidelines in any way. These flagged fillers were then adjudicated by senior annotators, who updated or removed them as appropriate.

3.3 Slot Mapping

For the 2009-2013 evaluations, a senior annotator performed a slot-mapping process before assessment in order to indicate how existing attribute labels in the KB for non-NIL query entities mapped to the set of TAC KBP SF slots. This process was necessary because attribute labels for the same type of information varied widely in Wikipedia (the source of the TAC KBP KB) depending on entity type. For example, an actor's birth date might be labeled as 'actor-birth-date' while a golfer's could be indicated by 'date-of-birth-golfer'. During the slot-mapping process, both of these would be linked to the TAC KBP slot 'per:date_of_birth'.
These mappings were then imported into the assessment tool so that they could be coreferenced with responses marked as correct (with respect to the slot definition), thereby indicating that those responses were redundant with the KB.

3.4 Assessment

In assessment, annotators first judged the validity of anonymized human- and system-produced responses returned for the query set and then coreferenced those marked as correct. Fillers were assessed as correct if they were found to be both compatible with the slot descriptions and supported in the text. Fillers were assessed as wrong if they did not meet both of the conditions for correctness, or as inexact if insufficient or extraneous text had been selected for an otherwise correct answer.

For the years in which it was produced, justification was assessed as correct if it succinctly and completely supported the relation, wrong if it did not support the relation at all (or if the corresponding filler was marked wrong), inexact-short if part but not all of the information necessary to support the relation was provided, or inexact-long if it contained all information necessary to support the relation but also a great deal of extraneous text. In 2014, responses with justification comprising more than 600 characters in total were automatically ignored and removed from the pool of responses for assessment.

After first passes of assessment were completed, quality control was performed on the data by senior annotators. During quality control, the text extents of annotated fillers and justifications were checked for correctness, equivalence classes for entities assessed as correct were checked for accuracy, and potentially problematic assessments were either corrected or flagged for additional review.

4. Using the Data

As mentioned in the overview, note that the corresponding source document collections for this release are included in LDC2018T03: TAC KBP Comprehensive English Source Corpora 2009-2014. Also, the corresponding Knowledge Base (KB) for much of the data - a 2008 snapshot of Wikipedia - can be obtained via LDC2014T16: TAC KBP Reference Knowledge Base.

4.1 Text Normalization and Offset Calculation

Text normalization of queries, consisting of a 1-for-1 substitution of newline (0x0A) and tab (0x09) characters with space (0x20) characters, was performed on the document text input to the response field.

The values of the beg and end XML elements in the later queries.xml files indicate character offsets that identify text extents in the source. Offset counting starts from the initial opening angle bracket of the <DOC> element (<doc> in DF sources), which is usually the initial character (character 0) of the source. Note as well that character counting includes newlines and all markup characters - that is, the offsets are based on treating the source document file as "raw text", with all its markup included.

Note that although strings included in the annotation files (queries and gold standard mentions) generally match source documents, a few characters are normalized in order to enhance readability: newlines are converted to spaces, except where the preceding character was a hyphen ("-"), in which case the newline was removed, and multiple spaces are collapsed to a single space.
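The following is a minimal, unofficial sketch of how the offsets and normalization described above might be applied when pulling an annotated extent out of a source document. It assumes Python 3; the example file path and the treatment of the end offset as inclusive are assumptions for illustration rather than something this README specifies.

  # offset_check.py -- unofficial sketch: extract a text extent from a source
  # document using beg/end character offsets, then apply the whitespace
  # normalization described in section 4.1. Assumes Python 3; the example
  # path and the end-inclusive reading of the offsets are assumptions.
  import re

  def read_raw(path):
      # Offsets are computed over the file as "raw text", markup included,
      # so read the whole file without stripping any tags.
      with open(path, encoding="utf-8") as f:
          return f.read()

  def normalize(text):
      # Approximation of the normalization described above:
      #   - a newline that follows a hyphen is removed
      #   - any other newline (or tab) becomes a space
      #   - runs of multiple spaces collapse to a single space
      text = re.sub(r"-\n", "-", text)
      text = re.sub(r"[\n\t]", " ", text)
      text = re.sub(r" {2,}", " ", text)
      return text

  def extract(source_path, beg, end):
      # beg/end are 0-based character offsets into the raw file;
      # end is treated here as inclusive.
      return read_raw(source_path)[beg:end + 1]

  if __name__ == "__main__":
      # Hypothetical document and offsets, for illustration only.
      span = extract("example_source_doc.xml", 1525, 1538)
      print(repr(normalize(span)))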
4.2 Proper Ingesting of XML Queries

While the character offsets are calculated based on treating the source document as "raw text", the "name" strings being referenced by the queries sometimes contain XML metacharacters, and these had to be "re-escaped" for proper inclusion in the queries.xml file. For example, an actual name like "AT&T" may show up in a source document file as "AT&amp;T" (because the source document was originally formatted as XML data). But since the source doc is being treated here as raw text, this name string is treated in queries.xml as having 8 characters (i.e., the character offsets, when provided, will point to a string of length 8). However, the "name" element itself, as presented in the queries.xml file, will be even longer - "AT&amp;amp;T" - because the queries.xml file is intended to be handled by an XML parser, which will return "AT&amp;T" when this "name" element is extracted. Using the queries.xml data without XML parsing would yield a mismatch between the "name" value and the corresponding string in the source data.
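To make the escaping behavior concrete, the following unofficial sketch reads query names with a real XML parser and compares them against the raw source text at the given offsets. The "name", "beg", and "end" elements are described in this README; the "query" and "docid" element names, the source-document location, and the docid-to-filename scheme are assumptions for illustration (the exact per-year schema is given by the DTDs under ./dtd/, and the source documents themselves are released separately in LDC2018T03).

  # parse_queries.py -- unofficial sketch: read query entity names with an
  # XML parser and compare them to the raw source text at the given offsets.
  # <name>, <beg>, and <end> are described in this README; <query>, <docid>,
  # the SOURCE_DIR location, and the docid-to-filename scheme are assumptions.
  import xml.etree.ElementTree as ET

  QUERIES_FILE = "./data/2013/eval/tac_kbp_2013_regular_sf_evaluation_queries.xml"
  SOURCE_DIR = "/path/to/LDC2018T03/data"   # hypothetical location of source docs

  def raw_span(doc_path, beg, end):
      # Offsets count every character of the raw file, markup included
      # (end treated as inclusive; see section 4.1).
      with open(doc_path, encoding="utf-8") as f:
          return f.read()[beg:end + 1]

  tree = ET.parse(QUERIES_FILE)
  for query in tree.getroot().iter("query"):
      # The XML parser un-escapes the doubly-escaped name exactly once, so a
      # <name> stored in queries.xml as "AT&amp;amp;T" comes back here as
      # "AT&amp;T" -- exactly the characters found in the raw source file.
      name = query.findtext("name")
      docid = query.findtext("docid")
      beg = query.findtext("beg")
      end = query.findtext("end")
      if None in (name, docid, beg, end):
          continue  # earlier query sets may not carry offsets
      span = raw_span(SOURCE_DIR + "/" + docid + ".xml", int(beg), int(end))
      status = "match" if span == name else "MISMATCH"
      print(docid + "\t" + status + "\t" + name)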
5. Acknowledgments

This material is based on research sponsored by Air Force Research Laboratory and Defense Advanced Research Projects Agency under agreement number FA8750-13-2-0045. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of Air Force Research Laboratory and Defense Advanced Research Projects Agency or the U.S. Government.

The authors acknowledge the following contributors to this data set:
  Dave Graff (LDC)
  Heather Simpson (LDC)
  Robert Parker (LDC)
  Neil Kuster (LDC)
  Hoa Dang (NIST)
  Heng Ji (RPI)
  Ralph Grishman (NYU)
  James Mayfield (JHU)
  Mihai Surdeanu (UA)
  Paul McNamee (JHU)
  Boyan Onyshkevych (DARPA)

6. References

Joe Ellis, Jeremy Getman, Stephanie M. Strassel. 2014.
  Overview of Linguistic Resources for the TAC KBP 2014 Evaluations: Planning, Execution, and Results
  https://www.ldc.upenn.edu/sites/www.ldc.upenn.edu/files/tackbp-2014-overview.pdf
  TAC KBP 2014 Workshop: National Institute of Standards and Technology, Gaithersburg, Maryland, November 17-18

Joe Ellis, Jeremy Getman, Justin Mott, Xuansong Li, Kira Griffitt, Stephanie M. Strassel, Jonathan Wright. 2013.
  Linguistic Resources for 2013 Knowledge Base Population Evaluations
  https://www.ldc.upenn.edu/sites/www.ldc.upenn.edu/files/tackbp-workshop2013-linguistic-resources-kbp-eval.pdf
  TAC KBP 2013 Workshop: National Institute of Standards and Technology, Gaithersburg, MD, November 18-19

Joe Ellis, Xuansong Li, Kira Griffitt, Stephanie M. Strassel, Jonathan Wright. 2012.
  Linguistic Resources for 2012 Knowledge Base Population Evaluations
  https://www.ldc.upenn.edu/sites/www.ldc.upenn.edu/files/tackbp-workshop2012-linguistic-resources-kbp-eval.pdf
  TAC KBP 2012 Workshop: National Institute of Standards and Technology, Gaithersburg, MD, November 5-6

Xuansong Li, Joe Ellis, Kira Griffitt, Stephanie Strassel, Robert Parker, Jonathan Wright. 2011.
  Linguistic Resources for 2011 Knowledge Base Population Evaluation
  https://www.ldc.upenn.edu/sites/www.ldc.upenn.edu/files/tac2011-linguistic-resources-kbp.pdf
  TAC 2011: Proceedings of the Fourth Text Analysis Conference, Gaithersburg, Maryland, November 14-15

Heng Ji, Ralph Grishman, Hoa Trang Dang, Kira Griffitt, Joe Ellis. 2010.
  Overview of the TAC 2010 Knowledge Base Population Track
  TAC 2010 Workshop: Proceedings of the Third Text Analysis Conference, Gaithersburg, MD, November 15-16

P. McNamee, H.T. Dang. 2009.
  Overview of the TAC 2009 Knowledge Base Population Track
  TAC 2009: Proceedings of the Second Text Analysis Conference, Gaithersburg, MD, November 16-17

7. Copyright Information

(c) 2018 Trustees of the University of Pennsylvania

8. Contact Information

For further information about this data release, or the TAC KBP project, contact the following project staff at LDC:

  Joe Ellis, Project Manager
  Jeremy Getman, Lead Annotator
  Stephanie Strassel, PI

------------------------------------------------------------------------
README created by Neil Kuster on January 25, 2016
  updated by Neil Kuster on March 28, 2016
  updated by Joe Ellis on April 21, 2016
  updated by Neil Kuster on September 14, 2016
  updated by Joe Ellis on September 19, 2016
  updated by Joe Ellis on January 3, 2017
  updated by Joe Ellis on February 15, 2017
  updated by Jeremy Getman on September 27, 2018