TAC KBP English Surprise Slot Filling Comprehensive Training and Evaluation
Data 2010

Authors: Joe Ellis, Jeremy Getman, Stephanie Strassel

1. Overview

This package contains evaluation data produced in support of the TAC KBP
Surprise Slot Filling track in 2010.

The Text Analysis Conference (TAC) is a series of workshops organized by the
National Institute of Standards and Technology (NIST). TAC was developed to
encourage research in natural language processing (NLP) and related
applications by providing a large test collection, common evaluation
procedures, and a forum for researchers to share their results. Through its
various evaluations, the Knowledge Base Population (KBP) track of TAC
encourages the development of systems that can match entities mentioned in
natural texts with those appearing in a knowledge base, extract novel
information about entities from a document collection, and add it to a new
or existing knowledge base.

The Surprise Slot Filling track was developed to address the need for
information extraction systems that can be adapted easily and rapidly to new
types of relations and events. The track was a variation of the regular Slot
Filling (SF) evaluation track, which involves mining information about
entities from text using a specified set of 'slots' (attributes). Surprise SF
participants were given four new slot types ("diseases", "awards-won", and
"charity-supported" for persons; "products" for organizations), annotation
guidelines, training data, and a maximum of 4 days to develop their systems
and run them on the source collection.

More information about the TAC KBP Surprise Slot Filling track and other TAC
KBP evaluations can be found on the NIST TAC website, http://www.nist.gov/tac/.

This package contains all evaluation and training data developed in support
of TAC KBP Surprise SF in 2010, the only year in which the track was run.
This includes queries, the 'manual run' (human-produced responses to the
queries), and the final round of assessment results. The corresponding source
document collection for this release is included in LDC2018T03: TAC KBP
Comprehensive English Source Corpora 2009-2014. The corresponding Knowledge
Base (KB) for the data - a 2008 snapshot of Wikipedia - can be obtained via
LDC2014T16: TAC KBP Reference Knowledge Base.

The data included in this package were originally released by LDC to TAC KBP
coordinators and performers under the following ecorpora catalog IDs and
titles:

  LDC2010E52: TAC 2010 KBP Training Surprise Slot Filling Annotation
  LDC2010E61: TAC 2010 KBP Assessment Results V1.2
  LDC2012E33: TAC 2010 KBP Evaluation Surprise Slot Filling Annotation
  LDC2015E49: TAC KBP English Surprise Slot Filling – Comprehensive Training
              and Evaluation Data 2010

Summary of data included in this package:

  Queries:
  +------+------------+-----+-----+-------+
  | year | set        | PER | ORG | total |
  +------+------------+-----+-----+-------+
  | 2010 | evaluation |  30 |  10 |    40 |
  | 2010 | training   |  24 |   8 |    32 |
  +------+------------+-----+-----+-------+

  Manual Responses:
  +------+------------+------------------+
  | year | set        | manual responses |
  +------+------------+------------------+
  | 2010 | evaluation |              252 |
  | 2010 | training   |               83 |
  +------+------------+------------------+

  Assessment Data:
  +------+------------+------------+
  |      |            | assessed   |
  | year | set        | responses  |
  +------+------------+------------+
  | 2010 | evaluation |        996 |
  +------+------------+------------+

2. Contents

./docs/README.txt

  This file.
./data/eval/tac_kbp_2010_surprise_sf_evaluation_queries.xml

  This file contains 40 queries, corresponding to 30 unique PER and 10 unique
  ORG entities. Note that each query has an id attribute, formatted as the
  letters "SF" plus a unique integer value. Each query consists of the
  following 4 elements:

  - A namestring for the query entity
  - The ID of the document in the source corpus (LDC2018T03: TAC KBP
    Comprehensive English Source Corpora 2009-2014) from which the namestring
    was extracted
  - The query entity's type (PER or ORG)
  - The Knowledge Base (KB) node ID of the query entity. If the node ID
    begins with "E", the query refers to an entity in the KB (LDC2014T16:
    TAC KBP Reference Knowledge Base). If the node ID begins with "NIL", the
    query entity does not have an entry in the KB.

./data/eval/tac_kbp_2010_surprise_sf_evaluation_manual_run.tab

  This file contains the human-produced responses to each query in
  ./data/eval/tac_kbp_2010_surprise_sf_evaluation_queries.xml

  The manual run file has a header row with the column labels and is
  tab-delimited, with 11 fields total. The column descriptions are as follows:

   1. filler_id - a unique integer ID for the response
   2. sf_id - the slot filling query ID for the entity
   3. system_id - the ID of the system that generated the response; always
      "LDC" in these data
   4. slot_name - the name of the slot for the filler
   5. docid - the unique ID of the document in the corpus (LDC2018T03: TAC
      KBP Comprehensive English Source Corpora 2009-2014) from which the
      filler is extracted
   6. start_char - the (zero-based) character offset in the document for the
      beginning of the substring from which the filler was extracted
   7. end_char - the (zero-based) character offset in the document for the
      end of the substring from which the filler was extracted
   8. response - the raw substring in the document from which the slot filler
      in column 9 was extracted
   9. norm_response - the possibly-normalized version of the slot filler and
      the official response; see the guidelines for notes on normalization
  10. equiv_class_id - a unique integer ID for the equivalence class into
      which the filler falls; always identical to the filler_id in these data
      (coreference was not performed as part of the manual run)
  11. judgment - a confidence score; always 1 in these data

  The manual run file contains a total of 252 responses, 160 from newswire
  documents and 92 from web documents. Below is a table summarizing the
  numbers of fillers in the manual run, by slot type:

  +--------------------------+-------+
  | Slot Name                | Count |
  +--------------------------+-------+
  | org:products             |   151 |
  | per:awards_won           |    46 |
  | per:charities_supported  |    36 |
  | per:diseases             |    19 |
  +--------------------------+-------+
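  For illustration, below is a minimal Python sketch (not part of the
  original package) of one way to read the manual run file and tally fillers
  per slot. It assumes the header labels match the field names listed above
  and that the file is opened relative to the package root.

    import csv
    from collections import Counter

    # Hypothetical helper, assuming the header row uses the field names
    # above (filler_id, sf_id, system_id, slot_name, docid, start_char,
    # end_char, response, norm_response, equiv_class_id, judgment).
    def count_fillers_by_slot(path):
        counts = Counter()
        with open(path, encoding="utf-8", newline="") as f:
            for row in csv.DictReader(f, delimiter="\t"):
                counts[row["slot_name"]] += 1
        return counts

    counts = count_fillers_by_slot(
        "data/eval/tac_kbp_2010_surprise_sf_evaluation_manual_run.tab")
    for slot, n in counts.most_common():
        print(slot, n)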
./data/eval/tac_kbp_2010_surprise_sf_evaluation_submission.txt

  This file contains the same set of human-produced responses as
  ./data/eval/tac_kbp_2010_surprise_sf_evaluation_manual_run.tab

  However, the data are reformatted here to meet the requirements for system
  evaluation submissions and therefore include only a subset of the
  attributes found in the manual run, as well as additional rows that
  explicitly indicate query+slot combinations for which no responses were
  found.

  The submission file is space-delimited, with 4-5 fields total. Space
  characters after the first 4 fields are assumed to be part of the contents
  of the 5th field (see the parsing sketch at the end of this section). The
  field definitions are as follows:

  1. Query ID - the slot filling query ID for the entity
  2. Slot Name - the name of the slot for the filler
  3. Run ID - a unique run ID for the submission; always "LDC1" in these data
  4. Doc ID - the unique ID of the document in the corpus (LDC2018T03: TAC
     KBP Comprehensive English Source Corpora 2009-2014) from which the
     filler is extracted, or "NIL" if annotators found no fillers for the
     query+slot combination
  5. Slot Filler - the possibly-normalized version of the response (column 9
     in the manual run), or empty if column 4 contains the value "NIL"

./data/eval/assessment/*

  The assessment directory holds 100 assessment files, which together contain
  a total of 996 assessed responses. Assessment was performed on a set of
  pooled responses provided by NIST that includes fillers returned by both
  systems and LDC annotators. There is one file for each combination of query
  entity and slot for all of the queries found in
  ./data/eval/tac_kbp_2010_surprise_sf_evaluation_queries.xml
  Note that 36 of the files in this directory are empty, indicating that no
  responses were returned for that particular query+slot combination.

  The assessment results files are tab-delimited, with 6 fields total. The
  field definitions are as follows:

  1. Response ID - a unique integer ID for the response
  2. QueryID:slot - a concatenation of the slot filling query ID and the
     relevant slot name, separated by a colon
  3. DocID - the unique ID of the document in the source corpus (LDC2018T03:
     TAC KBP Comprehensive English Source Corpora 2009-2014) that was
     identified as supporting the relation between the query entity and the
     slot filler
  4. Slot filler assessment - judgment of the slot filler in column 6 with
     respect to the supporting document in column 3. Values will be one of:
       -1 - wrong
        1 - correct
        2 - redundant
        3 - inexact
  5. Equivalence class ID - a unique ID for the equivalence class cluster to
     which the filler belongs, as defined by LDC assessors; zero for wrong
     and inexact fillers, non-zero for correct and redundant fillers. Note
     that equivalence class IDs are unique only to a specific query+slot
     combination. Matching IDs within a query+slot pair are considered to be
     an equivalent grouping of answers.
  6. Slot filler - the (possibly normalized) response

./data/training/tac_kbp_2010_surprise_sf_training_queries.xml

  This file contains 32 training queries: 24 PER queries and 8 ORG queries.
  The format of these queries is identical to those in
  ./data/eval/tac_kbp_2010_surprise_sf_evaluation_queries.xml

./data/training/tac_kbp_2010_surprise_sf_training_manual_run.tab

  This file contains the human-produced responses to each query in
  ./data/training/tac_kbp_2010_surprise_sf_training_queries.xml
  The format of this file is identical to:
  ./data/eval/tac_kbp_2010_surprise_sf_evaluation_manual_run.tab

./dtd/surprisesf_queries.dtd

  The DTD for:
  ./data/eval/tac_kbp_2010_surprise_sf_evaluation_queries.xml
  ./data/training/tac_kbp_2010_surprise_sf_training_queries.xml

./docs/all_files.md5

  Paths (relative to the root of the corpus) and md5 checksums for all files
  included in the package.

./docs/guidelines/TAC_2010_KBP_Assessment_Guidelines_V1.8.pdf
./docs/guidelines/TAC_KBP_2010_Slot_Filling_Annotation_Guidelines_Surprise_Task_V2.5.pdf

  The guidelines used by annotators in developing the 2010 Surprise Slot
  Filling queries, manual runs, and assessments contained in this corpus.

./docs/task_description/KBP2010_TaskDefinition.pdf

  Task description for the TAC 2010 Knowledge Base Population evaluation
  track, written by the evaluation track coordinators.

./tools/check_kbp_surprise-slot-filling.pl

  Validator for 2010 Surprise Slot Filling submissions, as provided to LDC by
  the evaluation track coordinators, with no further testing.

./tools/SFScore.java

  Scorer for 2010 Surprise Slot Filling, as provided to LDC by the evaluation
  track coordinators, with no further testing.
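  As referenced above, here is a minimal Python sketch (not part of the
  package) of reading the submission file. It assumes the 4-5-field,
  space-delimited layout described in this section, with any spaces after
  the fourth field treated as part of the slot filler.

    # Hypothetical reader for the submission format described above.
    def read_submission(path):
        responses = []
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.rstrip("\n")
                if not line:
                    continue
                # maxsplit=4 keeps any spaces inside the slot filler intact
                parts = line.split(" ", 4)
                query_id, slot_name, run_id, doc_id = parts[:4]
                filler = parts[4] if len(parts) > 4 else ""
                if doc_id != "NIL":
                    responses.append((query_id, slot_name, doc_id, filler))
        return responses

    rows = read_submission(
        "data/eval/tac_kbp_2010_surprise_sf_evaluation_submission.txt")
    print(len(rows), "non-NIL responses")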
3. Annotation Tasks

The tasks conducted by LDC annotators in support of Surprise SF included
query development, manual run development, slot mapping, and assessment of
system- and human-produced responses to queries. Each of these subtasks is
explained below.

3.1 Query Development

Entities, which are the basis of queries for all SF tracks, were selected
primarily on the basis of their non-confusability and productivity. A
candidate query entity was considered non-confusable if there were one or
more references to it in the source corpus that were "canonical", meaning
that they were not an alias and, for persons, included more than just a
first or last name. Productivity for candidate queries was determined by
searching the source corpus to find whether it contained at least two slot
fillers (i.e., answers) for the entity.

Entities with well-populated Knowledge Base (KB) entries (either in the
official TAC KBP KB or in online resources such as Wikipedia) were also
generally avoided as query entities. Such entities were dispreferred both to
reduce the advantage gained by using online resources and because there was
a restriction against returning fillers that were redundant with information
already in the official KB.

Following initial query development, a quality control pass was conducted to
flag any fillers that did not have adequate justification in the source
document, or that might be at variance with the guidelines in any way. These
flagged fillers were then adjudicated by senior annotators, who updated,
removed, or replaced them as appropriate.

3.2 Manual Run Development

LDC developed "manual runs", the human-produced sets of annotated responses
for each of the Surprise SF training and evaluation queries. For each query,
annotators were given up to two hours to search the corpus and locate all
valid fillers.

Following the initial round of annotation for manual runs, a quality control
pass was conducted to flag any fillers that might be at variance with the
guidelines in any way. These flagged fillers were then adjudicated by senior
annotators, who updated or removed them as appropriate.

3.3 Slot Mapping

A senior annotator performed a slot-mapping process before assessment in
order to indicate how existing attribute labels in the KB for non-NIL query
entities map to the set of TAC KBP SF slots. This process was necessary
because attribute labels for the same type of information varied widely in
Wikipedia (the source of the TAC KBP KB) depending on entity type. For
example, the awards won by an actor might be labeled as 'actor-awards-won'
while a golfer's could be indicated by 'awards-golfer'. During the
slot-mapping process, both of these would be linked to the Surprise SF slot
'per:awards_won', as in the sketch below. These mappings were then imported
into the assessment tool so that they could be coreferenced with responses
marked as correct (with respect to the slot definition), thereby indicating
that those responses were redundant with the KB.
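A minimal sketch of the kind of mapping this pass produces; the KB attribute
labels here are just the examples from the paragraph above, and the table
itself is hypothetical rather than part of the released data.

    # Hypothetical mapping from entity-type-specific KB attribute labels to
    # Surprise SF slots; the real mappings were created by a senior annotator.
    KB_TO_SF_SLOT = {
        "actor-awards-won": "per:awards_won",
        "awards-golfer":    "per:awards_won",
        # ... one entry per KB attribute label observed for non-NIL queries
    }

    def map_kb_attribute(label):
        # Returns None for labels with no corresponding Surprise SF slot
        return KB_TO_SF_SLOT.get(label)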
3.4 Assessment

In assessment, annotators first judged the validity of anonymized human- and
system-produced responses returned for the query set and then coreferenced
those marked as correct. Fillers were assessed as correct if they were found
to be both compatible with the slot descriptions and supported in the text.
Fillers were assessed as wrong if they did not meet both of the conditions
for correctness, or as inexact if insufficient or extraneous text had been
selected for an otherwise correct answer.

After first passes of assessment were completed, quality control was
performed on the data by senior annotators. During quality control, the text
extents of annotated fillers were checked for correctness, equivalence
classes for entities assessed as correct were checked for accuracy, and
potentially problematic assessments were either corrected or flagged for
additional review.

4. Using the Data

As noted in the overview, the corresponding source document collection for
this release is included in LDC2018T03: TAC KBP Comprehensive English Source
Corpora 2009-2014. Also, the corresponding Knowledge Base (KB) for the data
- a 2008 snapshot of Wikipedia - can be obtained via LDC2014T16: TAC KBP
Reference Knowledge Base.

4.1 Text normalization and offset calculation

Text normalization, consisting of a 1-for-1 substitution of newline (0x0A)
and tab (0x09) characters with space (0x20) characters, was performed on the
document text used to fill the response field. Character offsets in these
data (such as the start_char and end_char values in the manual run files)
identify text extents in the source documents. Offset counting starts from
the initial character (character 0) of the source document and includes
newlines and all markup characters - that is, the offsets are based on
treating the source document file as "raw text", with all its markup
included.

4.2 Proper ingesting of XML queries

While the character offsets are calculated by treating the source document
as "raw text", the "name" strings referenced by the queries sometimes
contain XML metacharacters, and these had to be "re-escaped" for proper
inclusion in the queries.xml files. For example, an actual name like "AT&T"
may show up in a source document file as "AT&amp;T" (because the source
document was originally formatted as XML data). But since the source doc is
being treated here as raw text, this name string is treated in queries.xml
as having 8 characters (i.e., the character offsets, when provided, will
point to a string of length 8). However, the "name" element itself, as
presented in the queries.xml file, will be even longer - "AT&amp;amp;T" -
because the queries.xml file is intended to be handled by an XML parser,
which will return "AT&amp;T" when this "name" element is extracted. Using
the queries.xml data without XML parsing would therefore yield a mismatch
between the "name" value and the corresponding string in the source data.
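Below is a minimal Python sketch (not part of the package) tying these two
points together: slicing a filler out of a source document read as raw text,
and letting an XML parser unescape the query name strings. The element and
attribute names, and the assumption that end_char is an inclusive offset,
are the author's of this sketch; consult ./dtd/surprisesf_queries.dtd and
the guidelines for the authoritative definitions.

    import xml.etree.ElementTree as ET

    def extract_filler(source_doc_path, start_char, end_char):
        # Read the source document as raw text, markup included, so that the
        # zero-based offsets line up as described in section 4.1.
        with open(source_doc_path, encoding="utf-8") as f:
            raw = f.read()
        # Assumes end_char is an inclusive offset; adjust if it is exclusive.
        return raw[start_char:end_char + 1]

    def query_names(queries_xml_path):
        # The XML parser unescapes the re-escaped name strings once,
        # returning the form that matches the raw source text (e.g.
        # AT&amp;T rather than AT&amp;amp;T). The element/attribute names
        # (query, id, name) are assumptions here.
        tree = ET.parse(queries_xml_path)
        return {q.get("id"): q.findtext("name") for q in tree.iter("query")}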
5. Acknowledgements

This material is based on research sponsored by the Air Force Research
Laboratory and the Defense Advanced Research Projects Agency under agreement
number FA8750-13-2-0045. The U.S. Government is authorized to reproduce and
distribute reprints for Governmental purposes notwithstanding any copyright
notation thereon. The views and conclusions contained herein are those of
the authors and should not be interpreted as necessarily representing the
official policies or endorsements, either expressed or implied, of the Air
Force Research Laboratory and Defense Advanced Research Projects Agency or
the U.S. Government.

The authors acknowledge the following contributors to this data set:
  Dana Fore (LDC)
  Dave Graff (LDC)
  Heather Simpson (LDC)
  Robert Parker (LDC)
  Heng Ji (RPI)
  Ralph Grishman (NYU)
  Hoa Dang (NIST)
  Boyan Onyshkevych (DARPA)

6. References

Heng Ji, Ralph Grishman, Hoa Trang Dang, Kira Griffitt, Joe Ellis. 2010.
Overview of the TAC 2010 Knowledge Base Population Track. In Proceedings of
the Third Text Analysis Conference (TAC 2010), Gaithersburg, MD, November
15-16.

7. Copyright Information

(c) 2021 Trustees of the University of Pennsylvania

8. Contact Information

For further information about this data release, or the TAC KBP project,
contact the following project staff at LDC:

  Jeremy Getman, Lead Annotator
  Stephanie Strassel, PI

------------------------------------------------------------------------
README created by Dana Fore on April 1, 2016
       updated by Dana Fore on April 7, 2016
       updated by Neil Kuster on May 17, 2016
       updated by Neil Kuster on September 19, 2016
       updated by Joe Ellis on January 3, 2017
       updated by Joe Ellis on February 17, 2017