TAC KBP English Surprise Slot Filling Comprehensive Training and Evaluation
Data 2010

Authors: Joe Ellis, Jeremy Getman, Stephanie Strassel

1. Overview

This package contains evaluation data produced in support of the TAC KBP
Surprise Slot Filling track in 2010.

The Text Analysis Conference (TAC) is a series of workshops organized by the
National Institute of Standards and Technology (NIST). TAC was developed to
encourage research in natural language processing (NLP) and related
applications by providing a large test collection, common evaluation
procedures, and a forum for researchers to share their results. Through its
various evaluations, the Knowledge Base Population (KBP) track of TAC
encourages the development of systems that can match entities mentioned in
natural texts with those appearing in a knowledge base, extract novel
information about entities from a document collection, and add it to a new
or existing knowledge base.

The Surprise Slot Filling track was developed to address the need for
information extraction systems that can be adapted easily and rapidly to new
types of relations and events. The track was a variation of the regular Slot
Filling (SF) evaluation track, which involves mining information about
entities from text using a specified set of 'slots' (attributes). Surprise SF
participants were given four new slot types ("diseases", "awards-won", and
"charity-supported" for persons; "products" for organizations), annotation
guidelines, training data, and a maximum of 4 days to develop their systems
and run them on the source collection.

More information about the TAC KBP Surprise Slot Filling track and other TAC
KBP evaluations can be found on the NIST TAC website, http://www.nist.gov/tac/.

This package contains all evaluation and training data developed in support
of TAC KBP Surprise SF in 2010, the only year in which the track was run.
This includes queries, the 'manual run' (human-produced responses to the
queries), and the final round of assessment results. The corresponding source
document collection for this release is included in LDC2018T03: TAC KBP
Comprehensive English Source Corpora 2009-2014. The corresponding Knowledge
Base (KB) for the data - a 2008 snapshot of Wikipedia - can be obtained via
LDC2014T16: TAC KBP Reference Knowledge Base.

The data included in this package were originally released by LDC to TAC KBP
coordinators and performers under the following ecorpora catalog IDs and
titles:

  LDC2010E52: TAC 2010 KBP Training Surprise Slot Filling Annotation
  LDC2010E61: TAC 2010 KBP Assessment Results V1.2
  LDC2012E33: TAC 2010 KBP Evaluation Surprise Slot Filling Annotation
  LDC2015E49: TAC KBP English Surprise Slot Filling – Comprehensive Training
              and Evaluation Data 2010

Summary of data included in this package:

  Queries:
  +------+------------+-----+-----+-------+
  | year | set        | PER | ORG | total |
  +------+------------+-----+-----+-------+
  | 2010 | evaluation |  30 |  10 |    40 |
  | 2010 | training   |  24 |   8 |    32 |
  +------+------------+-----+-----+-------+

  Manual Responses:
  +------+------------+------------------+
  | year | set        | manual responses |
  +------+------------+------------------+
  | 2010 | evaluation |              252 |
  | 2010 | training   |               83 |
  +------+------------+------------------+

  Assessment Data:
  +------+------------+------------+
  |      |            | assessed   |
  | year | set        | responses  |
  +------+------------+------------+
  | 2010 | evaluation |        996 |
  +------+------------+------------+

2. Contents

./docs/README.txt

  This file.
./data/eval/tac_kbp_2010_surprise_sf_evaluation_queries.xml

  This file contains 40 queries, corresponding to 30 unique PER and 10 unique
  ORG entities. Note that each query has an id attribute, formatted as the
  letters "SF" plus a unique integer value. Each query consists of the
  following 4 elements:

  - A namestring for the query entity
  - The ID of the document in the source corpus (LDC2018T03: TAC KBP
    Comprehensive English Source Corpora 2009-2014) from which the namestring
    was extracted
  - The query entity's type (PER or ORG)
  - The Knowledge Base (KB) node ID of the query entity. If the node ID
    begins with "E", the query refers to an entity in the KB (LDC2014T16:
    TAC KBP Reference Knowledge Base). If the node ID begins with "NIL", the
    query entity does not have an entry in the KB.

./data/eval/tac_kbp_2010_surprise_sf_evaluation_manual_run.tab

  This file contains the human-produced responses to each query in
  ./data/eval/tac_kbp_2010_surprise_sf_evaluation_queries.xml

  The manual run file has a header row with the column labels and is
  tab-delimited, with 11 fields total. The column descriptions are as follows:

   1. filler_id - a unique integer ID for the response
   2. sf_id - the slot filling query ID for the entity
   3. system_id - the ID of the system that generated the response; always
      "LDC" in these data
   4. slot_name - the name of the slot for the filler
   5. docid - the unique ID of the document in the corpus (LDC2018T03: TAC
      KBP Comprehensive English Source Corpora 2009-2014) from which the
      filler is extracted
   6. start_char - the (zero-based) character offset in the document for the
      beginning of the substring from which the filler was extracted
   7. end_char - the (zero-based) character offset in the document for the
      end of the substring from which the filler was extracted
   8. response - the raw substring in the document from which the slot filler
      in column 9 was extracted
   9. norm_response - the possibly-normalized version of the slot filler and
      the official response; see the guidelines for notes on normalization
  10. equiv_class_id - a unique integer ID for the equivalence class into
      which the filler falls; always identical to the filler_id in these data
      (coreference was not performed as part of the manual run)
  11. judgment - a confidence score; always 1 in these data

  The manual run file contains a total of 252 responses, 160 from newswire
  documents and 92 from web documents. Below is a table summarizing the
  numbers of fillers in the manual run, by slot type:

  +--------------------------+-------+
  | Slot Name                | Count |
  +--------------------------+-------+
  | org:products             |   151 |
  | per:awards_won           |    46 |
  | per:charities_supported  |    36 |
  | per:diseases             |    19 |
  +--------------------------+-------+
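  For illustration, below is a minimal Python sketch (not part of the
  original package) of one way to read the manual run file and tally fillers
  per slot. It assumes the header labels match the field names listed above
  and that the file is opened relative to the package root.

    import csv
    from collections import Counter

    # Hypothetical helper, assuming the header row uses the field names
    # above (filler_id, sf_id, system_id, slot_name, docid, start_char,
    # end_char, response, norm_response, equiv_class_id, judgment).
    def count_fillers_by_slot(path):
        counts = Counter()
        with open(path, encoding="utf-8", newline="") as f:
            for row in csv.DictReader(f, delimiter="\t"):
                counts[row["slot_name"]] += 1
        return counts

    counts = count_fillers_by_slot(
        "data/eval/tac_kbp_2010_surprise_sf_evaluation_manual_run.tab")
    for slot, n in counts.most_common():
        print(slot, n)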
./data/eval/tac_kbp_2010_surprise_sf_evaluation_submission.txt

  This file contains the same set of human-produced responses as
  ./data/eval/tac_kbp_2010_surprise_sf_evaluation_manual_run.tab

  However, the data are reformatted here to meet the requirements for system
  evaluation submissions and therefore include only a subset of the
  attributes found in the manual run, as well as additional rows that
  explicitly indicate query+slot combinations for which no responses were
  found.

  The submission file is space-delimited, with 4-5 fields total. Space
  characters after the first 4 fields are assumed to be part of the contents
  of the 5th field (see the parsing sketch at the end of this section). The
  field definitions are as follows:

  1. Query ID - the slot filling query ID for the entity
  2. Slot Name - the name of the slot for the filler
  3. Run ID - a unique run ID for the submission; always "LDC1" in these data
  4. Doc ID - the unique ID of the document in the corpus (LDC2018T03: TAC
     KBP Comprehensive English Source Corpora 2009-2014) from which the
     filler is extracted, or "NIL" if annotators found no fillers for the
     query+slot combination
  5. Slot Filler - the possibly-normalized version of the response (column 9
     in the manual run), or empty if column 4 contains the value "NIL"

./data/eval/assessment/*

  The assessment directory holds 100 assessment files, which together contain
  a total of 996 assessed responses. Assessment was performed on a set of
  pooled responses provided by NIST that includes fillers returned by both
  systems and LDC annotators. There is one file for each combination of query
  entity and slot for all of the queries found in
  ./data/eval/tac_kbp_2010_surprise_sf_evaluation_queries.xml
  Note that 36 of the files in this directory are empty, indicating that no
  responses were returned for that particular query+slot combination.

  The assessment results files are tab-delimited, with 6 fields total. The
  field definitions are as follows:

  1. Response ID - a unique integer ID for the response
  2. QueryID:slot - a concatenation of the slot filling query ID and the
     relevant slot name, separated by a colon
  3. DocID - the unique ID of the document in the source corpus (LDC2018T03:
     TAC KBP Comprehensive English Source Corpora 2009-2014) that was
     identified as supporting the relation between the query entity and the
     slot filler
  4. Slot filler assessment - judgment of the slot filler in column 6 with
     respect to the supporting document in column 3. Values will be one of:
       -1 - wrong
        1 - correct
        2 - redundant
        3 - inexact
  5. Equivalence class ID - a unique ID for the equivalence class cluster to
     which the filler belongs, as defined by LDC assessors; zero for wrong
     and inexact fillers, non-zero for correct and redundant fillers. Note
     that equivalence class IDs are unique only to a specific query+slot
     combination. Matching IDs within a query+slot pair are considered to be
     an equivalent grouping of answers.
  6. Slot filler - the (possibly normalized) response

./data/training/tac_kbp_2010_surprise_sf_training_queries.xml

  This file contains 32 training queries: 24 PER queries and 8 ORG queries.
  The format of these queries is identical to those in
  ./data/eval/tac_kbp_2010_surprise_sf_evaluation_queries.xml

./data/training/tac_kbp_2010_surprise_sf_training_manual_run.tab

  This file contains the human-produced responses to each query in
  ./data/training/tac_kbp_2010_surprise_sf_training_queries.xml
  The format of this file is identical to:
  ./data/eval/tac_kbp_2010_surprise_sf_evaluation_manual_run.tab

./dtd/surprisesf_queries.dtd

  The DTD for:
  ./data/eval/tac_kbp_2010_surprise_sf_evaluation_queries.xml
  ./data/training/tac_kbp_2010_surprise_sf_training_queries.xml

./docs/all_files.md5

  Paths (relative to the root of the corpus) and md5 checksums for all files
  included in the package.

./docs/guidelines/TAC_2010_KBP_Assessment_Guidelines_V1.8.pdf
./docs/guidelines/TAC_KBP_2010_Slot_Filling_Annotation_Guidelines_Surprise_Task_V2.5.pdf

  The guidelines used by annotators in developing the 2010 Surprise Slot
  Filling queries, manual runs, and assessments contained in this corpus.

./docs/task_description/KBP2010_TaskDefinition.pdf

  Task description for the TAC 2010 Knowledge Base Population evaluation
  track, written by the evaluation track coordinators.

./tools/check_kbp_surprise-slot-filling.pl

  Validator for 2010 Surprise Slot Filling submissions, as provided to LDC by
  the evaluation track coordinators, with no further testing.

./tools/SFScore.java

  Scorer for 2010 Surprise Slot Filling, as provided to LDC by the evaluation
  track coordinators, with no further testing.
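  As referenced above, here is a minimal Python sketch (not part of the
  package) of reading the submission file. It assumes the 4-5-field,
  space-delimited layout described in this section, with any spaces after
  the fourth field treated as part of the slot filler.

    # Hypothetical reader for the submission format described above.
    def read_submission(path):
        responses = []
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.rstrip("\n")
                if not line:
                    continue
                # maxsplit=4 keeps any spaces inside the slot filler intact
                parts = line.split(" ", 4)
                query_id, slot_name, run_id, doc_id = parts[:4]
                filler = parts[4] if len(parts) > 4 else ""
                if doc_id != "NIL":
                    responses.append((query_id, slot_name, doc_id, filler))
        return responses

    rows = read_submission(
        "data/eval/tac_kbp_2010_surprise_sf_evaluation_submission.txt")
    print(len(rows), "non-NIL responses")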
3. Annotation Tasks

The tasks conducted by LDC annotators in support of Surprise SF included
query development, manual run development, slot mapping, and assessment of
system- and human-produced responses to queries. Each of these subtasks is
explained below.

3.1 Query Development

Entities, which are the basis of queries for all SF tracks, were selected
primarily on the basis of their non-confusability and productivity. A
candidate query entity was considered non-confusable if there were one or
more references to it in the source corpus that were "canonical", meaning
that they were not an alias and, for persons, included more than just a
first or last name. Productivity for candidate queries was determined by
searching the source corpus to find whether it contained at least two slot
fillers (i.e., answers) for the entity.

Entities with well-populated Knowledge Base (KB) entries (either in the
official TAC KBP KB or in online resources such as Wikipedia) were also
generally avoided as query entities. Such entities were dispreferred both to
reduce the advantage gained by using online resources and because there was
a restriction against returning fillers that were redundant with information
already in the official KB.

Following initial query development, a quality control pass was conducted to
flag any fillers that did not have adequate justification in the source
document, or that might be at variance with the guidelines in any way. These
flagged fillers were then adjudicated by senior annotators, who updated,
removed, or replaced them as appropriate.

3.2 Manual Run Development

LDC developed "manual runs", the human-produced sets of annotated responses
for each of the Surprise SF training and evaluation queries. For each query,
annotators were given up to two hours to search the corpus and locate all
valid fillers.

Following the initial round of annotation for manual runs, a quality control
pass was conducted to flag any fillers that might be at variance with the
guidelines in any way. These flagged fillers were then adjudicated by senior
annotators, who updated or removed them as appropriate.

3.3 Slot Mapping

A senior annotator performed a slot-mapping process before assessment in
order to indicate how existing attribute labels in the KB for non-NIL query
entities map to the set of TAC KBP SF slots. This process was necessary
because attribute labels for the same type of information varied widely in
Wikipedia (the source of the TAC KBP KB) depending on entity type. For
example, the awards won by an actor might be labeled as 'actor-awards-won'
while a golfer's could be indicated by 'awards-golfer'. During the
slot-mapping process, both of these would be linked to the Surprise SF slot
'per:awards_won', as in the sketch below. These mappings were then imported
into the assessment tool so that they could be coreferenced with responses
marked as correct (with respect to the slot definition), thereby indicating
that those responses were redundant with the KB.
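A minimal sketch of the kind of mapping this pass produces; the KB attribute
labels here are just the examples from the paragraph above, and the table
itself is hypothetical rather than part of the released data.

    # Hypothetical mapping from entity-type-specific KB attribute labels to
    # Surprise SF slots; the real mappings were created by a senior annotator.
    KB_TO_SF_SLOT = {
        "actor-awards-won": "per:awards_won",
        "awards-golfer":    "per:awards_won",
        # ... one entry per KB attribute label observed for non-NIL queries
    }

    def map_kb_attribute(label):
        # Returns None for labels with no corresponding Surprise SF slot
        return KB_TO_SF_SLOT.get(label)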
3.4 Assessment

In assessment, annotators first judged the validity of anonymized human- and
system-produced responses returned for the query set and then coreferenced
those marked as correct. Fillers were assessed as correct if they were found
to be both compatible with the slot descriptions and supported in the text.
Fillers were assessed as wrong if they did not meet both of the conditions
for correctness, or as inexact if insufficient or extraneous text had been
selected for an otherwise correct answer.

After first passes of assessment were completed, quality control was
performed on the data by senior annotators. During quality control, the text
extents of annotated fillers were checked for correctness, equivalence
classes for entities assessed as correct were checked for accuracy, and
potentially problematic assessments were either corrected or flagged for
additional review.

4. Using the Data

As noted in the overview, the corresponding source document collection for
this release is included in LDC2018T03: TAC KBP Comprehensive English Source
Corpora 2009-2014. Also, the corresponding Knowledge Base (KB) for the data
- a 2008 snapshot of Wikipedia - can be obtained via LDC2014T16: TAC KBP
Reference Knowledge Base.

4.1 Text normalization and offset calculation

Text normalization, consisting of a 1-for-1 substitution of newline (0x0A)
and tab (0x09) characters with space (0x20) characters, was performed on the
document text used to fill the response field. Character offsets in these
data (such as the start_char and end_char values in the manual run files)
identify text extents in the source documents. Offset counting starts from
the initial character (character 0) of the source document and includes
newlines and all markup characters - that is, the offsets are based on
treating the source document file as "raw text", with all its markup
included.

4.2 Proper ingesting of XML queries

While the character offsets are calculated by treating the source document
as "raw text", the "name" strings referenced by the queries sometimes
contain XML metacharacters, and these had to be "re-escaped" for proper
inclusion in the queries.xml files. For example, an actual name like "AT&T"
may show up in a source document file as "AT&amp;T" (because the source
document was originally formatted as XML data). But since the source doc is
being treated here as raw text, this name string is treated in queries.xml
as having 8 characters (i.e., the character offsets, when provided, will
point to a string of length 8). However, the "name" element itself, as
presented in the queries.xml file, will be even longer - "AT&amp;amp;T" -
because the queries.xml file is intended to be handled by an XML parser,
which will return "AT&amp;T" when this "name" element is extracted. Using
the queries.xml data without XML parsing would therefore yield a mismatch
between the "name" value and the corresponding string in the source data.
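Below is a minimal Python sketch (not part of the package) tying these two
points together: slicing a filler out of a source document read as raw text,
and letting an XML parser unescape the query name strings. The element and
attribute names, and the assumption that end_char is an inclusive offset,
are the author's of this sketch; consult ./dtd/surprisesf_queries.dtd and
the guidelines for the authoritative definitions.

    import xml.etree.ElementTree as ET

    def extract_filler(source_doc_path, start_char, end_char):
        # Read the source document as raw text, markup included, so that the
        # zero-based offsets line up as described in section 4.1.
        with open(source_doc_path, encoding="utf-8") as f:
            raw = f.read()
        # Assumes end_char is an inclusive offset; adjust if it is exclusive.
        return raw[start_char:end_char + 1]

    def query_names(queries_xml_path):
        # The XML parser unescapes the re-escaped name strings once,
        # returning the form that matches the raw source text (e.g.
        # AT&amp;T rather than AT&amp;amp;T). The element/attribute names
        # (query, id, name) are assumptions here.
        tree = ET.parse(queries_xml_path)
        return {q.get("id"): q.findtext("name") for q in tree.iter("query")}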
5. Acknowledgements

This material is based on research sponsored by the Air Force Research
Laboratory and the Defense Advanced Research Projects Agency under agreement
number FA8750-13-2-0045. The U.S. Government is authorized to reproduce and
distribute reprints for Governmental purposes notwithstanding any copyright
notation thereon. The views and conclusions contained herein are those of
the authors and should not be interpreted as necessarily representing the
official policies or endorsements, either expressed or implied, of the Air
Force Research Laboratory and Defense Advanced Research Projects Agency or
the U.S. Government.

The authors acknowledge the following contributors to this data set:
  Dana Fore (LDC)
  Dave Graff (LDC)
  Heather Simpson (LDC)
  Robert Parker (LDC)
  Heng Ji (RPI)
  Ralph Grishman (NYU)
  Hoa Dang (NIST)
  Boyan Onyshkevych (DARPA)

6. References

Heng Ji, Ralph Grishman, Hoa Trang Dang, Kira Griffitt, Joe Ellis. 2010.
Overview of the TAC 2010 Knowledge Base Population Track. In Proceedings of
the Third Text Analysis Conference (TAC 2010), Gaithersburg, MD, November
15-16.

7. Copyright Information

(c) 2021 Trustees of the University of Pennsylvania

8. Contact Information

For further information about this data release, or the TAC KBP project,
contact the following project staff at LDC:

  Jeremy Getman, Lead Annotator
  Stephanie Strassel, PI

------------------------------------------------------------------------
README created by Dana Fore on April 1, 2016
       updated by Dana Fore on April 7, 2016
       updated by Neil Kuster on May 17, 2016
       updated by Neil Kuster on September 19, 2016
       updated by Joe Ellis on January 3, 2017
       updated by Joe Ellis on February 17, 2017