TAC KBP Event Argument Comprehensive Training and Evaluation Data 2016-2017

Authors: Joe Ellis, Jeremy Getman, Zhiyi Song, Stephanie Strassel

1. Overview

This package contains training and evaluation data produced in support of
the 2016 TAC KBP Event Argument Linking Pilot and Evaluation tasks and the
2017 TAC KBP Event Argument Linking Evaluation task.

The Text Analysis Conference (TAC) is a series of workshops organized by the
National Institute of Standards and Technology (NIST). TAC was developed to
encourage research in natural language processing (NLP) and related
applications by providing a large test collection, common evaluation
procedures, and a forum for researchers to share their results. Through its
various evaluations, the Knowledge Base Population (KBP) track of TAC
encourages the development of systems that can match entities mentioned in
natural texts with those appearing in a knowledge base, extract novel
information about entities from a document collection, and add it to a new
or existing knowledge base.

The Event Argument (EA) Extraction and Linking task requires systems to
extract mentions of entities from unstructured text, indicate the role they
play in an event, and link the arguments appearing in the same event to each
other. Critically, because the extracted information must be suitable as
input to a knowledge base, systems construct tuples indicating the event
type, the role played by the entity in the event, and the most canonical
mention of the entity in the source document. The event types and roles are
drawn from an externally-specified ontology of 31 event types, including
financial transactions, communication events, and attacks. For more
information about Event Argument Extraction and Linking, refer to the track
home page on the NIST TAC website, http://www.nist.gov/tac/

Source documents referenced by the files in this package are available
separately in LDC2019T12 TAC KBP Evaluation Source Corpora 2016-2017.
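As an illustration of the tuple structure described above, the sketch below
shows one possible in-memory representation of an extracted event argument.
It is purely schematic; the class and field names are hypothetical and do
not correspond to the official TAC KBP EA submission format.

  # Schematic sketch only; class and field names are hypothetical and do
  # not reflect the official TAC KBP EA submission format.
  from dataclasses import dataclass

  @dataclass
  class EventArgument:
      doc_id: str         # source document the argument was extracted from
      event_type: str     # one of the 31 ontology types, e.g. "Conflict.Attack"
      role: str           # role the entity plays in the event, e.g. "Place"
      canonical_arg: str  # most canonical mention of the entity in the document
      justification: str  # text supporting the extraction

  example = EventArgument(
      doc_id="ENG_NW_EXAMPLE",
      event_type="Conflict.Attack",
      role="Place",
      canonical_arg="Baghdad",
      justification="A car bomb exploded in central Baghdad on Tuesday.",
  )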
The data included in this package were originally released by LDC to TAC KBP
coordinators and participants under the following ecorpora catalog IDs and
titles:

  LDC2016E107: TAC KBP 2016 English Event Argument Linking Evaluation
               Assessment Results V2.0
  LDC2016E49:  TAC KBP 2016 English Event Argument Linking Pilot Source Corpus
  LDC2016E51:  TAC KBP 2016 English Event Argument Linking Pilot Queries and
               Manual Run V1.1
  LDC2016E59:  TAC KBP 2016 English Event Argument Linking Pilot Assessment
               Results V1.1
  LDC2016E60:  TAC KBP 2016 English Event Argument Linking Pilot Gold Standard
  LDC2016E73:  TAC KBP 2016 Eval Core Set Rich ERE Annotation with Augmented
               Event Argument v2
  LDC2016E74:  TAC KBP 2016 English Event Argument Linking Evaluation Queries
               and Manual Run
  LDC2017E55:  TAC KBP 2017 Eval Core Set Rich ERE Annotation with Augmented
               Event Arguments

Summary of data included in this package (for more details see
./data/{2016,2017}/contents.txt):

EA Extraction and Linking Data Distribution:
+------+------------+-----------+-----------+-------------+----------+---------+---------+
|      |            |           | cross-doc |             |          |         |         |
|      |            | source    | manual    |             |          |         | event   |
| year | set        | documents | responses | assessments | entities | fillers | hoppers |
+------+------------+-----------+-----------+-------------+----------+---------+---------+
| 2016 | pilot      | 2,092     | 98        | 2,689       | 2,923    | 1,308   | 1,500   |
| 2016 | evaluation | 0*        | 628       | 7,697       | 17,681   | 4,544   | 6,799   |
| 2017 | evaluation | 0*        | 0         | 0           | 17,896   | 5,995   | 8,022   |
+------+------------+-----------+-----------+-------------+----------+---------+---------+
 *source corpora for the 2016 and 2017 evals are available separately; see above

2. Contents

./docs/README.txt
  This file.

./data/{2016,2017}/contents.txt
  The data in this package are organized by the year of original release in
  order to clarify dependencies, highlight occasional differences in formats
  from one year to another, and to increase readability in documentation.
  The contents.txt file within each year's root directory provides a list of
  the contents of all subdirectories as well as specific details about file
  formats and contents.

./docs/2016/TAC_KBP_Event_Argument_Query_Development_and_Manual_Run_Guidelines_V1.2.pdf
./docs/2016/TAC_KBP_Event_Argument_Assessment_Guidelines_V1.4.pdf
  The guidelines used by annotators in developing the queries and manual run
  for the cross-document components of the 2016 Event Argument pilot and
  2016 Event Argument evaluation, as well as the guidelines used by
  assessors during the cross-document assessment phases of the 2016 pilot
  and 2016 eval.

./docs/2017/EventArgumentAugmentationGuidelines.pdf
  The guidelines used by LDC annotators in creating the within-document gold
  standard ERE data produced for the 2017 Event Argument evaluation.

./dtd/2016/deft_rich_ere_augmentation.1.0.dtd
  DTD for all ERE xml files found in ./data/2016/

./dtd/2016/2016_event_argument_expanded_queries.dtd
  The DTD for:
    tac_kbp_2016_english_event_argument_linking_evaluation_ldc-queries_expanded.xml
    tac_kbp_2016_english_event_argument_linking_pilot_queries_expanded.xml

./dtd/2017/deft_rich_ere.1.2.dtd
  DTD for all ERE xml files found in ./data/2017/
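The ERE XML files in ./data/ can be checked against the DTDs listed above
with any DTD-aware XML validator. The following is a minimal sketch assuming
the third-party Python lxml library; the ERE data filename shown is a
placeholder, not an actual file in this package.

  # Minimal DTD-validation sketch (assumes the third-party lxml library).
  # The ERE filename below is a placeholder; substitute a real file from
  # ./data/2016/.
  from lxml import etree

  dtd = etree.DTD(open("dtd/2016/deft_rich_ere_augmentation.1.0.dtd", "rb"))
  tree = etree.parse("data/2016/EXAMPLE.rich_ere.xml")  # placeholder name

  if dtd.validate(tree):
      print("valid against deft_rich_ere_augmentation.1.0.dtd")
  else:
      print(dtd.error_log.filter_from_errors())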
3. Annotation Tasks

In developing data for the Event Argument evaluations, annotators extracted
event arguments (entities or attributes playing a role in an event) and
information about them from unstructured text.

For the 2016 and 2017 Event Argument evaluations, event arguments based on
ERE (Entities, Relations, and Events) annotation were used as the gold
standard against which system output was scored. In 2016, a cross-document
event grouping task was also conducted, which utilized manually developed
queries.

3.1 Gold Standard Development

Entities, Relations, and Events (ERE), an annotation task developed by LDC
for DARPA's Deep Exploration and Filtering of Text (DEFT) program, was first
conducted in 2013 with the goal of supporting multiple research directions
and technology evaluations. As with earlier related efforts like Automatic
Content Extraction (ACE), ERE exhaustively labels entities, relations, and
events along with their attributes according to specified taxonomies. As
part of an effort to increase coordination across KBP data sets in 2016 and
2017, ERE annotation was performed as an upstream task in LDC's overall KBP
data creation pipeline, providing inputs to downstream annotation tasks,
including Event Argument.

In 2016, following ERE data development, the annotations were augmented by
running a script developed by BBN over the data and then having ERE
annotators review the results for validity. The purpose of the augmentation
pass was to add inferred arguments that are not valid under ERE guidelines
and that are generally difficult for human annotators to find. In large part
this translated to arguments that could be inferred by locational
containment. For example, a Conflict.Attack event that had Baghdad annotated
as the Place of the event might have Iraq added as an additional, inferred
Place during the augmentation pass (a schematic sketch of this kind of
inference appears at the end of this subsection).

In 2017, as in 2016, LDC created a set of gold standard annotations based on
event annotation in Rich ERE. In order to facilitate a more exhaustive
augmentation pass in 2017, instead of relying on automatic augmentation, LDC
performed manual event argument augmentation to add arguments that were
considered valid for the event argument annotation scheme, but not for Rich
ERE. Additionally, annotators were asked to add any event arguments
considered valid in Rich ERE, but missed during Rich ERE annotation.

In 2016, augmentation increased the number of event arguments (compared to
ERE without augmentation) by only 6-7%, but in 2017 there was a 42% increase
in Chinese, a 53% increase in English, and a 61% increase in Spanish. This
means that many more valid event arguments were captured in the 2017
annotation under the manual augmentation process than with the automated
method used in 2016.
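The locational-containment inference mentioned above can be pictured with
the short sketch below. This is a schematic reconstruction, not the actual
BBN augmentation script; the containment table, data layout, and function
name are hypothetical.

  # Schematic sketch of locational-containment augmentation; not the actual
  # BBN script. The containment table and names are hypothetical.
  CONTAINMENT = {
      "Baghdad": ["Iraq"],   # Baghdad is contained in Iraq
  }

  def add_inferred_places(event):
      """Add an inferred Place argument for each region containing an
      annotated Place (e.g. Baghdad -> Iraq)."""
      inferred = []
      for role, filler in event["arguments"]:
          if role == "Place":
              for region in CONTAINMENT.get(filler, []):
                  if ("Place", region) not in event["arguments"]:
                      inferred.append(("Place", region))
      event["arguments"].extend(inferred)
      return event

  attack = {"type": "Conflict.Attack", "arguments": [("Place", "Baghdad")]}
  add_inferred_places(attack)
  # attack["arguments"] is now [("Place", "Baghdad"), ("Place", "Iraq")]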
3.2 Cross-Document Queries & Manual Run

To support the 2016 cross-document component of Event Argument, annotators
selected queries, each consisting of a single event argument pertaining to
an event hopper in the gold standard Event Argument annotations described
above. Given the anticipated difficulty of the task for systems, potential
queries included only events for which a named event argument had been
annotated, were sourced only from English documents (making the task
English-only), and excluded the 3 new event types added to EA in 2016
(Contact.Contact, Contact.Broadcast, and Transaction.Transaction).
Annotators were also instructed to limit potential queries to event
arguments that indicated relatively simple, low-granularity event hoppers.
Queries were also required to be productive, with the event indicated by the
query occurring in at least 5-10 documents in the English portion of the
source corpus. However, some less-productive queries were included as well
in order to ensure that rarer or more difficult event types were represented
in the query set.

LDC also produced an exhaustive manual run for the queries, which was
performed over the entirety of the English portion of the TAC KBP evaluation
source corpus. A response for the manual run consisted of justification
strings containing whatever portion or portions of a document were needed to
prove that the event indicated by the relevant query occurred in the given
document. A document could be returned more than once if each instance was
in response to a different query.

3.3 Cross-Document Assessment

For the cross-document assessment portion of 2016 Event Argument, annotators
reviewed all of the responses both to queries manually selected by LDC and
to a set of derived queries generated from system responses. After the
cross-document Event Argument evaluation was conducted, it was discovered
that, despite the efforts taken to produce relatively simple queries,
systems were largely unsuccessful in finding the entry points indicated by
the queries. As such, an additional set of "derived" queries was produced by
BBN from systems' responses in an effort to better measure precision given
low system recall. For these queries, LDC did not produce a cross-document
manual run.

During assessment, assessors reviewed each response individually and decided
whether or not the response's justification proved that a document contained
an instance of the event indicated by the relevant query. If the assessor
determined that the response did indeed reference the same event, that
response was marked CORRECT. If the response was determined to contain an
event of the same type as the query event, but not the query event itself,
the response was marked ET_MATCH (event type match). If the response was
judged to contain neither the query event nor some other event of the same
type, the response was marked WRONG.

(Note that the 2017 evaluation had no cross-document component. As such,
sections 3.2-3.3 pertain only to Event Argument data developed in 2016.)

4. Newswire Data

Newswire data use the following markup framework:

  <DOC id="{doc_id_string}" type="{doc_type_label}">
  <HEADLINE>
  ...
  </HEADLINE>
  <DATELINE>
  ...
  </DATELINE>
  <TEXT>
  <P>
  ...
  </P>
  ...
  </TEXT>
  </DOC>

where the HEADLINE and DATELINE tags are optional (not always present), and
the TEXT content may or may not include "<P> ... </P>" tags (depending on
whether or not the "doc_type_label" is "story").

All the newswire files are parseable as XML.

5. Multi-Post Discussion Forum Data

Multi-Post Discussion Forum files (MPDFs) are derived from English
discussion forum threads. They consist of a continuous run of posts from a
thread but are only approximately 800 words in length (excluding metadata
and text within <quote> elements). When taken from a short thread, an MPDF
may comprise the entire thread. However, when taken from a longer thread, an
MPDF is a truncated version of its source, though it will always start with
the preliminary post.

40 of the MPDF files contain a total of 265 characters in the range
U+0085 - U+0099; these officially fall into a category of invisible
"control" characters, but they all originated from single-byte "special
punctuation" marks (quotes, etc. from CP1252) that have been incorrectly
transcoded to UTF-8.

The MPDF files use the following markup framework, in which there may also
be arbitrarily deep nesting of quote elements, and other elements may be
present (e.g. "<a>...</a>" anchor tags):

  <doc id="{doc_id_string}">
  <headline>
  ...
  </headline>
  <post ...>
  ...
  <quote ...>
  ...
  </quote>
  ...
  </post>
  ...
  </doc>

All the MPDF files are parseable as XML.

6. Acknowledgments

This material is based on research sponsored by Air Force Research
Laboratory and Defense Advanced Research Projects Agency under agreement
number FA8750-13-2-0045. The U.S. Government is authorized to reproduce and
distribute reprints for Governmental purposes notwithstanding any copyright
notation thereon. The views and conclusions contained herein are those of
the authors and should not be interpreted as necessarily representing the
official policies or endorsements, either expressed or implied, of Air Force
Research Laboratory and Defense Advanced Research Projects Agency or the
U.S. Government.

The authors acknowledge the following contributors to this data set:
  Dave Graff (LDC)
  Marjorie Freedman (BBN)
  Ryan Gabbard (BBN)
  Hoa Dang (NIST)
  Boyan Onyshkevych (DARPA)

7. References

Joe Ellis, Jeremy Getman, Neil Kuster, Zhiyi Song, Ann Bies, & Stephanie M.
Strassel. 2016. Overview of Linguistic Resources for the TAC KBP 2016
Evaluations: Methodologies and Results. TAC KBP 2016 Workshop: National
Institute of Standards and Technology, Gaithersburg, MD, November 14-15.

Jeremy Getman, Joe Ellis, Zhiyi Song, Jennifer Tracey, & Stephanie M.
Strassel. 2017. Overview of Linguistic Resources for the TAC KBP 2017
Evaluations: Methodologies and Results. TAC KBP 2017 Workshop: National
Institute of Standards and Technology, Gaithersburg, MD, November 13-14.

8. Copyright Information

(c) 2020 Trustees of the University of Pennsylvania

9. Contact Information

For further information about this data release, or the TAC KBP project,
contact the following project staff at LDC:

  Jeremy Getman, Project Manager
  Stephanie Strassel, PI

--------------------------------------------------------------------------
README created by Joseph Carlough on March 28, 2018
  updated by Jeremy Getman on May 11, 2018
  updated by Jeremy Getman on October 17, 2018
  updated by Jeremy Getman on April 9, 2019