TAC KBP English Event Argument Training and Evaluation Data 2014-2015

Authors: Joe Ellis, Jeremy Getman, Stephanie Strassel

1. Overview

This package contains training and evaluation data produced in support of the 2014 TAC KBP English Event Argument Extraction Pilot and Evaluation tasks and the 2015 English Event Argument Extraction and Linking Training and Evaluation tasks.

The Text Analysis Conference (TAC) is a series of workshops organized by the National Institute of Standards and Technology (NIST). TAC was developed to encourage research in natural language processing (NLP) and related applications by providing a large test collection, common evaluation procedures, and a forum for researchers to share their results. Through its various evaluations, the Knowledge Base Population (KBP) track of TAC encourages the development of systems that can match entities mentioned in natural texts with those appearing in a knowledge base, extract novel information about entities from a document collection, and add it to a new or existing knowledge base.

The Event Argument Extraction and Linking task requires systems to extract event arguments (entities or attributes playing a role in an event) from unstructured text, indicate the role they play in an event, and link the arguments appearing in the same event to each other. Critically, as the extracted information must be suitable as input to a knowledge base, systems construct tuples indicating the event type, the role played by the entity in the event, and the most canonical mention of the entity itself from the source document. The event types and roles are drawn from an externally-specified ontology of 31 event types, which includes financial transactions, communication events, and attacks. For more information about Event Argument Extraction and Linking, refer to the track home page on the NIST TAC website, http://www.nist.gov/tac/.
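The authoritative response formats are defined in the task descriptions included under ./docs/task_descriptions/. Purely as an illustration of the tuple structure described above, the sketch below models one event argument response in Python; the field names, document ID, and event type and role labels are hypothetical examples chosen for clarity, not an excerpt from the data or the official submission format.

  # Illustrative sketch only; see ./docs/task_descriptions/* for the official
  # response format. All field names and values below are hypothetical.
  from dataclasses import dataclass

  @dataclass
  class EventArgumentTuple:
      doc_id: str               # source document containing the event mention
      event_type: str           # event type drawn from the task ontology
      role: str                 # role the argument plays in the event
      canonical_argument: str   # most canonical mention of the entity in the document
      realis: str               # modality label: 'Actual', 'Generic', or 'Other'

  example = EventArgumentTuple(
      doc_id="EXAMPLE_DOC_001",
      event_type="Conflict.Attack",      # illustrative ACE-style label
      role="Attacker",
      canonical_argument="the rebel group",
      realis="Actual",
  )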
The data included in this package were originally released by LDC to TAC KBP coordinators and participants under the following ecorpora catalog IDs and titles:

  LDC2014E20:  TAC 2014 KBP Event Argument Extraction Pilot Source Corpus V1.1
  LDC2014E40:  TAC 2014 KBP Event Argument Extraction Pilot Assessment Results V1.1
  LDC2014E74:  TAC 2014 KBP English Event Argument Extraction Evaluation Annotations V1.1
  LDC2014E88:  TAC 2014 KBP English Event Argument Extraction Evaluation Assessment Results V2.0
  LDC2014R43:  TAC 2014 KBP English Event Argument Extraction Evaluation Source Corpus V1.1
  LDC2015E22:  TAC KBP English Event Argument Extraction Comprehensive Pilot and Evaluation Data 2014
  LDC2015E41:  TAC KBP 2015 English Event Argument Linking Training Data
  LDC2015E79:  TAC KBP 2015 English Event Argument Linking Evaluation Source Corpus
  LDC2015E92:  TAC KBP 2015 English Event Argument Linking Evaluation Manual Run
  LDC2015E101: TAC KBP 2015 English Event Argument Linking Evaluation Assessment Results V2.0
  LDC2016E37:  TAC KBP English Event Argument Comprehensive Training and Evaluation Data 2014-2015

Summary of data included in this package (for more details see ./data/{2014,2015}/contents.txt):

EA Extraction and Linking Data Distribution:

  +------+------------+-----------+-----------+-------------+---------+
  |      |            | source    | manual    |             | event   |
  | year | set        | documents | responses | assessments | hoppers |
  +------+------------+-----------+-----------+-------------+---------+
  | 2014 | pilot      | 60        | 0         | 32,054      | n/a     |
  | 2014 | evaluation | 528       | 5,947     | 57,599      | n/a     |
  | 2015 | training   | 55*       | 0         | 0           | 599     |
  | 2015 | evaluation | 500       | 5,207     | 45,391      | 1,608   |
  +------+------------+-----------+-----------+-------------+---------+

  *NOTE: the 2015 training source documents are a subset of the 2014 evaluation source corpus

2. Contents

./README.txt
  This file.

./data/{2014,2015}/contents.txt
  The data in this package are organized by the year of original release in order to clarify dependencies, highlight occasional differences in formats from one year to another, and increase readability in documentation. The contents.txt file within each year's root directory provides a list of the contents of all subdirectories as well as details about file formats and contents.

./docs/all_files.md5
  Paths (relative to the root of the corpus) and md5 checksums for all files in the package.

./docs/guidelines/*
  The guidelines used by annotators for the 2014-2015 Event Argument manual runs, including the 2015 linking tasks, as well as for the assessment tasks contained in this corpus.

./docs/task_descriptions/*
  Task descriptions for the 2014 Event Argument Extraction and 2015 Event Argument Extraction and Linking evaluation tracks, written by track coordinators.

./tools/{2014,2015}/*
  Scorers for the 2014 and 2015 EA submissions, as provided to LDC by evaluation track coordinators, with no further testing.

3. Annotation Tasks

In developing data for the 2014 Event Argument Extraction (EAE) evaluation track, annotators extracted event arguments (entities or attributes playing a role in an event) and information about them from unstructured text. Event argument tuples indicate the event type, the role played by the entity in the event, and the most canonical mention of the entity itself from the source document.
In Event Argument Linking (EAL), an extension of EAE introduced in 2015, the event argument tuples are further linked with other tuples into event hoppers (a relaxed form of identity coreference for events), indicating that the tuples played a role in the same event or events.

In both 2014 and 2015, data development by LDC for the Event Argument task consisted of three separate processes: source document selection, manual run development, and assessment.

3.1 Source Document Selection

Documents serve as queries in EAE/EAL, so the first annotation task is to perform targeted searches over sets of unreleased documents in two genres: newswire and discussion forum threads. Documents are valid if they contain at least one "Actual" mention of one of the specified event types along with appropriate arguments for the event. "Actual" events, as defined in EAL, include those that happened in the past and those that are ongoing in the present. Data scouts search primarily for documents with a variety of event types, though documents providing mentions of generally less common event types are also selected. For each document reviewed, a tally of the number of unique event mentions for each event type is created in order to ensure that all of the targeted event types are reasonably well represented. While performing document reviews, annotators also search for and flag documents with undesirable qualities (e.g., discussion forum threads with more than a small amount of newswire quotation) in order to maximize informal content.

3.2 Manual Run Development

In the manual run for the 2014 EAE evaluation, an annotator had a maximum of thirty minutes per document to annotate one mention of each valid, unique event argument. In 2015, LDC performed the manual run over a 300-document subset of the 500-document source corpus used by systems in the evaluation. This subset was selected using the event tallies produced during document selection in order to maximize and balance event coverage: priority was given to keeping the event types mixed and to ensuring that each event type was still represented at least 10 times per genre across the 300-document sub-corpus used for the manual run. For each document in the 2015 EAL evaluation, annotators had a maximum of sixty minutes both to annotate one mention of each unique event argument and to cluster all arguments into event hoppers.

Following the initial rounds of annotation, quality control (QC) passes were conducted over the manual run data to flag any event arguments or linking decisions that did not have adequate justification in the source document, or that might be at variance with the current guidelines. These flagged annotations were then adjudicated by senior annotators.

3.3 Assessment

For the assessment stage, the 2014 EAE task consisted of entity coreference and response assessment; for 2015 EAL, this was extended to include a subsequent argument linking subtask. Each assessor received training which refined their assessments to match the level of gold standard annotations produced by senior annotators.

The first step in each EAL assessment kit was to perform entity coreference on all responses returned by systems and LDC for a given document. This included correct responses, inexact responses, and wrong responses. Following the completion of entity coreference, assessors moved on to response assessment, in which they made six judgments on each response.
First, the four parts of a response - event type, argument role (the role that a response played in its matched event), base filler (the mention of the argument included in the justification), and canonical argument string (the 'most complete' mention of the argument from the document) - were all marked as 'correct' if they were found to be supported in the source and in line with the definition of the relevant event and argument role. Responses were considered 'wrong' if they did not meet both of the conditions for correctness, and 'inexact' if insufficient justification was provided or extraneous text was selected for an otherwise correct response. Additionally, each response was given a 'realis' label, which indicated the modality of the assessed event argument: 'Actual' if the event clearly occurred in the past, 'Generic' if the event was generic in nature (e.g., "I go to the store on Sundays"), and 'Other' if the event could not neatly be described as one of the other two categories. Lastly, assessors also marked the canonical argument strings as either 'name' or 'nominal' to indicate the type of mention.

After response assessment was completed, QC was performed on the data. Senior annotators reviewed the work of assessors and made corrections to assessment kits and, for each correction made, the reviewer followed up with the original assessor to clarify the change. For certain classes of potential errors, BBN produced automated reports for senior annotators to review while performing QC. The following classes of potential errors were reported: possible inconsistencies in the handling of inexact namestrings in the coreference and assessment stages; pronouns or nominal phrases assessed as 'correct' when a name for the same entity was potentially available; and cases of entity mention overlap (e.g., "Central Park" and "a fountain in Central Park") where the difference in granularity may indicate a problem with coreference.

In 2015, following the completion of QC for a given document, the senior annotator who had performed the QC also performed the document's argument linking step, which consisted of deciding how correct and inexact responses should be grouped together into event hoppers. If two or more event arguments were judged by the annotator to be arguments of the same event, the arguments were placed into the same event hopper. An event hopper is a less strict concept than true event coreference, in that considerations such as realis are not necessarily a factor. For instance, the arguments associated with a future event mention and those associated with a past event mention might be placed in the same event hopper if the mentions occurred in forum posts written at different times (one prior to the event in question and one following it) and the annotator judged the referenced events to be the same, despite the future event arguments having 'Other' realis and the past event arguments having 'Actual' realis.

4. Newswire Data

Newswire data use the following markup framework:

  <DOC id="{doc_id_string}" type="{doc_type_label}">
  <HEADLINE>
  ...
  </HEADLINE>
  <DATELINE>
  ...
  </DATELINE>
  <TEXT>
  <P>
  ...
  </P>
  </TEXT>
  </DOC>

where the HEADLINE and DATELINE tags are optional (not always present), and the TEXT content may or may not include "<P> ... </P>" tags (depending on whether or not the "doc_type_label" is "story").

All the newswire files are parseable as XML.
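Because the files are parseable as XML, a standard XML parser is sufficient to read them. The following is a minimal sketch, assuming only the markup framework shown above (a single DOC root with optional HEADLINE and DATELINE elements and a TEXT element); it is not a tool distributed with this package, and the path in the usage comment is hypothetical.

  # Minimal sketch for reading a newswire document under the markup framework
  # shown above; not an official tool from this package.
  import xml.etree.ElementTree as ET

  def read_newswire(path):
      root = ET.parse(path).getroot()        # <DOC id="..." type="...">
      doc_id = root.get("id")
      doc_type = root.get("type")
      headline_el = root.find("HEADLINE")    # optional; may be absent
      headline = "".join(headline_el.itertext()).strip() if headline_el is not None else None
      text_el = root.find("TEXT")
      # "story" documents wrap paragraphs in <P> tags; other doc types may not.
      paragraphs = ["".join(p.itertext()).strip() for p in text_el.findall("P")]
      body = "\n\n".join(paragraphs) if paragraphs else "".join(text_el.itertext()).strip()
      return doc_id, doc_type, headline, body

  # Usage (hypothetical path):
  # doc_id, doc_type, headline, body = read_newswire("data/2014/eval/source/example.xml")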
5. Discussion Forum Data

Discussion forum files use the following markup framework, in which there may also be arbitrarily deep nesting of quote elements, and other elements may be present (e.g. "<a ...> ... </a>" anchor tags):

  <doc id="{doc_id_string}">
  <headline>
  ...
  </headline>
  <post ...>
  ...
  <quote ...>
  ...
  </quote>
  ...
  </post>
  ...
  </doc>

Additionally, each <doc> unit contains at least five post elements.

All the discussion forum files are parseable as XML.

6. Multi-Post Discussion Forum Data

Multi-Post Discussion Forum files (MPDFs) are derived from English discussion forum threads. They consist of a continuous run of posts from a thread but are only approximately 800 words in length (excluding metadata and text within <quote> elements). When taken from a short thread, an MPDF may comprise the entire thread. However, when taken from a longer thread, an MPDF is a truncated version of its source, though it always begins with the thread's initial post.

40 of the MPDF files contain a total of 265 characters in the range U+0085 - U+0099; these officially fall into a category of invisible "control" characters, but they all originated from single-byte "special punctuation" marks (quotes, etc. from CP1252) that were incorrectly transcoded to utf8.

The MPDF files use the same markup framework as the discussion forum data above, in which there may also be arbitrarily deep nesting of quote elements, and other elements may be present (e.g. "<a ...> ... </a>" anchor tags).

All the MPDF files are parseable as XML.
7. Acknowledgments

This material is based on research sponsored by Air Force Research Laboratory and Defense Advanced Research Projects Agency under agreement number FA8750-13-2-0045. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of Air Force Research Laboratory and Defense Advanced Research Projects Agency or the U.S. Government.

The authors acknowledge the following contributors to this data set:

  Dave Graff (LDC)
  Marjorie Freedman (BBN)
  Ryan Gabbard (BBN)
  Hoa Dang (NIST)
  Boyan Onyshkevych (DARPA)

8. References

Joe Ellis, Jeremy Getman, Dana Fore, Neil Kuster, Zhiyi Song, Ann Bies, Stephanie Strassel. 2015. Overview of Linguistic Resources for the TAC KBP 2015 Evaluations: Methodologies and Results. TAC KBP 2015 Workshop: National Institute of Standards and Technology, Gaithersburg, Maryland, November 16-17.
https://www.ldc.upenn.edu/sites/www.ldc.upenn.edu/files/tackbp2015_overview.pdf

Joe Ellis, Jeremy Getman, Stephanie M. Strassel. 2014. Overview of Linguistic Resources for the TAC KBP 2014 Evaluations: Planning, Execution, and Results. TAC KBP 2014 Workshop: National Institute of Standards and Technology, Gaithersburg, Maryland, November 17-18.
https://www.ldc.upenn.edu/sites/www.ldc.upenn.edu/files/tackbp-2014-overview.pdf

9. Copyright Information

(c) 2020 Trustees of the University of Pennsylvania

10. Contact Information

For further information about this data release or the TAC KBP project, contact the following project staff at LDC:

  Jeremy Getman, Lead Annotator
  Stephanie Strassel, PI

--------------------------------------------------------------------------
README created by Neil Kuster on February 17, 2016
  updated by Neil Kuster on March 18, 2016
  updated by Jeremy Getman on March 18, 2016
  updated by Neil Kuster on April 21, 2016
  updated by Jeremy Getman on April 22, 2016
  updated by Joe Ellis on September 8, 2016
  updated by Jeremy Getman on September 14, 2016
  updated by Joe Ellis on September 16, 2016
  updated by Jeremy Getman on September 19, 2016