TAC KBP English Event Argument
Training and Evaluation Data 2014-2015
Authors: Joe Ellis, Jeremy Getman, Stephanie Strassel
1. Overview
This package contains training and evaluation data produced in support of
the 2014 TAC KBP English Event Argument Extraction Pilot and Evaluation
tasks and the 2015 English Event Argument Extraction and Linking Training
and Evaluation tasks.
The Text Analysis Conference (TAC) is a series of workshops organized by the
National Institute of Standards and Technology (NIST). TAC was developed to
encourage research in natural language processing (NLP) and related
applications by providing a large test collection, common evaluation
procedures, and a forum for researchers to share their results. Through its
various evaluations, the Knowledge Base Population (KBP) track of TAC
encourages the development of systems that can match entities mentioned in
natural texts with those appearing in a knowledge base and extract novel
information about entities from a document collection and add it to a new
or existing knowledge base.
The Event Argument Extraction and Linking task requires systems to extract
event arguments (entities or attributes playing a role in an event) from
unstructured text, indicate the role they play in an event, and link the
arguments appearing in the same event to each other. Critically, as the
extracted information must be suitable as input to a knowledge base,
systems construct tuples indicating the event type, the role played by the
entity in the event, and the most canonical mention of the entity itself
from the source document. The event types and roles are drawn from an
externally-specified ontology of 31 event types, which includes financial
transactions, communication events, and attacks. For more information
about Event Argument Extraction and Linking, refer to the track home page
on the NIST TAC website, http://www.nist.gov/tac/.
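For illustration only, a single extracted argument can be pictured as a
small record like the following (a hypothetical Python sketch; the field
names and values are invented and do not reflect the official submission
format):

  # Hypothetical example of one event argument tuple (illustration only;
  # not the official submission format).
  example_argument = {
      "event_type": "Conflict.Attack",     # one of the 31 ontology event types
      "argument_role": "Attacker",         # role the entity plays in the event
      "canonical_argument": "John Smith",  # most canonical mention in the document
  }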
The data included in this package were originally released by
LDC to TAC KBP coordinators and participants under the following ecorpora
catalog IDs and titles:
LDC2014E20: TAC 2014 KBP Event Argument Extraction Pilot Source Corpus
V1.1
LDC2014E40: TAC 2014 KBP Event Argument Extraction Pilot Assessment
Results V1.1
LDC2014E74: TAC 2014 KBP English Event Argument Extraction Evaluation
Annotations V1.1
LDC2014E88: TAC 2014 KBP English Event Argument Extraction Evaluation
Assessment Results V2.0
LDC2014R43: TAC 2014 KBP English Event Argument Extraction Evaluation
Source Corpus V1.1
LDC2015E22: TAC KBP English Event Argument Extraction Comprehensive
Pilot and Evaluation Data 2014
LDC2015E41: TAC KBP 2015 English Event Argument Linking Training Data
LDC2015E79: TAC KBP 2015 English Event Argument Linking Evaluation
Source Corpus
LDC2015E92: TAC KBP 2015 English Event Argument Linking Evaluation
Manual Run
LDC2015E101: TAC KBP 2015 English Event Argument Linking Evaluation
Assessment Results V2.0
LDC2016E37: TAC KBP English Event Argument Comprehensive Training and
Evaluation Data 2014-2015
Summary of data included in this package (for more details see
./data/{2014,2015}/contents.txt):
EA Extraction and Linking Data Distribution:
+------+------------+-----------+-----------+-------------+---------+
| | | source | manual | | event |
| year | set | documents | responses | assessments | hoppers |
+------+------------+-----------+-----------+-------------+---------+
| 2014 | pilot | 60 | 0 | 32,054 | n/a |
| 2014 | evaluation | 528 | 5,947 | 57,599 | n/a |
| 2015 | training | 55* | 0 | 0 | 599 |
| 2015 | evaluation | 500 | 5,207 | 45,391 | 1,608 |
+------+------------+-----------+-----------+-------------+---------+
*NOTE: the 2015 training source documents are a subset
of the 2014 evaluation source corpus
2. Contents
./README.txt
This file.
./data/{2014,2015}/contents.txt
The data in this package are organized by the year of original release
in order to clarify dependencies, to highlight occasional differences in
formats from one year to another, and to improve the readability of the
documentation. The contents.txt file within each year's root directory
provides a list of the contents of all subdirectories as well as
details about file formats and contents.
./docs/all_files.md5
Paths (relative to the root of the corpus) and md5 checksums for all files
in the package (a checksum verification sketch follows this contents list).
./docs/guidelines/*
The guidelines used by annotators for the 2014-2015 Event Argument
manual runs, including the 2015 linking tasks, as well as the
assessment tasks contained in this corpus.
./docs/task_descriptions/*
Task Descriptions for the 2014 Event Argument Extraction and 2015 Event
Argument Extraction and Linking evaluation tracks, written by track
coordinators.
./tools/{2014,2015}/*
Scorers for 2014 and 2015 EA submissions, as provided to LDC by
evaluation track coordinators, with no further testing.
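The following is a minimal sketch of how package integrity might be
checked against ./docs/all_files.md5, as referenced above. It assumes
each non-empty line of that file pairs an md5 hex digest with a relative
path; verify the actual layout of all_files.md5 before relying on it.

  # Sketch: compare files on disk against the checksums in docs/all_files.md5.
  # Assumes "digest  relative/path" lines; adjust parsing if the layout differs.
  import hashlib
  import os

  def verify(listing="docs/all_files.md5", root="."):
      with open(listing, encoding="utf-8") as lines:
          for line in lines:
              parts = line.split()
              if len(parts) < 2:
                  continue
              digest, rel_path = parts[0], parts[-1]
              with open(os.path.join(root, rel_path), "rb") as f:
                  actual = hashlib.md5(f.read()).hexdigest()
              if actual != digest:
                  print("MISMATCH:", rel_path)

  if __name__ == "__main__":
      verify()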
3. Annotation Tasks
In developing data for the 2014 Event Argument Extraction evaluation
track (EAE), annotators extracted event arguments (entities or attributes
playing a role in an event) and information about them from unstructured
text. Event argument tuples indicate the event type, the role played by the
entity in the event, and the most canonical mention of the entity itself
from the source document. In Event Argument Linking (EAL), an extension of
EAE introduced in 2015, the event argument tuples are further linked with other
tuples into Event Hoppers (a relaxed form of identity coreference for events),
indicating that the tuples played a role in the same event or events. In both
2014 and 2015, data development by LDC for the Event Argument task consisted
of three separate processes - source document selection, manual run development,
and assessment.
3.1 Source Document Selection
Documents serve as queries in EAE/EAL, so the first annotation
task is to perform targeted searches over sets of unreleased documents
in two genres: newswire and discussion forum threads. Documents are
valid if they contain at least one "Actual" mention of one of the specified
event types along with appropriate arguments for the event. "Actual"
events, as defined in EAL, include those that happened in the past or
those that are ongoing in the present. Data scouts search primarily for
documents with a variety of event types, though documents providing
mentions of generally less common event types are also selected. For
each document reviewed, a tally of the number of unique event mentions
for each event type is created in order to ensure that all of the
targeted event types are reasonably well-represented. While performing
document reviews, annotators also search for and flag documents with
undesirable qualities (e.g., discussion forum threads with more than a
small amount of newswire quotation) in order to maximize informal content.
3.2 Manual Run Development
In the manual run for the 2014 EAE evaluation, an annotator had a
maximum of thirty minutes per document to annotate one mention of
each valid, unique event argument. In 2015, LDC performed the manual
run over a 300-document subset of the 500-document source corpus used
by systems in the evaluation. This 300-document subset was selected
using the event tallies produced during document selection to maximize
and balance event coverage. Priority was given to keeping the event
types mixed and ensuring that each event type was still represented
at least 10 times per genre across the 300-document sub-corpus used
for the manual run. For each document in the 2015 EAL evaluation,
annotators had a maximum of sixty minutes to both annotate one mention
of each unique event argument and to cluster all arguments into event
hoppers.
Following the initial rounds of annotation, quality control (QC) passes
were conducted over the manual run data to flag any event arguments or
linking decisions that did not have adequate justification in the source
document, or that might have been at variance with the current guidelines.
These flagged annotations were then adjudicated by senior annotators.
3.3 Assessment
For the assessment stage, the 2014 EAE task consisted of entity
coreference and response assessment, which for 2015 EAL was extended
to include a subsequent argument linking subtask. Each assessor
received training that refined their assessments to match the quality
of the gold standard annotations produced by senior annotators. The first
step in each EAL assessment kit was to perform entity coreference on
all responses returned by systems and LDC for a given document. This
included correct responses, inexact responses and wrong responses.
Following the completion of entity coreference, assessors moved on
to response assessment.
In response assessment, assessors made six judgments on each response
generated. First, the four parts of a response - event type, argument
role (the role that a response played in its matched event), base filler
(the mention of the argument included in the justification) and
canonical argument string (the 'most complete' mention of the argument
from the document) - were all marked as 'correct' if they were found to
be supported in the sources and in line with the definition of the
relevant event and argument role. Responses were considered 'wrong' if
they did not meet both of the conditions for correctness and 'inexact'
if insufficient justification was provided or extraneous text was
selected for an otherwise correct response. Additionally, each response
was given a 'realis' label, which indicated the modality of the assessed
event argument ('Actual' if the event clearly occurred in the past,
'Generic' if the event was generic in nature - e.g. "I go to the store on
Sundays", and 'Other' if the event could not neatly be described as one
of the other two categories). Lastly, assessors also marked the canonical
argument strings as either 'name' or 'nominal' to indicate the type of
mention.
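Schematically, the six judgments described above can be pictured as a
record like the following (a hypothetical Python sketch; the field names
are invented and do not correspond to the actual assessment kit format):

  # Hypothetical sketch of the six judgments recorded for one response
  # (field names invented for illustration; not the official kit format).
  from dataclasses import dataclass

  CORRECTNESS = ("correct", "inexact", "wrong")  # values for the *_judgment fields

  @dataclass
  class ResponseAssessment:
      event_type_judgment: str          # correctness of the event type
      argument_role_judgment: str       # correctness of the argument role
      base_filler_judgment: str         # correctness of the base filler
      canonical_argument_judgment: str  # correctness of the canonical string
      realis: str                       # 'Actual', 'Generic', or 'Other'
      mention_type: str                 # 'name' or 'nominal'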
After response assessment was completed, QC was performed on the data.
Senior annotators reviewed the work of assessors and made corrections to
assessment kits and, for each correction that was made, the reviewer
followed up with the original assessor to clarify the correction. For
certain classes of potential errors, BBN produced automated reports for
senior annotators to review while performing QC. The following classes
of potential errors were reported: possible inconsistencies in the
handling of inexact namestrings in the coreference and assessment stages;
pronouns or nominal phrases being assessed as 'correct' when a name for the
same entity was potentially available; and cases of entity mention overlap
(e.g. "Central Park" and "a fountain in Central Park") where the difficulty
in granularity may indicate a problem with coreference.
In 2015, following the completion of QC for a given document, the senior
annotator who had performed the QC for that document then performed the
document's argument linking step as well, which consisted of deciding
how correct and inexact responses should be grouped together in event
hoppers. If two or more event arguments were judged by the annotator to
be arguments of the same event, then the arguments were placed into the
same event hopper, a less strict concept than true event coreference,
wherein considerations such as realis are not necessarily a factor. For
instance, the arguments associated with a future event mention and those
associated with a past event mention might be placed in the same event
hopper, if the event mentions occurred in forum posts written at different
times (one prior to the event in question and one following that same event)
and the annotator reviewing the arguments judged the events referenced to be
the same, despite the future event arguments having 'Other' realis and the
past event arguments having 'Actual' realis.
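As a schematic illustration of this linking decision (hypothetical; the
identifiers and structure are invented for this sketch and are not the
official linking format), an event hopper can be pictured as a cluster of
assessed responses that may mix realis values:

  # Hypothetical sketch: one event hopper grouping responses judged to refer
  # to the same event despite differing realis labels (IDs invented).
  event_hoppers = {
      "hopper_001": [
          {"response_id": "R12", "realis": "Other"},   # post written before the event
          {"response_id": "R47", "realis": "Actual"},  # post written after the event
      ],
  }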
4. Newswire Data
Newswire data use the following markup framework:
  <DOC id="{doc_id_string}" type="{doc_type_label}">
  <HEADLINE>
  ...
  </HEADLINE>
  <DATELINE>
  ...
  </DATELINE>
  <TEXT>
  <P>
  ...
  </P>
  ...
  </TEXT>
  </DOC>

where the HEADLINE and DATELINE tags are optional (not always present),
and the TEXT content may or may not include "<P> ... </P>" tags
(depending on whether or not the "doc_type_label" is "story").

All the newswire files are parseable as XML.

5. Discussion Forum Data

Discussion Forum files use the following markup framework, in which
there may also be arbitrarily deep nesting of quote elements, and other
elements may be present (e.g. "..." elements):

  <doc id="{doc_id_string}">
  <headline>
  ...
  </headline>
  <post ...>
  ...
  <quote ...>
  ...
  </quote>
  ...
  </post>
  ...
  </doc>
6. Multi-Post Discussion Forum (MPDF) Data

When taken from a short thread, a MPDF may comprise the entire thread.
However, when taken from longer threads, a MPDF is a truncated version
of its source, though it will always start with the preliminary post.

40 of the MPDF files contain a total of 265 characters in the range
U+0085 - U+0099; these officially fall into a category of invisible
"control" characters, but they all originated from single-byte "special
punctuation" marks (quotes, etc. from CP1252) that have been incorrectly
transcoded to utf8.

The MPDF files use the following markup framework, in which there may
also be arbitrarily deep nesting of quote elements, and other elements
may be present (e.g. "..." anchor tags):

  <doc id="{doc_id_string}">
  <headline>
  ...
  </headline>
  <post ...>
  ...
  <quote ...>
  ...
  </quote>
  ...
  </post>
  ...
  </doc>

All the MPDF files are parseable as XML.

7. Acknowledgments

This material is based on research sponsored by Air Force Research
Laboratory and Defense Advanced Research Projects Agency under agreement
number FA8750-13-2-0045. The U.S. Government is authorized to reproduce
and distribute reprints for Governmental purposes notwithstanding any
copyright notation thereon. The views and conclusions contained herein
are those of the authors and should not be interpreted as necessarily
representing the official policies or endorsements, either expressed or
implied, of Air Force Research Laboratory and Defense Advanced Research
Projects Agency or the U.S. Government.

The authors acknowledge the following contributors to this data set:
  Dave Graff (LDC)
  Marjorie Freedman (BBN)
  Ryan Gabbard (BBN)
  Hoa Dang (NIST)
  Boyan Onyshkevych (DARPA)

8. References

Joe Ellis, Jeremy Getman, Dana Fore, Neil Kuster, Zhiyi Song, Ann Bies,
Stephanie Strassel. 2015. Overview of Linguistic Resources for the TAC
KBP 2015 Evaluations: Methodologies and Results. TAC KBP 2015 Workshop:
National Institute of Standards and Technology, Gaithersburg, Maryland,
November 16-17.
https://www.ldc.upenn.edu/sites/www.ldc.upenn.edu/files/tackbp2015_overview.pdf

Joe Ellis, Jeremy Getman, Stephanie M. Strassel. 2014. Overview of
Linguistic Resources for the TAC KBP 2014 Evaluations: Planning,
Execution, and Results. TAC KBP 2014 Workshop: National Institute of
Standards and Technology, Gaithersburg, Maryland, November 17-18.
https://www.ldc.upenn.edu/sites/www.ldc.upenn.edu/files/tackbp-2014-overview.pdf

9. Copyright Information

(c) 2020 Trustees of the University of Pennsylvania

10. Contact Information

For further information about this data release, or the TAC KBP project,
contact the following project staff at LDC:

  Jeremy Getman, Lead Annotator
    ...
  Stephanie Strassel, PI

--------------------------------------------------------------------------
README created by Neil Kuster on February 17, 2016
  updated by Neil Kuster on March 18, 2016
  updated by Jeremy Getman on March 18, 2016
  updated by Neil Kuster on April 21, 2016
  updated by Jeremy Getman on April 22, 2016
  updated by Joe Ellis on September 8, 2016
  updated by Jeremy Getman on September 14, 2016
  updated by Joe Ellis on September 16, 2016
  updated by Jeremy Getman on September 19, 2016