TAC KBP Event Argument
Comprehensive Training and Evaluation Data 2016-2017
Authors: Joe Ellis, Jeremy Getman, Zhiyi Song, Stephanie Strassel
1. Overview
This package contains training and evaluation data produced in support of
the 2016 TAC KBP Event Argument Linking Pilot and Evaluation tasks
and the 2017 TAC KBP Event Argument Linking Evaluation task.
Text Analysis Conference (TAC) is a series of workshops organized by the
National Institute of Standards and Technology (NIST). TAC was developed to
encourage research in natural language processing (NLP) and related
applications by providing a large test collection, common evaluation
procedures, and a forum for researchers to share their results. Through its
various evaluations, the Knowledge Base Population (KBP) track of TAC
encourages the development of systems that can match entities mentioned in
natural texts with those appearing in a knowledge base and extract novel
information about entities from a document collection and add it to a new
or existing knowledge base.
The Event Argument Extraction and Linking task requires systems to
extract mentions of entities from unstructured text, indicate the
role they play in an event, and link the arguments appearing in the
same event to each other. Critically, as the extracted information must
be suitable as input to a knowledge base, systems construct tuples
indicating the event type, the role played by the entity in the event,
and the most canonical mention of the entity itself from the source
document. The event types and roles are drawn from an externally-specified
ontology of 31 event types, which includes financial transactions,
communication events, and attacks. For more information about Event
Argument Extraction and Linking, refer to the track home page on the NIST
TAC website, http://www.nist.gov/tac/
Source documents referenced by the files in this package are available
separately in LDC2019T12 TAC KBP Evaluation Source Corpora 2016-2017.
The data included in this package were originally released by
LDC to TAC KBP coordinators and participants under the following
ecorpora catalog IDs and titles:
LDC2016E107: TAC KBP 2016 English Event Argument Linking Evaluation
Assessment Results V2.0
LDC2016E49: TAC KBP 2016 English Event Argument Linking Pilot Source
Corpus
LDC2016E51: TAC KBP 2016 English Event Argument Linking Pilot Queries
and Manual Run V1.1
LDC2016E59: TAC KBP 2016 English Event Argument Linking Pilot Assessment
Results V1.1
LDC2016E60: TAC KBP 2016 English Event Argument Linking Pilot Gold
Standard
LDC2016E73: TAC KBP 2016 Eval Core Set Rich ERE Annotation with Augmented
Event Argument v2
LDC2016E74: TAC KBP 2016 English Event Argument Linking Evaluation Queries
and Manual Run
LDC2017E55: TAC KBP 2017 Eval Core Set Rich ERE Annotation with Augmented
Event Arguments
Summary of data included in this package (for more details see
./data/{2016,2017}/contents.txt):
EA Extraction and Linking Data Distribution:
+------+------------+-----------+-----------+-------------+----------+---------+---------+
| | | | cross-doc | | | | |
| | | source | manual | | | | event |
| year | set | documents | responses | assessments | entities | fillers | hoppers |
+------+------------+-----------+-----------+-------------+----------+---------+---------+
| 2016 | pilot      |     2,092 |        98 |       2,689 |    2,923 |   1,308 |   1,500 |
| 2016 | evaluation |        0* |       628 |       7,697 |   17,681 |   4,544 |   6,799 |
| 2017 | evaluation |        0* |         0 |           0 |   17,896 |   5,995 |   8,022 |
+------+------------+-----------+-----------+-------------+----------+---------+---------+
*source corpora for the 2016 and 2017 evals are available separately; see above
2. Contents
./docs/README.txt
This file.
./data/{2016,2017}/contents.txt
The data in this package are organized by the year of original release
in order to clarify dependencies, highlight occasional differences in
formats from one year to another, and improve the readability of the
documentation. The contents.txt file within each year's root directory
provides a list of the contents for all subdirectories as well as
specific details about file formats and contents.
./docs/2016/TAC_KBP_Event_Argument_Query_Development_and_Manual_Run_Guidelines_V1.2.pdf
./docs/2016/TAC_KBP_Event_Argument_Assessment_Guidelines_V1.4.pdf
The guidelines used by annotators in developing the queries and manual
run for the cross-document components of the 2016 Event Argument pilot
and 2016 Event Argument evaluation, as well as the guidelines used by
assessors during the cross-document assessment phases of the 2016 pilot
and 2016 eval.
./docs/2017/EventArgumentAugmentationGuidelines.pdf
The guidelines used by LDC annotators in creating the within-document
gold standard ERE data produced for the 2017 Event Argument evaluation.
./dtd/2016/deft_rich_ere_augmentation.1.0.dtd
DTD for all ERE xml files found in ./data/2016/
./dtd/2016/2016_event_argument_expanded_queries.dtd
The DTD for:
tac_kbp_2016_english_event_argument_linking_evaluation_ldc-queries_expanded.xml
tac_kbp_2016_english_event_argument_linking_pilot_queries_expanded.xml
./dtd/2017/deft_rich_ere.1.2.dtd
DTD for all ERE xml files found in ./data/2017/
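The DTDs can be used to confirm that the XML files in ./data/ parse
and conform as expected. Below is a minimal validation sketch assuming
the lxml library (not part of this package) and an illustrative
placeholder file name:

  # Validate one ERE file against its DTD; lxml is an assumption here,
  # and the data file path is a placeholder, not a real file name.
  from lxml import etree

  dtd = etree.DTD("dtd/2017/deft_rich_ere.1.2.dtd")
  doc = etree.parse("data/2017/example.rich_ere.xml")  # placeholder

  if dtd.validate(doc):
      print("valid")
  else:
      print(dtd.error_log.filter_from_errors())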
3. Annotation Tasks
In developing data for the Event Argument evaluations, annotators
extracted event arguments (entities or attributes playing a role in
an event) and information about them from unstructured text. For the
2016 and 2017 Event Argument evaluations, event arguments based on ERE
(Entities, Relations, and Events) annotation were used as the gold
standard against which system output was scored. In 2016, a cross-
document event grouping task was also conducted, which utilized
manually developed queries.
3.1 Gold Standard Development
Entities, Relations, and Events (ERE), an annotation task developed
by LDC for DARPA’s Deep Exploration and Filtering of Text program
(DEFT), was first conducted in 2013 with the goal of supporting
multiple research directions and technology evaluations. As with
earlier related efforts like Automated Content Extraction (ACE), ERE
exhaustively labels entities, relations and events along with their
attributes according to specified taxonomies. As part of an effort to
increase coordination across KBP data sets in 2016 and 2017, ERE
annotation was performed as an upstream task in LDC's overall KBP data
creation pipeline, providing inputs to downstream annotation tasks,
including Event Argument.
In 2016, following ERE data development, the annotations were augmented
by running a script developed by BBN over the data and then having ERE
annotators review the results for validity. The purpose of the
augmentation pass was to add inferred arguments that are not valid
under ERE guidelines and that are generally difficult for human
annotators to find. In large part, these were arguments that could be
inferred by locational containment. For example, a Conflict.Attack
event that had Baghdad annotated as the Place of the event might
have Iraq added as an additional, inferred Place during the
augmentation pass.
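The sketch below illustrates that kind of containment-based inference;
the gazetteer, data shapes, and function are hypothetical
illustrations, not the actual BBN script or LDC tooling:

  # Hypothetical sketch of locational-containment augmentation.
  # CONTAINMENT maps a location to the larger regions containing it.
  CONTAINMENT = {"Baghdad": ["Iraq"]}

  def augment_places(event):
      # Add containing locations as additional, inferred Place args.
      for place in list(event["places"]):
          for container in CONTAINMENT.get(place, []):
              if container not in event["places"]:
                  event["places"].append(container)
      return event

  event = {"type": "Conflict.Attack", "places": ["Baghdad"]}
  augment_places(event)  # event["places"] is now ["Baghdad", "Iraq"]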
In 2017, as in 2016, LDC created a set of gold standard annotations
based on event annotation in rich ERE. In order to facilitate a more
exhaustive augmentation pass in 2017, instead of relying on automatic
augmentation, LDC performed manual event argument augmentation to add
arguments that were considered valid for the event argument annotation
scheme, but not for Rich ERE. Additionally, annotators were asked to
add any event arguments considered valid in Rich ERE, but missed during
Rich ERE annotation. In 2016, augmentation increased the number of
event arguments (compared to ERE without augmentation) by only 6-7%.
In 2017, by contrast, there was a 42% increase in Chinese, a 53%
increase in English, and a 61% increase in Spanish, meaning that the
manual augmentation process used in 2017 captured many more valid
event arguments than the automated method used in 2016.
3.2 Cross-Document Queries & Manual Run
To support the 2016 cross-document component of Event Argument,
annotators selected queries, each comprising a single event argument
pertaining to an event hopper in the gold standard Event Argument
annotations described above. Given the anticipated difficulty of the
task for systems, potential queries included only events for which a
named event argument had been annotated, were sourced only from
English documents (thus the task was English-only), and excluded the
3 new event types added to EA in 2016 (Contact.Contact,
Contact.Broadcast, and Transaction.Transaction). Annotators were also
instructed to limit potential queries to event arguments that
indicated relatively simple, low-granularity event hoppers.
Queries were also required to be productive, with the event indicated
by the query occurring in at least 5-10 documents in the English
portion of the source corpus. However, some less-productive queries
were also included to ensure that rarer or more difficult event types
were represented in the query set.
LDC also produced an exhaustive manual run for the queries, performed
over the entirety of the English portion of the TAC KBP evaluation
source corpus. A response for the manual run consisted of
justification strings containing whatever portion or portions of a
document were needed to prove that the event indicated by the relevant
query occurred in the given document. A document could be returned
more than once, if each instance was in response to a different query.
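A hedged sketch of how such a response might be represented in code
follows; the class, field names, and IDs are illustrative placeholders,
not the official submission format:

  from dataclasses import dataclass

  @dataclass
  class Response:
      query_id: str         # the query this response answers
      doc_id: str           # the document returned for that query
      justifications: list  # (start, end) offsets of the proof strings

  # The same document may be returned for two different queries:
  r1 = Response("Q001", "EXAMPLE_DOC_001", [(120, 245)])
  r2 = Response("Q002", "EXAMPLE_DOC_001", [(300, 410), (512, 540)])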
3.3 Cross-Document Assessment
For the cross-doc assessment portion of 2016 Event Argument, annotators
reviewed all of the responses both to queries manually selected by LDC
and to a set of derived queries generated from system responses.
After the cross document Event Argument evaluation was conducted, it was
discovered that, despite the efforts taken to produce relatively simple
queries, systems were largely unsuccessful in finding the entry points
indicated by the queries. As such, an additional set of "derived"
queries was produced by BBN from systems' responses in an effort to
better measure precision given low system recall. For these queries, LDC
did not produce a cross-doc manual run.
During assessment, assessors reviewed each response individually, and
decided whether or not the response's justification proved that a
document contained an instance of the event indicated by the relevant
query. If the assessor determined that the response did indeed reference
the same event, that response was marked CORRECT. If the response was
determined to contain an event of the same type as the query event, but
not the query event itself, the response was marked ET_MATCH (event type
match). If the response was judged to contain neither the query event
nor some other event of the same type, the response was marked WRONG.
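The three-way judgment scheme can be summarized as follows; the enum
below is an illustrative encoding, not an official scoring artifact:

  from enum import Enum

  class Judgment(Enum):
      CORRECT = "justification proves the query event itself"
      ET_MATCH = "same event type as the query event, but another event"
      WRONG = "neither the query event nor one of the same type"

  # For example, a strict scorer might count only CORRECT responses:
  def counts_toward_strict_score(judgment):
      return judgment is Judgment.CORRECT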
(Note that the 2017 evaluation had no cross-document component. As such,
sections 3.2-3.3 pertain only to Event Argument data developed in 2016.)
4. Newswire Data
Newswire data use the following markup framework:
  <DOC id="{doc_id_string}" type="{doc_type_label}">
  <HEADLINE>
  ...
  </HEADLINE>
  <DATELINE>
  ...
  </DATELINE>
  <TEXT>
  <P>
  ...
  </P>
  ...
  </TEXT>
  </DOC>

where the HEADLINE and DATELINE tags are optional (not always present)
and the TEXT content may or may not include "<P> ... </P>" tags
(depending on whether or not the "doc_type_label" is "story"). All the
newswire files are parseable as XML.
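Because the files are parseable as XML, standard tools suffice to read
them. Below is a minimal sketch using Python's standard library; the
file name is an illustrative placeholder (the source documents
themselves are distributed separately in LDC2019T12):

  # Parse one newswire document under the markup framework above.
  import xml.etree.ElementTree as ET

  root = ET.parse("example_newswire.xml").getroot()  # placeholder
  headline = (root.findtext("HEADLINE") or "").strip()
  text = root.find("TEXT")
  # <P> tags are present only when the doc_type_label is "story":
  paras = [(p.text or "").strip() for p in text.findall("P")]
  body = "\n\n".join(paras) if paras else (text.text or "").strip()
  print(headline)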
" tags (depending on whether or not the "doc_type_label" is "story"). All the newswire files are parseable as XML. 5. Multi-Post Discussion Forum Data Multi-Post Discussion Forum files (MPDFs) are derived from English Discussion Forum threads. They consist of a continuous run of posts from a thread but they are only approximately 800 words in length (excluding metadata and text withinelements). When taken from a short thread, a MPDF may comprise the entire thread. However, when taken from longer threads, a MPDF is a truncated version of its source, though it will always start with the preliminary post. 40 of the MPDF files contain a total of 265 characters in the range U+0085 - U+0099; these officially fall into a category of invisible "control" characters, but they all originated from single-byte "special punctuation" marks (quotes, etc. from CP1252) that have been incorrectly transcoded to utf8. The MPDF files use the following markup framework, in which there may also be arbitrarily deep nesting of quote elements, and other elements may be present (e.g. "..." anchor tags): All the MPDF files are parseable as XML. 6. Acknowledgments This material is based on research sponsored by Air Force Research Laboratory and Defense Advance Research Projects Agency under agreement number FA8750-13-2-0045. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of Air Force Research Laboratory and Defense Advanced Research Projects Agency or the U.S. Government. The authors acknowledge the following contributors to this data set: Dave Graff (LDC) Marjorie Freedman (BBN) Ryan Gabbard (BBN) Hoa Dang (NIST) Boyan Onyshkevych (DARPA) 7. References Joe Ellis, Jeremy Getman, Neil Kuster, Zhiyi Song, Ann Bies, & Stephanie M. Strassel. 2016 Overview of Linguistic Resources for the TAC KBP 2016 Evaluations: Methodologies and Results TAC KBP 2016 Workshop: National Institute of Standards and Technology, Gaithersburg, MD, November 14-15 Jeremy Getman, Joe Ellis, Zhiyi Song, Jennifer Tracey, & Stephanie M. Strassel. 2017 Overview of Linguistic Resources for the TAC KBP 2017 Evaluations: Methodologies and Results TAC KBP 2017 Workshop: National Institute of Standards and Technology, Gaithersburg, MD, November 13-14 8. Copyright Information (c) 2020 Trustees of the University of Pennsylvania 9. Contact Information For further information about this data release, or the TAC KBP project, contact the following project staff at LDC: Jeremy Getman, Project Manager ... ... .........Stephanie Strassel, PI -------------------------------------------------------------------------- README created by Joseph Carlough on March 28, 2018 updated by Jeremy Getman on May 11, 2018 updated by Jeremy Getman on October 17, 2018 updated by Jeremy Getman on April 9, 2019