TAC KBP Event Argument Comprehensive Training and Evaluation Data 2016-2017

Authors: Joe Ellis, Jeremy Getman, Zhiyi Song, Stephanie Strassel

1. Overview

This package contains training and evaluation data produced in support of
the 2016 TAC KBP Event Argument Linking Pilot and Evaluation tasks and the
2017 TAC KBP Event Argument Linking Evaluation task.

The Text Analysis Conference (TAC) is a series of workshops organized by the
National Institute of Standards and Technology (NIST). TAC was developed to
encourage research in natural language processing (NLP) and related
applications by providing a large test collection, common evaluation
procedures, and a forum for researchers to share their results. Through its
various evaluations, the Knowledge Base Population (KBP) track of TAC
encourages the development of systems that can match entities mentioned in
natural texts with those appearing in a knowledge base, extract novel
information about entities from a document collection, and add it to a new
or existing knowledge base.

The Event Argument (EA) Extraction and Linking task requires systems to
extract mentions of entities from unstructured text, indicate the role they
play in an event, and link the arguments appearing in the same event to each
other. Critically, because the extracted information must be suitable as
input to a knowledge base, systems construct tuples indicating the event
type, the role played by the entity in the event, and the most canonical
mention of the entity in the source document. The event types and roles are
drawn from an externally-specified ontology of 31 event types, including
financial transactions, communication events, and attacks. For more
information about Event Argument Extraction and Linking, refer to the track
home page on the NIST TAC website, http://www.nist.gov/tac/

Source documents referenced by the files in this package are available
separately in LDC2019T12 TAC KBP Evaluation Source Corpora 2016-2017.
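As an illustration of the tuple structure described above, the sketch below
shows one possible in-memory representation of an extracted event argument.
It is purely schematic; the class and field names are hypothetical and do
not correspond to the official TAC KBP EA submission format.

  # Schematic sketch only; class and field names are hypothetical and do
  # not reflect the official TAC KBP EA submission format.
  from dataclasses import dataclass

  @dataclass
  class EventArgument:
      doc_id: str         # source document the argument was extracted from
      event_type: str     # one of the 31 ontology types, e.g. "Conflict.Attack"
      role: str           # role the entity plays in the event, e.g. "Place"
      canonical_arg: str  # most canonical mention of the entity in the document
      justification: str  # text supporting the extraction

  example = EventArgument(
      doc_id="ENG_NW_EXAMPLE",
      event_type="Conflict.Attack",
      role="Place",
      canonical_arg="Baghdad",
      justification="A car bomb exploded in central Baghdad on Tuesday.",
  )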
The data included in this package were originally released by LDC to TAC KBP
coordinators and participants under the following ecorpora catalog IDs and
titles:

  LDC2016E107: TAC KBP 2016 English Event Argument Linking Evaluation
               Assessment Results V2.0
  LDC2016E49:  TAC KBP 2016 English Event Argument Linking Pilot Source Corpus
  LDC2016E51:  TAC KBP 2016 English Event Argument Linking Pilot Queries and
               Manual Run V1.1
  LDC2016E59:  TAC KBP 2016 English Event Argument Linking Pilot Assessment
               Results V1.1
  LDC2016E60:  TAC KBP 2016 English Event Argument Linking Pilot Gold Standard
  LDC2016E73:  TAC KBP 2016 Eval Core Set Rich ERE Annotation with Augmented
               Event Argument v2
  LDC2016E74:  TAC KBP 2016 English Event Argument Linking Evaluation Queries
               and Manual Run
  LDC2017E55:  TAC KBP 2017 Eval Core Set Rich ERE Annotation with Augmented
               Event Arguments

Summary of data included in this package (for more details see
./data/{2016,2017}/contents.txt):

EA Extraction and Linking Data Distribution:
+------+------------+-----------+-----------+-------------+----------+---------+---------+
|      |            |           | cross-doc |             |          |         |         |
|      |            | source    | manual    |             |          |         | event   |
| year | set        | documents | responses | assessments | entities | fillers | hoppers |
+------+------------+-----------+-----------+-------------+----------+---------+---------+
| 2016 | pilot      | 2,092     | 98        | 2,689       | 2,923    | 1,308   | 1,500   |
| 2016 | evaluation | 0*        | 628       | 7,697       | 17,681   | 4,544   | 6,799   |
| 2017 | evaluation | 0*        | 0         | 0           | 17,896   | 5,995   | 8,022   |
+------+------------+-----------+-----------+-------------+----------+---------+---------+
 *source corpora for the 2016 and 2017 evals are available separately; see above

2. Contents

./docs/README.txt
  This file.

./data/{2016,2017}/contents.txt
  The data in this package are organized by the year of original release in
  order to clarify dependencies, highlight occasional differences in formats
  from one year to another, and to increase readability in documentation.
  The contents.txt file within each year's root directory provides a list of
  the contents of all subdirectories as well as specific details about file
  formats and contents.

./docs/2016/TAC_KBP_Event_Argument_Query_Development_and_Manual_Run_Guidelines_V1.2.pdf
./docs/2016/TAC_KBP_Event_Argument_Assessment_Guidelines_V1.4.pdf
  The guidelines used by annotators in developing the queries and manual run
  for the cross-document components of the 2016 Event Argument pilot and
  2016 Event Argument evaluation, as well as the guidelines used by
  assessors during the cross-document assessment phases of the 2016 pilot
  and 2016 eval.

./docs/2017/EventArgumentAugmentationGuidelines.pdf
  The guidelines used by LDC annotators in creating the within-document gold
  standard ERE data produced for the 2017 Event Argument evaluation.

./dtd/2016/deft_rich_ere_augmentation.1.0.dtd
  DTD for all ERE xml files found in ./data/2016/

./dtd/2016/2016_event_argument_expanded_queries.dtd
  The DTD for:
    tac_kbp_2016_english_event_argument_linking_evaluation_ldc-queries_expanded.xml
    tac_kbp_2016_english_event_argument_linking_pilot_queries_expanded.xml

./dtd/2017/deft_rich_ere.1.2.dtd
  DTD for all ERE xml files found in ./data/2017/
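The ERE XML files in ./data/ can be checked against the DTDs listed above
with any DTD-aware XML validator. The following is a minimal sketch assuming
the third-party Python lxml library; the ERE data filename shown is a
placeholder, not an actual file in this package.

  # Minimal DTD-validation sketch (assumes the third-party lxml library).
  # The ERE filename below is a placeholder; substitute a real file from
  # ./data/2016/.
  from lxml import etree

  dtd = etree.DTD(open("dtd/2016/deft_rich_ere_augmentation.1.0.dtd", "rb"))
  tree = etree.parse("data/2016/EXAMPLE.rich_ere.xml")  # placeholder name

  if dtd.validate(tree):
      print("valid against deft_rich_ere_augmentation.1.0.dtd")
  else:
      print(dtd.error_log.filter_from_errors())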
3. Annotation Tasks

In developing data for the Event Argument evaluations, annotators extracted
event arguments (entities or attributes playing a role in an event) and
information about them from unstructured text.

For the 2016 and 2017 Event Argument evaluations, event arguments based on
ERE (Entities, Relations, and Events) annotation were used as the gold
standard against which system output was scored. In 2016, a cross-document
event grouping task was also conducted, which utilized manually developed
queries.

3.1 Gold Standard Development

Entities, Relations, and Events (ERE), an annotation task developed by LDC
for DARPA's Deep Exploration and Filtering of Text (DEFT) program, was first
conducted in 2013 with the goal of supporting multiple research directions
and technology evaluations. As with earlier related efforts like Automatic
Content Extraction (ACE), ERE exhaustively labels entities, relations, and
events along with their attributes according to specified taxonomies. As
part of an effort to increase coordination across KBP data sets in 2016 and
2017, ERE annotation was performed as an upstream task in LDC's overall KBP
data creation pipeline, providing inputs to downstream annotation tasks,
including Event Argument.

In 2016, following ERE data development, the annotations were augmented by
running a script developed by BBN over the data and then having ERE
annotators review the results for validity. The purpose of the augmentation
pass was to add inferred arguments that are not valid under ERE guidelines
and that are generally difficult for human annotators to find. In large part
this translated to arguments that could be inferred by locational
containment. For example, a Conflict.Attack event that had Baghdad annotated
as the Place of the event might have Iraq added as an additional, inferred
Place during the augmentation pass (a schematic sketch of this kind of
inference appears at the end of this subsection).

In 2017, as in 2016, LDC created a set of gold standard annotations based on
event annotation in Rich ERE. In order to facilitate a more exhaustive
augmentation pass in 2017, instead of relying on automatic augmentation, LDC
performed manual event argument augmentation to add arguments that were
considered valid for the event argument annotation scheme, but not for Rich
ERE. Additionally, annotators were asked to add any event arguments
considered valid in Rich ERE, but missed during Rich ERE annotation.

In 2016, augmentation increased the number of event arguments (compared to
ERE without augmentation) by only 6-7%, but in 2017 there was a 42% increase
in Chinese, a 53% increase in English, and a 61% increase in Spanish. This
means that many more valid event arguments were captured in the 2017
annotation under the manual augmentation process than with the automated
method used in 2016.
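The locational-containment inference mentioned above can be pictured with
the short sketch below. This is a schematic reconstruction, not the actual
BBN augmentation script; the containment table, data layout, and function
name are hypothetical.

  # Schematic sketch of locational-containment augmentation; not the actual
  # BBN script. The containment table and names are hypothetical.
  CONTAINMENT = {
      "Baghdad": ["Iraq"],   # Baghdad is contained in Iraq
  }

  def add_inferred_places(event):
      """Add an inferred Place argument for each region containing an
      annotated Place (e.g. Baghdad -> Iraq)."""
      inferred = []
      for role, filler in event["arguments"]:
          if role == "Place":
              for region in CONTAINMENT.get(filler, []):
                  if ("Place", region) not in event["arguments"]:
                      inferred.append(("Place", region))
      event["arguments"].extend(inferred)
      return event

  attack = {"type": "Conflict.Attack", "arguments": [("Place", "Baghdad")]}
  add_inferred_places(attack)
  # attack["arguments"] is now [("Place", "Baghdad"), ("Place", "Iraq")]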
3.2 Cross-Document Queries & Manual Run

To support the 2016 cross-document component of Event Argument, annotators
selected queries, each consisting of a single event argument pertaining to
an event hopper in the gold standard Event Argument annotations described
above. Given the anticipated difficulty of the task for systems, potential
queries included only events for which a named event argument had been
annotated, were sourced only from English documents (making the task
English-only), and excluded the 3 new event types added to EA in 2016
(Contact.Contact, Contact.Broadcast, and Transaction.Transaction).
Annotators were also instructed to limit potential queries to event
arguments that indicated relatively simple, low-granularity event hoppers.
Queries were also required to be productive, with the event indicated by the
query occurring in at least 5-10 documents in the English portion of the
source corpus. However, some less-productive queries were included as well
in order to ensure that rarer or more difficult event types were represented
in the query set.

LDC also produced an exhaustive manual run for the queries, which was
performed over the entirety of the English portion of the TAC KBP evaluation
source corpus. A response for the manual run consisted of justification
strings containing whatever portion or portions of a document were needed to
prove that the event indicated by the relevant query occurred in the given
document. A document could be returned more than once if each instance was
in response to a different query.

3.3 Cross-Document Assessment

For the cross-document assessment portion of 2016 Event Argument, annotators
reviewed all of the responses both to queries manually selected by LDC and
to a set of derived queries generated from system responses. After the
cross-document Event Argument evaluation was conducted, it was discovered
that, despite the efforts taken to produce relatively simple queries,
systems were largely unsuccessful in finding the entry points indicated by
the queries. As such, an additional set of "derived" queries was produced by
BBN from systems' responses in an effort to better measure precision given
low system recall. For these queries, LDC did not produce a cross-document
manual run.

During assessment, assessors reviewed each response individually and decided
whether or not the response's justification proved that a document contained
an instance of the event indicated by the relevant query. If the assessor
determined that the response did indeed reference the same event, that
response was marked CORRECT. If the response was determined to contain an
event of the same type as the query event, but not the query event itself,
the response was marked ET_MATCH (event type match). If the response was
judged to contain neither the query event nor some other event of the same
type, the response was marked WRONG.

(Note that the 2017 evaluation had no cross-document component. As such,
sections 3.2-3.3 pertain only to Event Argument data developed in 2016.)

4. Newswire Data

Newswire data use the following markup framework:

  <DOC id="{doc_id_string}" type="{doc_type_label}">
  <HEADLINE>
  ...
  </HEADLINE>
  <DATELINE>
  ...
  </DATELINE>
  <TEXT>
  <P>
  ...
  </P>
  ...
  </TEXT>
  </DOC>

where the HEADLINE and DATELINE tags are optional (not always present), and
the TEXT content may or may not include "<P> ... </P>" tags (depending on
whether or not the "doc_type_label" is "story").

All the newswire files are parseable as XML.

5. Multi-Post Discussion Forum Data

Multi-Post Discussion Forum files (MPDFs) are derived from English
discussion forum threads. They consist of a continuous run of posts from a
thread but are only approximately 800 words in length (excluding metadata
and text within <quote> elements). When taken from a short thread, an MPDF
may comprise the entire thread. However, when taken from a longer thread, an
MPDF is a truncated version of its source, though it will always start with
the preliminary post.

40 of the MPDF files contain a total of 265 characters in the range
U+0085 - U+0099; these officially fall into a category of invisible
"control" characters, but they all originated from single-byte "special
punctuation" marks (quotes, etc. from CP1252) that have been incorrectly
transcoded to UTF-8.

The MPDF files use the following markup framework, in which there may also
be arbitrarily deep nesting of quote elements, and other elements may be
present (e.g. "<a>...</a>" anchor tags):

  <doc id="{doc_id_string}">
  <headline>
  ...
  </headline>
  <post ...>
  ...
  <quote ...>
  ...
  </quote>
  ...
  </post>
  ...
  </doc>

All the MPDF files are parseable as XML.

6. Acknowledgments

This material is based on research sponsored by Air Force Research
Laboratory and Defense Advanced Research Projects Agency under agreement
number FA8750-13-2-0045. The U.S. Government is authorized to reproduce and
distribute reprints for Governmental purposes notwithstanding any copyright
notation thereon. The views and conclusions contained herein are those of
the authors and should not be interpreted as necessarily representing the
official policies or endorsements, either expressed or implied, of Air Force
Research Laboratory and Defense Advanced Research Projects Agency or the
U.S. Government.

The authors acknowledge the following contributors to this data set:
  Dave Graff (LDC)
  Marjorie Freedman (BBN)
  Ryan Gabbard (BBN)
  Hoa Dang (NIST)
  Boyan Onyshkevych (DARPA)

7. References

Joe Ellis, Jeremy Getman, Neil Kuster, Zhiyi Song, Ann Bies, & Stephanie M.
Strassel. 2016. Overview of Linguistic Resources for the TAC KBP 2016
Evaluations: Methodologies and Results. TAC KBP 2016 Workshop: National
Institute of Standards and Technology, Gaithersburg, MD, November 14-15.

Jeremy Getman, Joe Ellis, Zhiyi Song, Jennifer Tracey, & Stephanie M.
Strassel. 2017. Overview of Linguistic Resources for the TAC KBP 2017
Evaluations: Methodologies and Results. TAC KBP 2017 Workshop: National
Institute of Standards and Technology, Gaithersburg, MD, November 13-14.

8. Copyright Information

(c) 2020 Trustees of the University of Pennsylvania

9. Contact Information

For further information about this data release, or the TAC KBP project,
contact the following project staff at LDC:

  Jeremy Getman, Project Manager
  Stephanie Strassel, PI

--------------------------------------------------------------------------
README created by Joseph Carlough on March 28, 2018
  updated by Jeremy Getman on May 11, 2018
  updated by Jeremy Getman on October 17, 2018
  updated by Jeremy Getman on April 9, 2019