TAC KBP English Event Argument Training and Evaluation Data 2014-2015

Authors: Joe Ellis, Jeremy Getman, Stephanie Strassel

1. Overview

This package contains training and evaluation data produced in support of the 2014 TAC KBP English Event Argument Extraction Pilot and Evaluation tasks and the 2015 English Event Argument Extraction and Linking Training and Evaluation tasks.

The Text Analysis Conference (TAC) is a series of workshops organized by the National Institute of Standards and Technology (NIST). TAC was developed to encourage research in natural language processing (NLP) and related applications by providing a large test collection, common evaluation procedures, and a forum for researchers to share their results. Through its various evaluations, the Knowledge Base Population (KBP) track of TAC encourages the development of systems that can match entities mentioned in natural texts with those appearing in a knowledge base, extract novel information about entities from a document collection, and add it to a new or existing knowledge base.

The Event Argument Extraction and Linking task requires systems to extract event arguments (entities or attributes playing a role in an event) from unstructured text, indicate the role they play in an event, and link the arguments appearing in the same event to each other. Critically, as the extracted information must be suitable as input to a knowledge base, systems construct tuples indicating the event type, the role played by the entity in the event, and the most canonical mention of the entity itself from the source document. The event types and roles are drawn from an externally-specified ontology of 31 event types, which includes financial transactions, communication events, and attacks. For more information about Event Argument Extraction and Linking, refer to the track home page on the NIST TAC website, http://www.nist.gov/tac/.
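The authoritative response formats are defined in the task descriptions included under ./docs/task_descriptions/. Purely as an illustration of the tuple structure described above, the sketch below models one event argument response in Python; the field names, document ID, and event type and role labels are hypothetical examples chosen for clarity, not an excerpt from the data or the official submission format.

  # Illustrative sketch only; see ./docs/task_descriptions/* for the official
  # response format. All field names and values below are hypothetical.
  from dataclasses import dataclass

  @dataclass
  class EventArgumentTuple:
      doc_id: str               # source document containing the event mention
      event_type: str           # event type drawn from the task ontology
      role: str                 # role the argument plays in the event
      canonical_argument: str   # most canonical mention of the entity in the document
      realis: str               # modality label: 'Actual', 'Generic', or 'Other'

  example = EventArgumentTuple(
      doc_id="EXAMPLE_DOC_001",
      event_type="Conflict.Attack",      # illustrative ACE-style label
      role="Attacker",
      canonical_argument="the rebel group",
      realis="Actual",
  )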
The data included in this package were originally released by LDC to TAC KBP coordinators and participants under the following ecorpora catalog IDs and titles:

  LDC2014E20:  TAC 2014 KBP Event Argument Extraction Pilot Source Corpus V1.1
  LDC2014E40:  TAC 2014 KBP Event Argument Extraction Pilot Assessment Results V1.1
  LDC2014E74:  TAC 2014 KBP English Event Argument Extraction Evaluation Annotations V1.1
  LDC2014E88:  TAC 2014 KBP English Event Argument Extraction Evaluation Assessment Results V2.0
  LDC2014R43:  TAC 2014 KBP English Event Argument Extraction Evaluation Source Corpus V1.1
  LDC2015E22:  TAC KBP English Event Argument Extraction Comprehensive Pilot and Evaluation Data 2014
  LDC2015E41:  TAC KBP 2015 English Event Argument Linking Training Data
  LDC2015E79:  TAC KBP 2015 English Event Argument Linking Evaluation Source Corpus
  LDC2015E92:  TAC KBP 2015 English Event Argument Linking Evaluation Manual Run
  LDC2015E101: TAC KBP 2015 English Event Argument Linking Evaluation Assessment Results V2.0
  LDC2016E37:  TAC KBP English Event Argument Comprehensive Training and Evaluation Data 2014-2015

Summary of data included in this package (for more details see ./data/{2014,2015}/contents.txt):

EA Extraction and Linking Data Distribution:

  +------+------------+-----------+-----------+-------------+---------+
  |      |            | source    | manual    |             | event   |
  | year | set        | documents | responses | assessments | hoppers |
  +------+------------+-----------+-----------+-------------+---------+
  | 2014 | pilot      | 60        | 0         | 32,054      | n/a     |
  | 2014 | evaluation | 528       | 5,947     | 57,599      | n/a     |
  | 2015 | training   | 55*       | 0         | 0           | 599     |
  | 2015 | evaluation | 500       | 5,207     | 45,391      | 1,608   |
  +------+------------+-----------+-----------+-------------+---------+

  *NOTE: the 2015 training source documents are a subset of the 2014 evaluation source corpus

2. Contents

./README.txt
  This file.

./data/{2014,2015}/contents.txt
  The data in this package are organized by the year of original release in order to clarify dependencies, highlight occasional differences in formats from one year to another, and increase readability in documentation. The contents.txt file within each year's root directory provides a list of the contents of all subdirectories as well as details about file formats and contents.

./docs/all_files.md5
  Paths (relative to the root of the corpus) and md5 checksums for all files in the package.

./docs/guidelines/*
  The guidelines used by annotators for the 2014-2015 Event Argument manual runs, including the 2015 linking tasks, as well as for the assessment tasks contained in this corpus.

./docs/task_descriptions/*
  Task descriptions for the 2014 Event Argument Extraction and 2015 Event Argument Extraction and Linking evaluation tracks, written by track coordinators.

./tools/{2014,2015}/*
  Scorers for the 2014 and 2015 EA submissions, as provided to LDC by evaluation track coordinators, with no further testing.

3. Annotation Tasks

In developing data for the 2014 Event Argument Extraction (EAE) evaluation track, annotators extracted event arguments (entities or attributes playing a role in an event) and information about them from unstructured text. Event argument tuples indicate the event type, the role played by the entity in the event, and the most canonical mention of the entity itself from the source document.
In Event Argument Linking (EAL), an extension of EAE introduced in 2015, the event argument tuples are further linked with other tuples into event hoppers (a relaxed form of identity coreference for events), indicating that the tuples played a role in the same event or events.

In both 2014 and 2015, data development by LDC for the Event Argument task consisted of three separate processes: source document selection, manual run development, and assessment.

3.1 Source Document Selection

Documents serve as queries in EAE/EAL, so the first annotation task is to perform targeted searches over sets of unreleased documents in two genres: newswire and discussion forum threads. Documents are valid if they contain at least one "Actual" mention of one of the specified event types along with appropriate arguments for the event. "Actual" events, as defined in EAL, include those that happened in the past and those that are ongoing in the present. Data scouts search primarily for documents with a variety of event types, though documents providing mentions of generally less common event types are also selected. For each document reviewed, a tally of the number of unique event mentions for each event type is created in order to ensure that all of the targeted event types are reasonably well represented. While performing document reviews, annotators also search for and flag documents with undesirable qualities (e.g., discussion forum threads with more than a small amount of newswire quotation) in order to maximize informal content.

3.2 Manual Run Development

In the manual run for the 2014 EAE evaluation, an annotator had a maximum of thirty minutes per document to annotate one mention of each valid, unique event argument. In 2015, LDC performed the manual run over a 300-document subset of the 500-document source corpus used by systems in the evaluation. This subset was selected using the event tallies produced during document selection in order to maximize and balance event coverage: priority was given to keeping the event types mixed and to ensuring that each event type was still represented at least 10 times per genre across the 300-document sub-corpus used for the manual run. For each document in the 2015 EAL evaluation, annotators had a maximum of sixty minutes both to annotate one mention of each unique event argument and to cluster all arguments into event hoppers.

Following the initial rounds of annotation, quality control (QC) passes were conducted over the manual run data to flag any event arguments or linking decisions that did not have adequate justification in the source document, or that might be at variance with the current guidelines. These flagged annotations were then adjudicated by senior annotators.

3.3 Assessment

For the assessment stage, the 2014 EAE task consisted of entity coreference and response assessment; for 2015 EAL, this was extended to include a subsequent argument linking subtask. Each assessor received training which refined their assessments to match the level of gold standard annotations produced by senior annotators.

The first step in each EAL assessment kit was to perform entity coreference on all responses returned by systems and LDC for a given document. This included correct responses, inexact responses, and wrong responses. Following the completion of entity coreference, assessors moved on to response assessment, in which they made six judgments on each response.
First, the four parts of a response - event type, argument role (the role that a response played in its matched event), base filler (the mention of the argument included in the justification), and canonical argument string (the 'most complete' mention of the argument from the document) - were all marked as 'correct' if they were found to be supported in the source and in line with the definition of the relevant event and argument role. Responses were considered 'wrong' if they did not meet both of the conditions for correctness, and 'inexact' if insufficient justification was provided or extraneous text was selected for an otherwise correct response. Additionally, each response was given a 'realis' label, which indicated the modality of the assessed event argument: 'Actual' if the event clearly occurred in the past, 'Generic' if the event was generic in nature (e.g., "I go to the store on Sundays"), and 'Other' if the event could not neatly be described as one of the other two categories. Lastly, assessors also marked the canonical argument strings as either 'name' or 'nominal' to indicate the type of mention.

After response assessment was completed, QC was performed on the data. Senior annotators reviewed the work of assessors and made corrections to assessment kits and, for each correction made, the reviewer followed up with the original assessor to clarify the change. For certain classes of potential errors, BBN produced automated reports for senior annotators to review while performing QC. The following classes of potential errors were reported: possible inconsistencies in the handling of inexact namestrings in the coreference and assessment stages; pronouns or nominal phrases assessed as 'correct' when a name for the same entity was potentially available; and cases of entity mention overlap (e.g., "Central Park" and "a fountain in Central Park") where the difference in granularity may indicate a problem with coreference.

In 2015, following the completion of QC for a given document, the senior annotator who had performed the QC also performed the document's argument linking step, which consisted of deciding how correct and inexact responses should be grouped together into event hoppers. If two or more event arguments were judged by the annotator to be arguments of the same event, the arguments were placed into the same event hopper. An event hopper is a less strict concept than true event coreference, in that considerations such as realis are not necessarily a factor. For instance, the arguments associated with a future event mention and those associated with a past event mention might be placed in the same event hopper if the mentions occurred in forum posts written at different times (one prior to the event in question and one following it) and the annotator judged the referenced events to be the same, despite the future event arguments having 'Other' realis and the past event arguments having 'Actual' realis.

4. Newswire Data

Newswire data use the following markup framework:

  <DOC id="{doc_id_string}" type="{doc_type_label}">
  <HEADLINE>
  ...
  </HEADLINE>
  <DATELINE>
  ...
  </DATELINE>
  <TEXT>
  <P>
  ...
  </P>
  </TEXT>
  </DOC>

where the HEADLINE and DATELINE tags are optional (not always present), and the TEXT content may or may not include "<P> ... </P>" tags (depending on whether or not the "doc_type_label" is "story").

All the newswire files are parseable as XML.
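Because the files are parseable as XML, a standard XML parser is sufficient to read them. The following is a minimal sketch, assuming only the markup framework shown above (a single DOC root with optional HEADLINE and DATELINE elements and a TEXT element); it is not a tool distributed with this package, and the path in the usage comment is hypothetical.

  # Minimal sketch for reading a newswire document under the markup framework
  # shown above; not an official tool from this package.
  import xml.etree.ElementTree as ET

  def read_newswire(path):
      root = ET.parse(path).getroot()        # <DOC id="..." type="...">
      doc_id = root.get("id")
      doc_type = root.get("type")
      headline_el = root.find("HEADLINE")    # optional; may be absent
      headline = "".join(headline_el.itertext()).strip() if headline_el is not None else None
      text_el = root.find("TEXT")
      # "story" documents wrap paragraphs in <P> tags; other doc types may not.
      paragraphs = ["".join(p.itertext()).strip() for p in text_el.findall("P")]
      body = "\n\n".join(paragraphs) if paragraphs else "".join(text_el.itertext()).strip()
      return doc_id, doc_type, headline, body

  # Usage (hypothetical path):
  # doc_id, doc_type, headline, body = read_newswire("data/2014/eval/source/example.xml")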
5. Discussion Forum Data

Discussion forum files use the following markup framework, in which there may also be arbitrarily deep nesting of quote elements, and other elements may be present (e.g. "<a ...> ... </a>" anchor tags):

  <doc id="{doc_id_string}">
  <headline>
  ...
  </headline>
  <post ...>
  ...
  <quote ...>
  ...
  </quote>
  ...
  </post>
  ...
  </doc>

Additionally, each <doc> unit contains at least five post elements.

All the discussion forum files are parseable as XML.

6. Multi-Post Discussion Forum Data

Multi-Post Discussion Forum files (MPDFs) are derived from English discussion forum threads. They consist of a continuous run of posts from a thread but are only approximately 800 words in length (excluding metadata and text within <quote> elements). When taken from a short thread, an MPDF may comprise the entire thread. However, when taken from a longer thread, an MPDF is a truncated version of its source, though it always begins with the thread's initial post.

40 of the MPDF files contain a total of 265 characters in the range U+0085 - U+0099; these officially fall into a category of invisible "control" characters, but they all originated from single-byte "special punctuation" marks (quotes, etc. from CP1252) that were incorrectly transcoded to utf8.

The MPDF files use the same markup framework as the discussion forum data above, in which there may also be arbitrarily deep nesting of quote elements, and other elements may be present (e.g. "<a ...> ... </a>" anchor tags).

All the MPDF files are parseable as XML.
7. Acknowledgments

This material is based on research sponsored by Air Force Research Laboratory and Defense Advanced Research Projects Agency under agreement number FA8750-13-2-0045. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of Air Force Research Laboratory and Defense Advanced Research Projects Agency or the U.S. Government.

The authors acknowledge the following contributors to this data set:

  Dave Graff (LDC)
  Marjorie Freedman (BBN)
  Ryan Gabbard (BBN)
  Hoa Dang (NIST)
  Boyan Onyshkevych (DARPA)

8. References

Joe Ellis, Jeremy Getman, Dana Fore, Neil Kuster, Zhiyi Song, Ann Bies, Stephanie Strassel. 2015. Overview of Linguistic Resources for the TAC KBP 2015 Evaluations: Methodologies and Results. TAC KBP 2015 Workshop: National Institute of Standards and Technology, Gaithersburg, Maryland, November 16-17.
https://www.ldc.upenn.edu/sites/www.ldc.upenn.edu/files/tackbp2015_overview.pdf

Joe Ellis, Jeremy Getman, Stephanie M. Strassel. 2014. Overview of Linguistic Resources for the TAC KBP 2014 Evaluations: Planning, Execution, and Results. TAC KBP 2014 Workshop: National Institute of Standards and Technology, Gaithersburg, Maryland, November 17-18.
https://www.ldc.upenn.edu/sites/www.ldc.upenn.edu/files/tackbp-2014-overview.pdf

9. Copyright Information

(c) 2020 Trustees of the University of Pennsylvania

10. Contact Information

For further information about this data release or the TAC KBP project, contact the following project staff at LDC:

  Jeremy Getman, Lead Annotator
  Stephanie Strassel, PI

--------------------------------------------------------------------------
README created by Neil Kuster on February 17, 2016
  updated by Neil Kuster on March 18, 2016
  updated by Jeremy Getman on March 18, 2016
  updated by Neil Kuster on April 21, 2016
  updated by Jeremy Getman on April 22, 2016
  updated by Joe Ellis on September 8, 2016
  updated by Jeremy Getman on September 14, 2016
  updated by Joe Ellis on September 16, 2016
  updated by Jeremy Getman on September 19, 2016