TAC KBP English Sentiment Slot Filling Comprehensive Training and Evaluation Data 2013-2014

Authors: Joe Ellis, Jeremy Getman, Stephanie Strassel

1. Overview

This package contains training and evaluation data produced in support of the TAC KBP Sentiment Slot Filling tracks in 2013 and 2014.

The Text Analysis Conference (TAC) is a series of workshops organized by the National Institute of Standards and Technology (NIST). TAC was developed to encourage research in natural language processing (NLP) and related applications by providing a large test collection, common evaluation procedures, and a forum for researchers to share their results. Through its various evaluations, the Knowledge Base Population (KBP) track of TAC encourages the development of systems that can match entities mentioned in natural texts with those appearing in a knowledge base, extract novel information about entities from a document collection, and add it to a new or existing knowledge base.

Sentiment Slot Filling (SSF) is intended to supplement the data generated by the Entity Linking, Slot Filling, and Cold Start tracks with information about opinions held by KBP-valid entities (persons, organizations, and geo-political entities) toward other KBP-valid entities. As with the regular Slot Filling (SF) track, SSF involves mining information about entities from text. However, SSF seeks to evaluate the quality of detectors for scoped and attributed positive and negative sentiment. For more information about SSF, please refer to the Sentiment Slot Filling section of NIST's 2014 TAC KBP website (2014 was the last year in which a Sentiment Slot Filling evaluation was conducted, as of the time this package was created) at http://www.nist.gov/tac.

This package contains all evaluation and training data developed in support of TAC KBP Sentiment Slot Filling in 2013 and 2014. This includes queries, "manual runs" (human-produced responses to the queries), and assessment results for both human- and system-produced responses to the queries (some of which were dually assessed). Note that the corresponding source document collections for this release are included in LDC2018T03 (TAC KBP Comprehensive English Source Corpora 2009-2014), and the corresponding Knowledge Base (KB) for much of the data - a 2008 snapshot of Wikipedia - can be obtained via LDC2014T16 (TAC KBP Reference Knowledge Base).
The data included in this package were originally released by LDC to TAC KBP coordinators and participants under the following ecorpora catalog IDs and titles:

  LDC2013E78:  TAC 2013 KBP English Sentiment Slot Filling Training Queries and Annotations V1.1
  LDC2013E89:  TAC 2013 KBP English Sentiment Slot Filling Evaluation Queries and Annotations V1.1
  LDC2013E100: TAC 2013 KBP English Sentiment Slot Filling Evaluation Assessment Results V1.1
  LDC2014E72:  TAC 2014 KBP English Sentiment Slot Filling Evaluation Queries and Annotations V1.1
  LDC2014E85:  TAC 2014 KBP English Sentiment Slot Filling Evaluation Assessment Results
  LDC2015E47:  TAC KBP English Sentiment Slot Filling - Comprehensive Training and Evaluation Data 2013-2014

Summary of data included in this package (for more details see ./data/{2013,2014}/contents.txt):

  Queries:
  +------+------------+-----+-----+-----+-------+
  | year | set        | PER | ORG | GPE | total |
  +------+------------+-----+-----+-----+-------+
  | 2013 | evaluation |  54 |  53 |  53 |   160 |
  | 2013 | training   |  55 |  54 |  54 |   163 |
  | 2014 | evaluation | 134 | 133 | 133 |   400 |
  +------+------------+-----+-----+-----+-------+

  Manual Responses:
  +------+------------+------------------+
  | year | set        | manual responses |
  +------+------------+------------------+
  | 2013 | evaluation |              977 |
  | 2013 | training   |              986 |
  | 2014 | evaluation |              594 |
  +------+------------+------------------+

  Assessment Data:
  +------+------------+-------------+-------------+
  | year | set        | total       | total dual  |
  |      |            | assessments | assessments |
  +------+------------+-------------+-------------+
  | 2013 | evaluation |        5160 |        1145 |
  | 2014 | evaluation |        6383 |           0 |
  +------+------------+-------------+-------------+

2. Contents

./docs/README.txt
  This file.

./data/{2013,2014}/contents.txt
  The data in this package are organized by the year of original release in order to clarify dependencies, highlight occasional differences in formats from one year to another, and increase readability in documentation. The contents.txt file within each year's root directory provides a list of the contents for all subdirectories as well as details about file formats and contents.

./docs/all_files.md5
  Paths (relative to the root of the corpus) and md5 checksums for all files included in the package.

./docs/guidelines/2013/TAC_KBP_2013_Assessment_Guidelines_V1.4.pdf
./docs/guidelines/2013/TAC_KBP_2013_Sentiment_Slot_Filling_Guidelines_V1.0.pdf
  The guidelines used by annotators in developing the 2013 Sentiment Slot Filling queries, gold standard data, and assessments contained in this corpus.

./docs/guidelines/2014/TAC_KBP_2014_Assessment_Guidelines_V1.0.pdf
./docs/guidelines/2014/TAC_KBP_2014_Sentiment_SF_Query_Development_Guidelines_V1.0.pdf
./docs/guidelines/2014/TAC_KBP_2014_Sentiment_Slot_Filling_Guidelines_V1.2.pdf
  The guidelines used by annotators in developing the 2014 Sentiment Slot Filling queries, gold standard data, and assessments contained in this corpus.

./docs/task_descriptions/KBP2013_SentimentSlotFillingTaskDescription_1.1.pdf
  Task Description for the 2013 Sentiment Slot Filling evaluation track, written by track coordinators.

./docs/task_descriptions/KBP2014_SentimentTaskDescription_v1.1.pdf
  Task Description for the 2014 Sentiment Slot Filling evaluation track, written by track coordinators.
./dtd/ssf_queries_2013.dtd
  The DTD for:
    tac_kbp_2013_english_sentiment_slot_filling_evaluation_queries.xml
    tac_kbp_2013_english_sentiment_slot_filling_training_queries.xml

./dtd/ssf_queries_2014.dtd
  The DTD for:
    tac_kbp_2014_english_sentiment_slot_filling_evaluation_queries.xml

./tools/*
  All items in the tools directory were provided to LDC by evaluation track coordinators and are included here as-is, with no additional modifications or testing.

./tools/check_kbp_sentiment-slot-filling.pl
  Validator for 2013 sentiment slot filling submission files.

./tools/check_kbp2014_sentiment-slot-filling.pl
  Validator for 2014 sentiment slot filling submission files.

./tools/2013_SFScore.java
  Scorer for 2013 sentiment slot filling submission files.

./tools/SFScore.java
  Scorer for 2014 sentiment slot filling submission files.

./tools/KBP2013_Sentiment_SF_slot-list
  Necessary input for the 2013 scorer (see scorer for more details).

./tools/KBP2014_Sentiment_SF_slot-list
  Necessary input for the 2014 scorer (see scorer for more details).

3. Annotation Tasks

The tasks conducted by LDC annotators in support of the Sentiment Slot Filling (SSF) track included query development, manual run development, and assessment of system- and human-produced responses to queries.

3.1 Query Development

SSF queries include a query entity and a sentiment slot that indicates both query polarity (whether the sentiment is positive or negative) and directionality (whether the query entity is the holder or the target of the sentiment).

Entity mentions, which are the basis of all SSF queries, are partly selected based on their level of non-confusability. A candidate query entity mention is considered non-confusable if it is "canonical", meaning that it is not an alias and includes more than just a first or last name. Entity mentions are also rejected if they include objectionable content.

Productivity is also heavily weighted in the selection of queries; candidates must usually have at least two responses (a.k.a. "slot fillers") in the source corpus. However, SSF query developers could also focus on selecting queries capable of generating edge-case or interesting fillers (and often both). For example, consider the following two sentences:

  "I think Michael Vick should have been executed for that", said Carlson.

  Carlson said he hated Michael Vick.

Correctly extracting a response from the first of the two statements is more challenging due to the inference needed to derive the sentiment from the stated desired action.

Following initial query development passes, a quality control pass was conducted to flag any fillers that did not have adequate justification in the source document, or that might be at variance with the guidelines in any way. These flagged fillers were then adjudicated by senior annotators, who updated, removed, or replaced them as appropriate.

3.2 Manual Run Development

In developing the "manual runs", or human-produced sets of responses to SSF queries, annotators had up to two hours per query to search the corpus and locate all valid fillers. In 2013, responses could be drawn from any document in the KB source corpus. However, in 2014, answers were only drawn from the same source documents as the queries.

Justification - the minimum extents of provenance supporting the validity of a slot filler - is an adjunct annotation employed to pinpoint the sources of assertions and, thereby, reduce the effort required for assessment.
Valid justification strings should clearly identify all three elements of a relation (i.e., the subject entity, the predicate slot, and the object filler) and the relation between them, with minimal extraneous text. In 2013, justification allowed for up to two discontiguous strings, which could be selected from separate documents. In 2014, the justification format was again altered to allow for up to four justification strings. This facilitated a greater potential for inferred relations that would be difficult to justify with only one or two text extents.

Following the initial round of annotation for manual runs, a quality control pass was conducted to flag any fillers that did not have adequate justification in the source document, or that might be at variance with the guidelines in any way. These flagged fillers were then adjudicated by senior annotators, who updated or removed them as appropriate.

3.3 Assessment

In assessment, annotators judge and coreference slot filler responses returned for the query set from both the human manual run and from systems. Fillers are marked correct if they are found to be both compatible with the slot descriptions and supported by the provided justification string(s) and/or surrounding content. Fillers are assessed as wrong if they do not meet both of the conditions for correctness. Additionally, fillers are assessed as inexact if insufficient or extraneous text was returned for an otherwise correct response.

Justification is assessed as correct if it succinctly and completely supports the relation, and wrong if it does not support the relation at all (or if the corresponding filler is marked wrong). Justification can also be assessed as inexact-short (if part but not all of the information necessary to support the relation is provided) or inexact-long (if it contains all of the information necessary to support the relation but also a great deal of extraneous text). Starting in 2014, responses with justification comprising more than 600 characters in total were automatically marked as ignored and not reviewed during assessment.

In 2013, dual assessment was performed on the responses to 40 of the SSF evaluation queries. For each of the four Sentiment Slot Filling slots, 10 queries were randomly selected for dual assessment of responses.

After first passes of assessment were completed, quality control was performed on the data by senior annotators. This quality control ensured that the extents of each annotated filler and justification were correct and that entities assessed as correct were coreferenced into the appropriate equivalence class.

4. Using the Data

As mentioned in the introduction, note that the corresponding source document collections for this release are included in LDC2018T03 (TAC KBP Comprehensive English Source Corpora 2009-2014). Also, the corresponding Knowledge Base (KB) for much of the data - a 2008 snapshot of Wikipedia - can be obtained via LDC2014T16 (TAC KBP Reference Knowledge Base).

4.1 Offset calculation

For queries, text normalization consisting of a 1-for-1 substitution of newline (0x0A) and tab (0x09) characters with space (0x20) characters was performed on the document text input to the response field.

The values of the beg and end XML elements in the queries.xml files indicate character offsets that identify text extents in the source documents. Offset counting starts from the initial character (character 0) of the source document and includes newlines and all markup characters - that is, the offsets are based on treating the source document file as "raw text", with all its markup included.
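For illustration only, the short Python sketch below (not part of the distributed tools) resolves a query's beg and end values against a raw source document, following the convention described above. The <docid> element, the query "id" attribute, the ".xml" file layout, and the treatment of end offsets as inclusive are assumptions made for the sketch; consult the DTDs in ./dtd/ and the task descriptions for the authoritative structure of the queries.xml files.

  # A minimal sketch, assuming Python 3. The <docid> element, the query "id"
  # attribute, the file layout, and inclusive end offsets are assumptions;
  # the DTDs in ./dtd/ define the authoritative structure of queries.xml.
  import os
  import xml.etree.ElementTree as ET

  def extract_extents(queries_xml, source_dir):
      # An XML parser un-escapes character entities, so <name> is returned
      # in its natural form (see section 4.2 below).
      root = ET.parse(queries_xml).getroot()
      for query in root.findall('query'):
          name = query.findtext('name')
          docid = query.findtext('docid')           # assumed element name
          beg = int(query.findtext('beg'))
          end = int(query.findtext('end'))

          # Offsets count from character 0 of the source file with all
          # markup included, so the file is read as raw text, not parsed.
          path = os.path.join(source_dir, docid + '.xml')  # assumed layout
          with open(path, encoding='utf-8') as f:
              raw = f.read()

          # end is treated as inclusive here (assumption).
          extent = raw[beg:end + 1]

          # The 1-for-1 normalization described above: newlines and tabs
          # become spaces.
          normalized = extent.replace('\n', ' ').replace('\t', ' ')
          print(query.get('id'), name, normalized)

Note that, as section 4.2 below explains, the raw extent extracted this way may still contain XML escapes (e.g., "&amp;") even though an XML parser returns the un-escaped form of the "name" element.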
4.2 Proper ingesting of XML queries

While the character offsets are calculated based on treating the source document as "raw text", the "name" strings being referenced by the queries sometimes contain XML metacharacters, and these had to be "re-escaped" for proper inclusion in the queries.xml file.

For example, an actual name like "AT&T" may show up in a source document file as "AT&amp;T" (because the source document was originally formatted as XML data). But since the source doc is being treated here as raw text, this name string is treated in queries.xml as having 8 characters (i.e., the character offsets, when provided, will point to a string of length 8). However, the "name" element itself, as presented in the queries.xml file, will be even longer - "AT&amp;amp;T" - because the queries.xml file is intended to be handled by an XML parser, which will return "AT&amp;T" when this "name" element is extracted. Using the queries.xml data without XML parsing would yield a mismatch between the "name" value and the corresponding string in the source data.

5. Acknowledgments

This material is based on research sponsored by the Air Force Research Laboratory and the Defense Advanced Research Projects Agency under agreement number FA8750-13-2-0045. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the Air Force Research Laboratory, the Defense Advanced Research Projects Agency, or the U.S. Government.

The authors acknowledge the following contributors to this data set:
  Dana Fore (LDC)
  Dave Graff (LDC)
  Margaret Mitchell (Microsoft)
  Hoa Dang (NIST)
  Claire Cardie (Cornell)
  Boyan Onyshkevych (DARPA)

6. References

Joe Ellis, Jeremy Getman, Stephanie M. Strassel. 2014. Overview of Linguistic Resources for the TAC KBP 2014 Evaluations: Planning, Execution, and Results. TAC KBP 2014 Workshop: National Institute of Standards and Technology, Gaithersburg, Maryland, November 17-18.
https://www.ldc.upenn.edu/sites/www.ldc.upenn.edu/files/tackbp-2014-overview.pdf

Joe Ellis, Jeremy Getman, Justin Mott, Xuansong Li, Kira Griffitt, Stephanie M. Strassel, Jonathan Wright. 2013. Linguistic Resources for 2013 Knowledge Base Population Evaluations. TAC KBP 2013 Workshop: National Institute of Standards and Technology, Gaithersburg, MD, November 18-19.
www.ldc.upenn.edu/sites/www.ldc.upenn.edu/files/tackbp-workshop2013-linguistic-resources-kbp-eval.pdf

7. Copyright Information

(c) 2021 Trustees of the University of Pennsylvania

8. Contact Information

For further information about this data release, contact the following project staff at LDC:

  Jeremy Getman, Lead Annotator
  Stephanie Strassel, PI

-----------------------------------------------------------------------------
README created by Dana Fore on January 7, 2016
  updated by Dana Fore on January 29, 2016
  updated by Dana Fore on April 4, 2016
  updated by Neil Kuster on September 14, 2016
  updated by Joe Ellis on October 5, 2016
  updated by Jeremy Getman on October 5, 2016
  updated by Joe Ellis on October 6, 2016
  updated by Joe Ellis on February 17, 2017