Corpus Title: AIDA Scenario 1 Evaluation Topic Source Data, Annotation, Assessment
LDC Catalog-ID: LDC2025T13
Authors: Jennifer Tracey, Stephanie Strassel, Jeremy Getman, Ann Bies, Kira Griffitt, David Graff, Chris Caruso, Joshua Parry

1.0 Introduction

This corpus was developed by the Linguistic Data Consortium for the DARPA AIDA Program and contains a multi-media collection of 10,522 documents used in the AIDA Month 9 pilot evaluation and the AIDA Final Phase 1 evaluation, annotations for 386 of those documents, and results of assessment of 77,965 responses in 1,525 of those documents.

The AIDA (Active Interpretation of Disparate Alternatives) Program was designed to support development of technology that can assist in cultivating and maintaining understanding of events when there are conflicting accounts of what happened (e.g. who did what to whom and/or where and when events occurred). AIDA systems must extract entities, events, and relations from individual multimedia documents, aggregate that information across documents and languages, and produce multiple knowledge graph hypotheses that characterize the conflicting accounts that are present in the data.

Each phase of the AIDA program focused on a different scenario, or broad topic area. The scenario for Phase 1 was political relations between Russia and Ukraine in the 2010s. This scenario was used for both the AIDA Month 9 pilot evaluation and the AIDA Final Phase 1 evaluation. In addition, each scenario had a set of specific subtopics within the scenario that were designated as either "practice topics" (released for use in system development) or "evaluation topics" (reserved for use in the AIDA program evaluations for each phase).

The annotations and assessments contained in this release include coverage of the following three evaluation topics ('P' IDs used in Month 9 pilot annotations, 'E' IDs used in Scenario 1 evaluation annotations):

  P101/E101 - Suspicious Deaths and Murders in Ukraine (January-April 2015)
  P102/E102 - Odessa Tragedy (May 2, 2014)
  P103/E103 - Siege of Sloviansk and Battle of Kramatorsk (April-July 2014)

2.0 Directory Structure

The directory structure and contents of the package are summarized below -- paths shown are relative to the base (root) directory of the package:

  ./data/source/      -- contains zip files subdivided by data type (see below)
  ./data/annotation/  -- contains subdirectories of annotation organized by
                         evaluation partition and subdivided by topic
  ./data/assessment/  -- contains subdirectories of assessment organized by
                         evaluation partition and subdivided by response type
  ./data/video_shot_boundaries/representative_frames
                      -- contains subdirectories for each video, with any keyframe
                         PNGs referenced in the annotation and assessment tables
  ./docs/             -- contains documentation about the source data, annotation,
                         and assessment
  ./tools/            -- contains software for LTF data manipulation and twitter
                         processing

The "source" subdirectory of the "data" directory has a separate subdirectory for each of the following data types, and each directory contains one or more zip archives with data files of the given type; the list shows the archive-internal directory and file-extension strings used for the data files of each type:

  bmp/*.bmp.zip -- contains "bmp/*.bmp.ldcc" files (image data)
  gif/*.gif.zip -- contains "gif/*.gif.ldcc" files (image data)
  jpg/*.jpg.zip -- contains "jpg/*.jpg.ldcc" files (image data)
  mp3/*.mp3.zip -- contains "mp3/*.mp3.ldcc" files (audio data)
  mp4/*.mp4.zip -- contains "mp4/*.mp4.ldcc" files (typically video)
  png/*.png.zip -- contains "png/*.png.ldcc" files (image data)
  svg/*.svg.zip -- contains "svg/*.svg.ldcc" files (image data)
  ltf/*.ltf.zip -- contains "ltf/*.ltf.xml" files (segmented/tokenized text data)
  psm/*.psm.zip -- contains "psm/*.psm.xml" files (companion to ltf.xml)

Data types in the first group consist of original source materials presented in "ldcc wrapper" file format (see section 4.2 below). The latter group (ltf and psm) are created by LDC from source HTML data, by way of an intermediate XML reduction of the original HTML content for "root" web pages (see section 4.1 for a description of the process, and section 5 for details on the LTF and PSM file formats).

The 6-character file-ID of the zip archive matches the first 6 characters of the 9-character file-IDs of the data files it contains. For example:

  zip archive file ./data/source/png/HC0000.png.zip contains:
    png/HC00000FM.png.ldcc
    png/HC00000FN.png.ldcc
    ...
    png/HC00009L7.png.ldcc
    png/HC00009L8.png.ldcc

(The "ldcc" file format is explained in more detail in section 4.2 below.) Note that the number of data files per zip archive varies, with the largest zip in this package containing over 4,100 files.

The "video_shot_boundaries" directory contains a "representative_frames" subdirectory which contains a directory of .png images corresponding to each detected shot referenced in annotation or assessment tables. These directories are named using the 9-character file-ID of the video from which the included frames were extracted.
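For example, the following Python sketch (illustrative only; the function name and corpus-root argument are not part of the corpus tools) uses this naming convention to locate and read one wrapped asset given its 9-character file-ID and data type:

    import zipfile
    from pathlib import Path

    def read_ldcc_asset(corpus_root, file_id, ext):
        """Return the raw bytes of e.g. png/HC00000FM.png.ldcc from its zip archive."""
        # the zip name is the first 6 characters of the 9-character file-ID,
        # e.g. HC00000FM -> ./data/source/png/HC0000.png.zip
        zip_path = Path(corpus_root) / "data" / "source" / ext / ("%s.%s.zip" % (file_id[:6], ext))
        member = "%s/%s.%s.ldcc" % (ext, file_id, ext)
        with zipfile.ZipFile(zip_path) as zf:
            return zf.read(member)

    # data = read_ldcc_asset(".", "HC00000FM", "png")
    # the returned bytes still begin with the 1024-byte LDCC header (see section 4.2)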
3.0 Content Summary

3.1 Source Data

The source data was manually scouted by annotators searching for relevant material which was then collected (harvested) from various web sources. In the mini-table below, "#RtPgs" refers to the number of root HTML pages that were scouted and harvested; the other columns indicate the total number of data files of the various types extracted from those root pages.

  #RtPgs   #Imgs   #Vids   #Auds
   10522   28572     990       7

Note: The number of root pages in the table above includes Twitter data, which is not present in the data directories and must be downloaded from Twitter by the user. Assets associated with tweets are marked as "diy" in the status_in_corpus field of the parent_children.tab file. The number of image, video, and audio files in the table above does not include Twitter data.

3.2 Annotation Data

The table below provides a summary of the number of HTML pages ("root documents") annotated for each topic and language for the Month 9 pilot evaluation.

  Topic   Lang   Docs
  P101    ENG      24
  P101    RUS      71
  P101    UKR      17
  P102    ENG      41
  P102    RUS      63
  P102    UKR      17
  P103    ENG      42
  P103    RUS      91
  P103    UKR      40

The table below provides a summary of the number of root documents annotated for each topic and language for the Final Phase 1 evaluation.

  Topic   Lang   Docs
  E101    ENG      13
  E101    RUS      31
  E101    UKR      16
  E102    ENG      22
  E102    RUS      33
  E102    UKR      15
  E103    ENG      26
  E103    RUS      38
  E103    UKR      27

3.3 Assessment Data

The table below provides a summary of the number of root documents for each language from which assessment responses were sourced during the Month 9 pilot evaluation. In some cases, the language of the document was determined automatically, and in other cases, the language of the document was set by the language of the annotator who scouted the document.

  Lang   Docs
  ENG      42
  RUS      91
  UKR      40

The table below provides a summary of the number of root documents for each language from which assessment responses were sourced during the Final Phase 1 evaluation.
In some cases, the language of the document was determined automatically, and in other cases, the language of the document matches the language of the annotator who scouted the document.

  Lang   Docs
  ENG     430
  RUS     407
  UKR     636

4.0 Data Processing and Character Normalization

Most of the content was harvested from various web sources. Source documents were collected in two steps. First, a manual scouting process was used to identify specific HTML pages with relevant content for annotation. Then, an automated process was used to harvest additional HTML pages from those same web sources. Some content may have been harvested manually, or by means of ad-hoc scripted methods for sources with unusual attributes.

4.1 Treatment of Original HTML Text Content

All harvested HTML content was initially converted from its original form into a relatively uniform XML format; this stage of conversion eliminated irrelevant content (menus, ads, headers, footers, etc.) and placed the content of interest into a simplified, consistent markup structure. The "homogenized" XML format then served as input for the creation of a reference "raw source data" (rsd) plain text form of the web page content; at this stage, the text was also conditioned to normalize white-space characters and to apply transliteration and/or other character normalization, as appropriate to the given language. This processing created the ltf.xml and psm.xml files for each harvested "root" web page; these file formats are described in more detail in section 5 below.

4.2 Treatment of Non-HTML Data Types: "ldcc" File Format

To the fullest extent possible, all discrete resources referenced by a given "root" HTML page (style sheets, javascript, images, media files, etc.) are stored as separate files of the given data type, and assigned separate 9-character file-IDs (the same form of ID as is used for the "root" HTML page).

In order to present these attached resources in a stable and consistent way, the LDC has developed a "wrapper" or "container" file format, which presents the original data as-is, together with a specialized header block prepended to the data. The header block provides metadata about the file contents, including the MD5 checksum (for self-validation), the data type and byte count, url, and citations of source-ID and parent (HTML) file-ID.

The LDCC header block always begins with a 16-byte ASCII signature, as shown between double-quotes on the following line (where "\n" represents the ASCII "newline" character 0x0A):

  "LDCc \n1024 \n"

Note that the "1024" on the second line of the signature represents the exact byte count of the LDCC header block. (If/when this header design needs to accommodate larger quantities of metadata, the header byte count can be expanded as needed in increments of 1024 bytes. Such expansion does not arise in the present release.)

Immediately after the 16-byte signature, a YAML string presents a data structure comprising the file-specific header content, expressed as a set of "key: value" pairings in UTF-8 encoding. The YAML string is padded at the end with space characters, such that when the following 8-byte string is appended, the full header block size is exactly 1024 bytes (or whatever size is stated in the initial signature):

  "endLDCc\n"

In order to process the content of an LDCC header:

  - read the initial block of 1024 bytes from the *.ldcc data file
  - check that it begins with "LDCc \n1024 \n" and ends with "endLDCc\n"
  - strip off those 16- and 8-byte portions
  - pass the remainder of the block to a YAML parser.

In order to access the original content of the data file, simply skip or remove the initial 1024 bytes.
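As an illustration, these steps can be carried out with a short Python sketch (not part of the corpus tools; it assumes the fixed 1024-byte header described above and requires the third-party PyYAML package):

    import yaml  # PyYAML (third-party)

    HEADER_SIZE = 1024  # exact header byte count, as stated in the signature

    def read_ldcc(path):
        """Return (metadata_dict, payload_bytes) for one *.ldcc file."""
        with open(path, "rb") as f:
            header = f.read(HEADER_SIZE)
            payload = f.read()  # the original content starts at byte 1024
        # sanity checks corresponding to the steps above
        if not header.startswith(b"LDCc") or not header.endswith(b"endLDCc\n"):
            raise ValueError("not an LDCC-wrapped file: %s" % path)
        # drop the 16-byte signature and the trailing 8-byte "endLDCc\n",
        # strip the space padding, and parse the YAML metadata
        yaml_text = header[16:-8].decode("utf-8").rstrip()
        return yaml.safe_load(yaml_text), payload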
5.0 Overview of XML Data Structures

5.1 PSM.xml -- Primary Source Markup Data

The "homogenized" XML format described above preserves the minimum set of tags needed to represent the structure of the relevant text as seen by the human web-page reader. When the text content of the XML file is extracted to create the "rsd" format (which contains no markup at all), the markup structure is preserved in a separate "primary source markup" (psm.xml) file, which enumerates the structural tags in a uniform way, and indicates, by means of character offsets into the rsd.txt file, the spans of text contained within each structural markup element.

For example, in a discussion-forum or web-log page, there would be a division of content into the discrete "posts" that make up the given thread, along with "quote" regions and paragraph breaks within each post. After the HTML has been reduced to uniform XML, and the tags and text of the latter format have been separated, information about each structural tag is kept in a psm.xml file, preserving the type of each relevant structural element, along with its essential attributes ("post_author", "date_time", etc.), and the character offsets of the text span comprising its content in the corresponding rsd.txt file.

5.2 LTF.xml -- Logical Text Format Data

The "ltf.xml" data format is derived from rsd.txt, and contains a fully segmented and tokenized version of the text content for a given web page. Segments (sentences) and the tokens (words) are marked off by XML tags (SEG and TOKEN), with "id" attributes (which are only unique within a given XML file) and character offset attributes relative to the corresponding rsd.txt file; TOKEN tags have additional attributes to describe the nature of the given word token.

The segmentation is intended to partition each text file at sentence boundaries, to the extent that these boundaries are marked explicitly by suitable punctuation in the original source data. To the extent that sentence boundaries cannot be accurately detected (due to variability or ambiguity in the source data), the segmentation process will tend to err more often on the side of missing actual sentence boundaries, and less often on the side of asserting false sentence breaks.

The tokenization is intended to separate punctuation content from word content, and to segregate special categories of "words" that play particular roles in web-based text (e.g. URLs, email addresses and hashtags). To the extent that word boundaries are not explicitly marked in the source text, the LTF tokenization is intended to divide the raw-text character stream into units that correspond to "words" in the linguistic sense (i.e. basic units of lexical meaning).

NB: Due to Twitter's terms of service, no Twitter content is provided in ltf. Users must download the tweets listed in the twitter_info.tab file in the docs/annotation/ directory. The twitter-processing tool provided in the tools/ directory can be used to ensure that the version of the tweet downloaded by users matches the version downloaded by LDC.
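For illustration, the following Python sketch (standard library only) walks the segments and tokens of one ltf.xml file; the offset attribute names start_char and end_char used here are an assumption and should be verified against the actual files and the field documentation in this release:

    import xml.etree.ElementTree as ET

    def read_ltf(ltf_path):
        """Yield (seg_id, [(token_id, token_text, start, end), ...]) for each segment."""
        root = ET.parse(ltf_path).getroot()
        for seg in root.iter("SEG"):
            tokens = [(tok.get("id"),
                       tok.text,
                       int(tok.get("start_char")),  # offsets into the corresponding rsd.txt
                       int(tok.get("end_char")))
                      for tok in seg.iter("TOKEN")]
            yield seg.get("id"), tokens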
6.0 Annotations

6.1 Annotation Overview

AIDA included three primary technology goals: extraction of information elements from multilingual, multimedia documents; aggregation of extracted information elements into a common semantic representation; and generation of multiple hypotheses about that information. Manual annotation of AIDA data supported development and evaluation of each component of AIDA systems. First, within-document annotation labeled scenario-relevant entities, relations, and events in the AIDA corpus. Annotators then conducted coreference annotation across documents, languages, and modalities by linking individual information elements to a shared knowledge base. Finally, annotators indicated the relationship between the set of labeled events/relations and various hypotheses about the scenario, for instance by indicating whether a given event supported the veracity of a particular scenario hypothesis. Each annotation task is described in more detail below.

6.1.1 Within-Document Annotation

Within-document annotation consisted of labeling mentions of entities, relations, and events (including argument structure for events and relations) within individual multimedia documents for each AIDA language.

For each event or relation subject to annotation, AIDA annotators made a number of decisions. First, each event or relation instance and each associated entity argument were anchored in document-level provenance. Annotators provided a brief text description (a word or short phrase) for each event, relation, or argument and assigned it a type from the annotation tagset (a set of labels for different types and subtypes of entities, relations, and events). Arguments were also labeled for the role they play in the event or relation. Annotators then specified any attributes associated with the event, relation, or argument (e.g. two attributes used in both the pilot and phase 1 annotation were "not", indicating negation, and "hedged", indicating uncertainty). Finally, relations and events were labeled for temporal information. Dates are characterized as starting or ending on, before, or after a particular date, and the date is expressed in year-month-day format, with partially populated dates possible.

For the AIDA pilot annotation and Phase 1 annotation contained in this corpus, annotation was limited to events and relations relevant (i.e. salient) to a predetermined set of scenario topics. First, documents were designated as being generally relevant to a particular topic in the scenario. Next, annotators labeled all relations and events within the document associated with the topic, along with the entity mentions acting as arguments for those relations/events (i.e. slots). Events, relations, and entity arguments in the document that were not related to the specified topic were not labeled.

6.1.2 Cross-Document Annotation

Cross-document coreference was necessary to support a whole-corpus understanding of events, relations, and their entities, enabling the generation of corpus-wide hypotheses. Procedurally, coreference was achieved by manually linking individual entity and event instances to a knowledge base (KB), comprising a set of informational entries drawn from GeoNames, the CIA World Leaders List, and the CIA World Factbook, supplemented with manually-created entries developed specifically for AIDA data.
For the pilot annotation effort, we seeded KBs for each topic with events, relations, and entities known to be relevant to the topic and potentially present in the data. Annotators then manually linked individual event, relation, and entity instances from the documents to the KB and flagged any instances that could not be linked, in which case new KB entries were created.

The AIDA Phase 1 evaluation design required a program-wide reference entity knowledge base, so we constructed a new reference KB consisting of entities known to be relevant to scenario topics along with a large number of other entities, drawn from existing KBs, whose relevance to specific AIDA topics was unknown. (The AIDA Scenario 1 and 2 Reference Knowledge Base is available as LDC2023T10.) Phase 1 coreference annotation for entities then consisted of manually linking entity instances to the reference KB. When no match was present in the KB, the entity was marked as NIL; once all KB linking was complete, all NILs were reviewed and clustered, such that multiple mentions of the same NIL entity were assigned the same unique NIL ID. Events were also manually clustered and assigned unique NIL IDs. Finally, relations were automatically clustered and assigned unique NIL IDs based on the results of manual entity clustering: relations with the same type, and whose arguments have the same argument role and contain the same entity (KB or NIL) ID, are considered coreferential.

Note that in the Month 9 Pilot Evaluation annotation, events and relations could be linked to the reference KB, and thus some event and relation mentions from this phase of annotation have non-NIL IDs. However, in the Scenario 1 Evaluation annotation, events and relations were not linked to the reference KB, and thus event and relation mentions in this phase of annotation have NIL IDs only.
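The automatic relation clustering rule described above can be sketched as follows (illustrative Python only; the record layout shown in the comment is a hypothetical stand-in, not the actual column structure of the annotation tables):

    from collections import defaultdict

    def cluster_relations(relations):
        """Group relation mentions sharing a type and identical (role, entity ID) arguments.

        Each input record is assumed to look like:
          {"mention_id": ..., "type": <relation type>,
           "args": [(<argument role>, <KB or NIL ID>), ...]}
        """
        clusters = defaultdict(list)
        for rel in relations:
            key = (rel["type"], frozenset(rel["args"]))
            clusters[key].append(rel["mention_id"])
        # each value is one coreferential cluster, to be assigned a shared NIL ID
        return list(clusters.values())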
6.1.3 Hypotheses

Hypothesis annotation in the AIDA pilot and in Phase 1 involved labeling answers to evaluation queries, where the corpus was expected to contain multiple, sometimes contradictory answers (i.e., hypotheses) to each query, with answers appearing in different documents, modalities, and languages. The AIDA pilot focused on facet-level queries and hypotheses, which were limited to understanding single information elements. For instance, a facet-level query might ask which entity perpetrated a specific attack, with facet-level hypotheses providing all possible answers to that question present in the corpus.

To create the gold standard answer key for hypothesis evaluation, annotators worked with reference hypotheses developed jointly with the program evaluation team, and then labeled the relationship between each labeled information element (i.e. the labeled relations and events) and each hypothesis. Each relation/event was judged as fully supporting, partially supporting, or contradicting the hypothesis, where fully supports means that the information in the hypothesis is fully captured by the labeled event/relation. For example, given a query "Who fired on protesters and/or police at the Maidan protests?" and a hypothesis "Members of the Berkut, a police force loyal to the Yanukovych government, fired on protesters", a labeled event that describes an attack at Maidan with Berkut in the attacker role and protesters in the victim role would be marked as fully supporting the hypothesis. An event that describes an attack at Maidan with Berkut in the attacker role but with no victim role specified would be marked as partially supporting, while an event that describes an attack at Maidan with protesters in both the attacker and victim roles would be marked as contradicting the hypothesis.

For AIDA Phase 1 the evaluation shifted to focus on broader topic-level hypotheses rather than facet-level hypotheses, so annotation of facet-level hypotheses was not required. Instead, we worked with the evaluation team to develop "prevailing theories" for each topic, which describe subject matter expert expectations about the differing accounts or perspectives on the topic that are likely to occur in the corpus based on prior knowledge of the scenario. While facet-level hypotheses address a single information element, prevailing theories are often responsive to multiple queries and represent a coherent perspective within the larger topic narrative. For instance, one of the prevailing theories from the same topic as the above facet-level hypothesis example was, "Snipers affiliated with Ukraine's Berkut riot police, under the direction of Aleksandr Yakimenko (who was at the time head of Ukraine's SSU intelligence service) and in collaboration with Russia's intelligence service the FSB, killed at least 53 antigovernment activists protesting in Kiev's Independence Square (aka Maidan) on February 20, 2014, using AK-47s and sniper rifles."

For each prevailing theory, we created a natural language description characterizing that account of the topic, plus a list of the events and relations, along with their arguments (entities), that would be required as part of a knowledge graph that adequately reflects that theory; this collection of information then constituted the gold standard for system evaluation.

6.2 Annotation Formats and Details

6.2.1 Month 9 Pilot Evaluation Annotations

The formats of Month 9 annotations are described in the eval_table_field_descriptions.tab file in the docs/annotation/month_9_pilot_evaluation/ directory; the sections below provide descriptions of the content of each type of Month 9 Pilot Evaluation annotation file.

6.2.1.1 Mentions

There are three mentions tables for each topic: one for entities and fillers, one for relations, and one for events. These tables are located in the data/annotation/month_9_pilot_eval/{P101,P102,P103} directories and are named as follows:

  Entities and fillers: {P101,P102,P103}_ent_mentions.tab
  Relations:            {P101,P102,P103}_rel_mentions.tab
  Events:               {P101,P102,P103}_evt_mentions.tab

These tables contain information about each annotated mention, including a KB-id linking it to the topic-based mini-KB or a NIL-id for mentions which were not present in the mini-KB.

Mentions are annotated only when they are deemed by the annotator to be salient to the topic. Salience is defined as relevant to one or more of the queries for the topic. Note that the queries referred to here are natural language questions about the topic designed to focus the annotators' attention on areas of the topic with expected informational conflict. These queries are meant to ensure that the annotations result in multiple hypotheses containing different knowledge elements (see eval_hypothesis_info.tab in the docs/annotation/month_9_pilot_evaluation/ directory); they are not meant for machine consumption.

Only one mention per document element is annotated.
So if a root document (the original page seen on the internet) has 1 text, 2 image, and 1 video document elements, an entity that was "mentioned" in all of the document elements would have 4 mentions coming from the annotation of this root document, one in each of the document elements.

The exception to the one mention per document element rule is for relation or event mentions when one or more of the arguments, attributes, or types/subtypes differ between two mentions of the same relation/event. In such cases, one mention is created for each occurrence of the relation/event that differs from the mentions already annotated. For example, if a document element contains both an assertion that MH17 was shot down by a missile and an assertion that it was shot down by a fighter jet, two separate mentions would be created.

Once an entity, filler, relation, or event mention that is salient to the topic has been identified, additional information about the mention is captured. The information captured includes provenance (which document element contains the mention), text extent, character offsets, and NAM/NOM/PRO distinction for text mentions only, type (and subtype for relations and events), a text description (called "justification" in Month 9 Pilot Evaluation annotation) for non-text mentions (optionally present for text mentions), and a KB link. The KB link consists of a node ID from the topic-specific mini-KB, or in the case of a mention that is not present in the mini-KB, a NIL-id. NIL-ids are not clustered across documents in this annotation.

In addition, relation and event mentions can have attributes associated with them. Relations and events can have the belief-type attributes "hedged" and/or "not" associated with them. "Hedged" is used to indicate uncertainty (as reported by the source, not the annotator's certainty), and "not" is used to indicate that the source asserts that the event or relation did not happen. A mention can have both "hedged" and "not", which would indicate that the source asserted that the relation or event possibly/likely did not happen.

Events can have an additional attribute of "deliberate" or "accidental". These are used to capture assertions by the source about whether the event was intentional or not. Annotators use one of these attributes only when the source explicitly conveys an assertion about intentionality, especially where such assertions are crucial to understanding informational conflict.

Event mentions can also have a political_status attribute of "legitpolitstatus" or "illegitpolitstatus". This attribute type is used to capture political legitimacy of elections, and is only provided if the legitimacy of the event (vote, election, ballot, etc.) is salient to the topic.

Finally, relations and events have temporal information associated with them in the form of start and end dates, when that information is present in the document. Annotators supply as much information as possible (minimally year, with month and/or day if available). Start date types can be "Started On", "Started Before", or "Started After", and end date types can be "Ended On", "Ended Before", or "Ended After". If no date information is available annotators can choose "Unknown" for either start or end (or both). For events and relations, information about the arguments is found in the slots table.

6.2.1.2 Slots

There are two slots tables per topic, one for relations and one for events.
Relation and event mentions in the mentions tables must be looked up in the slots tables to find the arguments and fillers involved in the relation/event. These tables are located in the data/annotation/month_9_pilot_eval/{P101,P102,P103} directories and are named as follows:

  Relation slots: {P101,P102,P103}_rel_slots.tab
  Event slots:    {P101,P102,P103}_evt_slots.tab

For each relation or event mention, annotators record which entities or fillers participate in the relation/event. Relation/event arguments/fillers must be present in the same document element as the relation/event mention in order to be annotated for that mention. For example, if the text says that the Russians shot down MH17 with a BUK missile, and a video that is part of the same root document indicates MH17 was shot down by a BUK missile but does not mention "Russians", then the text mention would include the "Russians" argument, while the video mention would not.

In addition to the entity/filler id, annotators choose a "slot type" which corresponds to something like a role in the relation or event (e.g. a Conflict.Demonstrate event has possible slot types of Person or Organization, Place, and Date).

For event slots only, each argument can have an attribute of "hedged" and/or "not". The meaning of the attributes is the same as described above in the Mentions section, but in this case its scope is at the slot level. So if a document asserts that "it wasn't a BUK missile that shot down MH17", the Conflict.Attack mention itself would not have "hedged"/"not" attributes assigned to it, but the BUK missile filler would have a "not" attribute shown in the slots table.
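As an illustration of that lookup, the Python sketch below joins a relation or event mentions table to its slots table on the mention ID. The column indices are left as parameters (and any header line would need to be skipped), because the authoritative field layout is given in eval_table_field_descriptions.tab, not here:

    import csv
    from collections import defaultdict

    def join_mentions_to_slots(mentions_path, slots_path, mention_id_col, slot_mention_col):
        """Yield (mention_row, [slot_row, ...]) pairs for one topic's tables."""
        slots_by_mention = defaultdict(list)
        with open(slots_path, encoding="utf-8", newline="") as f:
            for row in csv.reader(f, delimiter="\t"):
                slots_by_mention[row[slot_mention_col]].append(row)
        with open(mentions_path, encoding="utf-8", newline="") as f:
            for row in csv.reader(f, delimiter="\t"):
                yield row, slots_by_mention.get(row[mention_id_col], [])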
6.2.1.3 Hypotheses

There is a single hypothesis table for each topic, located here:

  data/annotation/month_9_pilot_eval/{P101,P102,P103}/{P101,P102,P103}_hypotheses.tab

For each hypothesis, the table provides a judgement for each relation or event KE (knowledge element) in the mentions tables as to whether it supports the hypothesis. The mention IDs for each event or relation shown in the hypothesis table must be looked up in the mentions table and the slots table to find the details for the event or relation. Thus, the full set of KEs that support a hypothesis will include the event and relation mentions that were judged as fully or partially relevant, plus the entities and fillers included in the slots table for those events/relations.

During annotation, the annotator views all relations and events associated with each entity in the current document (root document). For each relation and event mention, the annotator indicates whether the hypothesis is fully supported, partially supported, or contradicted by the relation/event mention. If the relation/event mention is irrelevant to the hypothesis, they choose "not relevant". Annotators are instructed to use the following criteria to choose a relevance value for each relation/event-hypothesis pair:

  Fully supported:     Given this relation/event mention, this hypothesis must be true.
  Partially supported: Given this relation/event mention, this hypothesis could be true.
  Contradicted:        Given this relation/event mention, this hypothesis cannot be true.
  Not relevant:        This relation/event mention neither supports nor contradicts this hypothesis.

NB: The hypotheses judgments above have the following values in field 4 of the *_hypotheses.tab files under the data/annotation/month_9_pilot_evaluation directory of this release:

  Fully supported     = "fully-relevant"
  Partially supported = "partially-relevant"
  Contradicted        = "partially-relevant"
  Not relevant        = "n/a"

6.2.1.4 Mini-KBs

Each topic has a "mini-KB" which includes KEs that were expected to be salient (based on information discovered during topic development and data scouting). Mini-KBs are located here:

  data/annotation/month_9_pilot_eval/{P101,P102,P103}/{P101,P102,P103}_mini-KB.tab

The KBs for the topics may have overlapping content; no attempt was made to resolve "coreference" across the KBs. The KBs have the following format:

  Col.#  Content
  1.     node_id -- unique identifier for each entry in the KB
  2.     topic_id
  3.     category -- base category of Entity, Relation, Event, or Filler
  4.     handle -- name or brief phrase to identify the entry
  5.     description -- additional information describing the entry

The node_id is the value used in linking annotated mentions in the mentions tables to the KB.
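For example, a mini-KB table with the five columns above can be loaded into a node_id lookup with a few lines of Python (a sketch only; it assumes a plain tab-delimited file and leaves any header line for the caller to skip):

    import csv

    def load_mini_kb(path):
        """Map node_id -> (topic_id, category, handle, description)."""
        kb = {}
        with open(path, encoding="utf-8", newline="") as f:
            for row in csv.reader(f, delimiter="\t"):
                node_id, topic_id, category, handle, description = row[:5]
                kb[node_id] = (topic_id, category, handle, description)
        return kb

    # kb = load_mini_kb("data/annotation/month_9_pilot_eval/P101/P101_mini-KB.tab")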
6.2.1.5 Canonical Mentions

Two files have been provided in the canonical_mentions/ subdirectory of this release:

  data/annotation/month_9_pilot_evaluation/canonical_mentions/P101_P102_P103_canonical_mentions.tsv

This file contains a list of all mentions of type PER (person), ORG (organization), GPE (geo-political entity), LOC (location), FAC (facility), WEA (weapon), or VEH (vehicle) that annotators judged were canonical. Canonical mentions in text are full, complete references, usually named, including alternate names or transliterations. Canonical image mentions are images that contain the entity/filler and no other entities/fillers of the same type. Canonical shot-level video mentions are keyframes that contain the entity/filler and no other entities/fillers of the same type. This file contains four tab-delimited fields: KB ID, mention ID, keyframe or image filename (or n/a for text), and topic ID.

  data/annotation/month_9_pilot_evaluation/canonical_mentions/P101_P102_P103_named_WEA_VEH_mentions.lst

This file contains a list of all mentions (e.g. AK-100) of weapons or vehicles that were linked to a node in a topic's mini-KB and that annotators judged were named mentions. Note that, although all weapon and vehicle mentions were reviewed, no vehicle mentions were judged to be names.

6.2.2 Scenario 1 Evaluation Annotations

The formats of Scenario 1 evaluation annotations are described in the AIDA_phase_1_table_field_descriptions_v4.tab file in the docs/annotation/phase_1_evaluation/ directory; the sections below provide descriptions of the content of each type of Scenario 1 annotation file, with some notes about differences from the Month 9 annotations.

6.2.2.1 Mentions

There are three mentions tables for each topic: one for entities and fillers, one for relations, and one for events. These tables are located in the data/annotation/phase_1_evaluation/{E101,E102,E103} directories and are named as follows:

  Entities and fillers: {E101,E102,E103}_arg_mentions.tab
  Relations:            {E101,E102,E103}_rel_mentions.tab
  Events:               {E101,E102,E103}_evt_mentions.tab

These tables contain information about each annotated mention. Note that a KB-id is no longer included in the mentions.tab files, as the KB linking information is now contained in a separate linking tab file (see below).

Differences between the mentions.tab files in the Scenario 1 format and the Month 9 format include:

  - Entity and filler mentions are now in a file called TOPICID_arg_mentions.tab (rather than the ent_mentions.tab files found in the seedling).
  - All mentions.tab files now include subtype and subsubtype fields.
  - Video mentions now specify the signal type (picture or sound), and video and audio mentions include start and end time stamps for the mentions.
  - Video "picture" mentions now include keyframe id; images and video "picture" mentions now include bounding box coordinates. NB: some keyframe id and bounding box coordinates have the value "EMPTY_TBD", as keyframe and bounding box information was planned to be added at a later stage of annotation.
  - Arg mentions include an arg_status field with "base" or "informative" indicating whether the entity/filler mention is the local mention that occupies an arg slot in a relation or event mention ("base") or whether it is an additional mention of an entity that is not local to the event/relation mention ("informative").
  - Relation and event mentions can have the attributes "hedged" and/or "not". Other attribute types have been eliminated as they are now covered by relation types.

6.2.2.2 Slots

There are two slots tables per topic, one for relations and one for events. Relation and event mentions in the mentions tables must be looked up in the slots tables to find the arguments and fillers involved in the relation/event. These tables are located in the data/annotation/phase_1_evaluation/{E101,E102,E103} directories and are named as follows:

  Relation slots: {E101,E102,E103}_rel_slots.tab
  Event slots:    {E101,E102,E103}_evt_slots.tab

Differences between the slots.tab files in the Scenario 1 format and the Month 9 format include:

  - Slot type labels use the new role labels from the AIDA annotation ontology, prefaced by indicators of the relation/event type and arg number. For example the slot type "rel022arg02sponsor" refers to the arg 2 sponsor role in the relation that has index number ldc_rel_022 in the annotation ontology. To strip the slot_type to the bare role label, the first 11 characters can be removed, as this is a fixed-width preface (see the short example after this list).
  - Argument mention ids have replaced the entity-level argument ids from the seedling annotation. The argmention_ids in the slots table correspond to "base" mentions in the arg_mentions table. Note that events which serve as arguments of sponsorship relations appear in the event mentions table, not the arg mentions table.
  - There are also two argmention_ids in the Scenario 1 slots tables whose value is the string "author". These are references to the author of the current source document, who was not annotated as an entity mention.
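The slot_type stripping referred to in the list above, shown in Python (illustrative only):

    def bare_role(slot_type):
        """Strip the fixed 11-character relation/event type and arg-number preface."""
        return slot_type[11:]

    # bare_role("rel022arg02sponsor")  ->  "sponsor"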
6.2.2.3 KB Linking

The KB linking tables provide a KB ID or NIL ID for each entity, relation, and event mention. The KB IDs refer to the AIDA Scenario 1 and 2 Reference Knowledge Base (LDC2023T10). The KB linking tables are located here:

  data/annotation/phase_1_evaluation/{E101,E102,E103}/{E101,E102,E103}_kb_linking.tab

Note that this separate linking table means that KB IDs are not present in the mentions.tab files. Also note that in the case where annotators cannot disambiguate between two or more possible KB links, multiple IDs are presented, separated by a pipe ("|") symbol.

6.2.2.4 Prevailing Theories

The prevailing theories files provide a handful of natural language prevailing theories about "what happened" for each topic and indicate which KEs are required for each theory. Note that prevailing theories are *NOT* intended to exhaustively cover the possible topic-level hypotheses that might emerge from the data. Prevailing theories files are located here:

  docs/annotation/phase_1_evaluation/{E101,E102,E103}_prevailing_theories_final.xlsx

Prevailing theories are in excel files, one file per topic, with one prevailing theory per tab. Each KE within a prevailing theory has either a KB ID or a PT clustering ID. Each tab contains information at the top with the topic and natural language version of the theory. Below the natural language version is a matrix of KEs that are required to fully support the theory, where a KE is an event or relation with all its arguments. The first column assigns an ID number to each of the KEs, the purpose of which is to make it easy to sort and tell which arguments go together under a particular relation or event. For each of the KEs, one line represents the event or relation itself, and each argument is listed on a separate line under the event/relation.

There are two columns containing KB IDs:

  - Column C (Event/Relation KB ID) contains the KB ID or clustering ID for the event or relation
  - Column I (Item KE) contains the KB ID or clustering ID for the argument populating the given event or relation slot.

Entity and relation KEs that do not appear in the AIDA eval topic KB (LDC2019E43) have PT clustering IDs formatted like PTE_E10#_### (for prevailing theory entities) or PTR_E10#_### (for prevailing theory relations). These IDs provide clustering information for the prevailing theories of the given topic. These are not NIL IDs, in that they do not correspond to any annotations in ./data, and only indicate which KEs within a topic's prevailing theories are coreferent. Event KEs within the prevailing theories all have NIL IDs. These IDs may also be present in the kb_linking.tab files in ./data, meaning they may have corresponding mention-level annotations.

In addition to the KB IDs, each line has information about the type, subtype, and sub-subtype of each event/relation/argument as well as expected date, start date range, end date range, and attribute information where known.

6.2.2.5 Eval Tracer Docs

A subset of documents underwent exhaustive annotation of salient entities. The following table lists, by root uid, all documents that received this treatment, along with the language of the annotator who performed annotation on the document (some elements of the document may not match this designation) and the topic for which the document was annotated.

  root_uid    language   topic_id
  IC0015YD8   ENG        E103
  IC0015PZ4   RUS        E103
  IC0015OEQ   RUS        E103
  IC00169X6   UKR        E103
  IC0016AE1   UKR        E103
  IC001657N   ENG        E102
  IC0015LZK   ENG        E102
  IC0015PV3   RUS        E102
  IC0015YEU   RUS        E102
  IC00160V0   RUS        E102
  IC001L4JT   UKR        E102
  IC001L4L5   UKR        E102
  IC0015Y8W   RUS        E101
  IC001L32V   UKR        E101
  IC001L2BF   UKR        E101
  IC001L3MS   UKR        E101

7.0 Assessment

7.1 Assessment Overview

The system response files contained in this package were pooled by NIST, then reviewed and judged by LDC annotators for the purpose of providing NIST with a means to score submissions to the AIDA Month 9 and Phase 1 evaluations. LDC annotators performed 2 assessment tasks in support of the AIDA Month 9 evaluation and 4 tasks in support of the AIDA Phase 1 evaluation. Annotators performed class-based and zero-hop assessment in both Month 9 and Phase 1, and additionally performed graph and hypothesis assessment during Phase 1. Each of these tasks is described in the following sections.
7.1.1 Zero-Hop Assessment

In this assessment task, annotators reviewed text mentions, images, and videos containing entities, and decided whether or not those responses were coreferent with a particular entity in one of the AIDA mini-KBs. For text responses marked correct, annotators also decided if the entity mention was a name, nominal phrase, or pronoun.

For each response they judged, annotators first answered the question, "Does this contain a mention of the reference entity?" For instance, if a response was linked to the KB entry for Vladimir Putin, the first question could be thought of as "Does this response contain a mention of Vladimir Putin?" Annotators reviewed the mention in context, and decided if the answer to this question was yes or no. A response was assessed as correct if an entity was identifiable within the text span, image, or video keyframe as a positive instance of the indicated KB entity. A response was assessed as wrong if it did not contain any part of a mention/instance of the indicated entity.

Assessors were instructed to be lenient during zero-hop assessment. Entity mentions were not required to be exact or complete in order to be considered correct. For instance, if a text response contained an excessive amount of extraneous text, it was still marked correct as long as a mention of the correct entity occurred somewhere within the span of text. Similarly, if an image or keyframe showed only a small part of an entity (e.g., tank treads, or the side of a person's face), it was marked correct as long as the annotator was able to reasonably identify that part of the image as a positive instance of the indicated KB entity.

7.1.2 Class-Based Assessment

In this assessment task, annotators reviewed text mentions, images, and videos containing entities, and decided whether or not those responses contained references to a particular entity type. For text responses marked correct, annotators also decided if the entity mention was a name, nominal phrase, or pronoun.

For each response they judged, annotators first answered the question, "Does this contain a mention of the specified entity type?" For instance, if a response was marked as containing entity type PER, the first question could be thought of as "Does this response contain a mention of a person?" Annotators reviewed the mention in context, and decided if the answer to this question was yes or no. A response was assessed as correct if an entity of the specified type was identifiable within the text span, image, or video keyframe. A response was assessed as wrong if it did not contain any part of a mention/instance of the indicated entity type.

As in zero-hop assessment, assessors were instructed to be lenient during class-based assessment (see 7.1.1 above for more details).

7.1.3 Graph Assessment

In this assessment task, annotators reviewed text mentions, images, and videos containing events and relations, and decided whether or not those responses contained references to particular entities participating in particular events or relations in particular roles. For events and entities marked correct, annotators also linked those mentions to corresponding entries in a knowledge base.

During assessment, annotators were shown a snippet of a document element (text, image, or video) that contained a mention of an event or relation. That snippet was called a justification.
For each response they judged, annotators first answered the question, "Does this justification contain an entity whose role is [role] in a [relation/event type] relation or event?" For instance, if the event type and role being assessed were Movement.TransportArtifact.Hide and Transporter, the annotator would answer the question, "Does this justification contain an entity whose role is the Transporter in a Movement.TransportArtifact.Hide event?" Annotators reviewed the mention in context, and decided if the answer to this question was yes or no.

The justification was assessed as correct if the event or relation, as well as the argument that fills the given role, were clearly identifiable in the text, image, or video. Further, to be assessed as correct, the justification must have contained a mention of the entity participating in the given event/relation as the given argument. The justification was assessed as wrong if the event or relation and/or the argument role were not clearly identifiable in the justification.

For correct justifications, annotators were then provided with an entity or filler mention that may or may not have been the same mention that appeared in the justification. Annotators then answered the question "Is this the same entity/filler as the event/relation argument in the justification?" If the provided entity/filler mention referred to the same entity/filler as the event/relation argument in the justification, then the mention was assessed as correct. If the provided entity/filler mention did not refer to the same entity/filler as the event/relation argument in the justification, then the mention was assessed as wrong.

Finally, annotators linked entity/filler and event mentions assessed as correct to corresponding entries in a knowledge base, or indicated that those entities/fillers or events did not have entries in the knowledge base.

Annotators were instructed to be lenient when assessing whether a correct event/relation type or a correct argument occurs in the justification. If some, but not all, of the information needed to justify the response was contained in the justification, annotators could check the immediate context of the justification (e.g., a few sentences around a text mention, the caption of an image, parts of a video immediately before or after a video justification) to confirm whether the event/relation type or the argument was correct. Even if none of the information needed to justify the response was contained in the justification, annotators could still assess the response as correct if a correct response was present in the immediate context of the justification.

7.1.4 Hypothesis Assessment

In this assessment task, annotators reviewed hypotheses, which were system-produced groupings of events and relations and their respective arguments, that were intended to tell a consistent story about some aspect of one of the scenario's topics. Annotators decided whether or not the hypotheses were relevant to a given topic, whether or not the hypotheses were coherent, and whether or not the hypotheses were a good representation of specific predominant theories about a given topic. There was no filtering of hypotheses through the tasks based on their assessment in previous tasks. All hypotheses underwent all three kinds of assessment.
In the first hypothesis assessment task, Relevance Assessment, annotators reviewed the events and relations that made up a hypothesis and decided whether each event and relation was fully relevant, partially relevant, or not relevant to a given topic. An event or relation was assessed as fully relevant if all of the arguments of the event or relation were relevant to the topic, i.e., all of the arguments provided information about one of the topic's queries. An event or relation was assessed as partially relevant if some but not all of the event or relation's arguments pertained in some way to the topic. An event or relation was assessed as not relevant if the event or relation had nothing to do with the topic at all.

In the second hypothesis assessment task, Semantic Coherence Assessment, annotators reviewed the events and relations that made up a hypothesis as well as the arguments of those events and relations, and judged whether they made a coherent hypothesis. There were three steps in Semantic Coherence Assessment. In the first step, annotators reviewed each argument in a single event or relation within the hypothesis and answered the question, "Are the arguments of this event or relation coherent with all the other arguments of this event or relation?" That is, do the arguments form a logical event or relation that doesn't contradict itself? In the next step, annotators reviewed each argument in each event or relation within the hypothesis and answered the question, "Are the arguments of each event or relation coherent with the arguments of every other event or relation that make up the hypothesis?" That is, can these arguments logically exist at the same time as the arguments of the other events and relations in the hypothesis? In the final step, annotators reviewed each event or relation within the hypothesis and answered the question, "Are the events and relations coherent as a single hypothesis?" That is, can these events and relations logically exist at the same time as each other? The event or relation was assessed as True if it was coherent in all three steps. The event or relation was assessed as False if it was not coherent in any of the three steps.

In the third and final hypothesis assessment task, Coverage Assessment, annotators reviewed the events and relations that made up a hypothesis as well as the arguments of those events and relations, and judged how well the hypothesis matched a topic's prevailing theories. A prevailing theory was a collection of events, relations, and their arguments produced by LDC that together represented a particular aspect of a topic in the scenario based on source data about the topic. For example, a prevailing theory would be all the events, relations, and arguments needed to represent a natural-language description of a theory like, "Riot police shot and killed protesters in Maidan Square in Kiev on February 20, 2014."

Coverage comprised two types of matching between a hypothesis and a theory, as well as an assessment of the extent of the hypothesis's coverage of a theory. In the first type of matching, annotators compared the arguments of a hypothesis to the arguments of a theory, and matched the hypothesis arguments to theory arguments. Two arguments matched if the identity and role of the hypothesis argument matched the identity and role of the theory argument. In the second type of matching, annotators matched the hypothesis to the theory that presented the same basic narrative as the hypothesis.
After matching, annotators decided if the hypothesis fully or partially covered the theory, or did not cover it. A hypothesis was assessed as Fully Covered if most or all of the hypothesis's parts were represented in the theory. A hypothesis was assessed as Partially Covered if it presented nearly the same information without containing conflicting information, though partial coverage necessarily entailed that the hypothesis did not represent most or all of the theory. If the hypothesis did not make sense, was unrelated to the topic, or presented a new theory not listed among the prevailing theories, it was assessed as No Coverage.

7.2 Assessment Formats and Details

7.2.1 Month 9 Pilot Evaluation Assessments

The sections below provide descriptions of the content of each type of Month 9 Pilot Evaluation assessment file.

7.2.1.1 Class

The Month 9 Pilot Evaluation Class Assessment file is located here:

  data/assessment/month_9_pilot_evaluation/class/AIDA_2018_CL_KIT.tab

This file is a consolidation of 50 class-based response files, released as part of LDC2019R05 AIDA Month 9 Pilot Eval Assessment Results, which comprise the complete set of class-based assessments produced by LDC annotators for the AIDA M9 evaluation. In total, this file contains 7,707 assessed responses.

The class-based assessment results file contains 8 tab-delimited fields. The field definitions are as follows:

  Col.#  Content
  1.     query_id -- Class-based query ID
  2.     type -- entity type
  3.     mention_id -- integer
         • NB: unique within query_id
  4.     source -- mention source (TEXT, VIDEO, or IMAGE)
  5.     root_uid
  6.     mention_span -- mention span in the format:
         • [Text]  DocElementID:(start,0)-(end,0)
         • [Image] DocElementID:(topleftx,toplefty)-(bottomrightx,bottomrighty)
         • [Video] keyframeID:(topleftx,toplefty)-(bottomrightx,bottomrighty)
  7.     assessment -- assessment of link between columns 6 & 1 (correct or wrong)
  8.     level -- mention type (nam, nom, or pro) for text mentions

The following is a summary of Month 9 Pilot Evaluation class-based assessment results:

  COUNT | SOURCE | JUDGMENT | MENTION TYPE
   1563 | IMAGE  | correct  | -
    619 | IMAGE  | wrong    | -
   1380 | TEXT   | correct  | nam
   1737 | TEXT   | correct  | nom
     78 | TEXT   | correct  | pro
   1783 | TEXT   | wrong    | -
    304 | VIDEO  | correct  | -
    243 | VIDEO  | wrong    | -
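A tally like the summary above can be reproduced from the consolidated file with a short Python sketch (illustrative; it assumes the file has no header line and uses the 1-indexed column numbers listed above):

    import csv
    from collections import Counter

    def summarize(path):
        """Tally (source, judgment, level) triples from a consolidated assessment file."""
        counts = Counter()
        with open(path, encoding="utf-8", newline="") as f:
            for row in csv.reader(f, delimiter="\t"):
                source = row[3]                           # column 4: TEXT, VIDEO, or IMAGE
                judgment = row[6]                         # column 7: correct or wrong
                level = row[7] if len(row) > 7 else ""    # column 8: nam/nom/pro for text mentions
                counts[(source, judgment, level)] += 1
        return counts

    # summarize("data/assessment/month_9_pilot_evaluation/class/AIDA_2018_CL_KIT.tab")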
7.2.1.2 Zero-Hop

The Month 9 Pilot Evaluation Zero-Hop Assessment response file is located here:

  data/assessment/month_9_pilot_evaluation/zero-hop/AIDA_2018_ZH_KIT.tab

This file is a consolidation of 197 zero-hop response files, released as part of LDC2019R05 AIDA Month 9 Pilot Eval Assessment Results, which comprise the complete set of zero-hop assessments produced by LDC annotators for the AIDA M9 evaluation. In total, this file contains 34,488 assessed responses.

The zero-hop assessment results file contains 8 tab-delimited fields. The field definitions are as follows:

  Col.#  Content
  1.     kb_id -- KB node ID
  2.     type -- entity type [always NIL for zero-hop files]
  3.     mention_id -- integer
         • NB: unique within kb_id
  4.     source -- mention source (TEXT, VIDEO, or IMAGE)
  5.     root_uid
  6.     mention_span -- mention span in the format:
         • [Text]  DocElementID:(start,0)-(end,0)
         • [Image] DocElementID:(topleftx,toplefty)-(bottomrightx,bottomrighty)
         • [Video] keyframeID:(topleftx,toplefty)-(bottomrightx,bottomrighty)
  7.     assessment -- assessment of link between columns 6 & 1 (correct or wrong)
  8.     level -- mention type (nam, nom, or pro) for text mentions

The following is a summary of Month 9 Pilot Evaluation zero-hop assessment results:

  COUNT | SOURCE | JUDGMENT | MENTION TYPE
    299 | IMAGE  | correct  | -
    843 | IMAGE  | wrong    | -
   7536 | TEXT   | correct  | nam
   1034 | TEXT   | correct  | nom
     57 | TEXT   | correct  | pro
  23732 | TEXT   | wrong    | -
    132 | VIDEO  | correct  | -
    855 | VIDEO  | wrong    | -

7.2.2 Scenario 1 Evaluation Assessments

The sections below provide descriptions of the content of each type of Scenario 1 Evaluation assessment file.

7.2.2.1 Class

The Scenario 1 Evaluation Class Assessment file is located here:

  data/assessment/phase_1_evaluation/class/AIDA_TA1_CL_2019.txt

This file is a consolidation of 115 class-based response files, released as part of LDC2019R30 AIDA Phase 1 Assessment Results, which comprise the complete set of class-based assessments produced by LDC annotators for the AIDA Phase 1 evaluation. In total, this file contains 5,884 assessed responses.

The class-based assessment results file contains 9 tab-delimited fields. The field definitions are as follows:

  Col.#  Content
  1.     query_id -- Class-based query ID
  2.     type -- entity type/sub-type/sub-subtype
  3.     response_id -- integer
         • NB: unique within query_id
  4.     source -- mention source (TEXT, VIDEO, or IMAGE)
  5.     root_uid
  6.     mention_span -- mention span in the format:
         • [Text]  DocElementID:(start,0)-(end,0)
         • [Image] DocElementID:(topleftx,toplefty)-(bottomrightx,bottomrighty)
         • [Video] keyframeID:(topleftx,toplefty)-(bottomrightx,bottomrighty)
  7.     assessment -- assessment of link between columns 6 & 2 (correct or wrong)
  8.     level -- mention type (nam, nom, or pro) for TEXT responses
  9.     kb_id -- KB ID or NIL ID of correct responses; for correct NIL singletons, this is just "NIL"

The following is a summary of Scenario 1 Evaluation class-based assessment results:

  COUNT | SOURCE | JUDGMENT | MENTION TYPE
    164 | IMAGE  | correct  | -
     75 | IMAGE  | wrong    | -
    873 | TEXT   | correct  | nam
    860 | TEXT   | correct  | nom
    108 | TEXT   | correct  | pro
   2622 | TEXT   | wrong    | -
   1003 | VIDEO  | correct  | -
    179 | VIDEO  | wrong    | -
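All of the mention_span variants listed in these field definitions share the shape ID:(x1,y1)-(x2,y2); for text spans the character offsets occupy the first position of each pair and the second position is 0. A single parser therefore covers all three formats (an illustrative Python sketch):

    import re

    SPAN_RE = re.compile(r"(.+):\((\d+),(\d+)\)-\((\d+),(\d+)\)$")

    def parse_span(mention_span):
        """Split a mention_span into its document-element/keyframe ID and two coordinate pairs."""
        m = SPAN_RE.match(mention_span)
        if m is None:
            raise ValueError("unexpected span format: %r" % mention_span)
        doc_id = m.group(1)
        x1, y1, x2, y2 = map(int, m.groups()[1:])
        return doc_id, (x1, y1), (x2, y2)

    # parse_span("<DocElementID>:(10,0)-(42,0)")  ->  ("<DocElementID>", (10, 0), (42, 0))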
7.2.2.2 Zero-Hop

The Scenario 1 Evaluation Zero-Hop Assessment response file is located here:

data/assessment/phase_1_evaluation/zero-hop/AIDA_TA1_ZH_2019.tab

This file is a consolidation of 102 zero-hop response files, released as part of LDC2019R30 AIDA Phase 1 Assessment Results, which comprise the complete set of zero-hop assessments produced by LDC annotators for the AIDA Phase 1 evaluation. In total, this file contains 5,759 assessed responses.

The zero-hop assessment results file contains 8 tab-delimited fields. The field definitions are as follows:

Col.# Content
1. kb_id
2. type -- entity type [always NIL for zero-hop files]
3. response_id -- integer
   • NB: unique within kb_id
4. source -- mention source (TEXT, VIDEO, or IMAGE)
5. root_uid
6. mention_span -- mention span in the format:
   • [Text] DocElementID:(start,0)-(end,0)
   • [Image] DocElementID:(topleftx,toplefty)-(bottomrightx,bottomrighty)
   • [Video] keyframeID:(topleftx,toplefty)-(bottomrightx,bottomrighty)
7. assessment -- assessment of link between columns 6 & 1 (correct or wrong)
8. level -- mention type (nam, nom, or pro) for TEXT responses

The following is a summary of Scenario 1 Evaluation zero-hop assessment results:

COUNT | SOURCE | JUDGMENT | MENTION TYPE
    3 | IMAGE  | correct  | -
    5 | IMAGE  | wrong    | -
 4308 | TEXT   | correct  | nam
  159 | TEXT   | correct  | nom
    1 | TEXT   | correct  | pro
 1153 | TEXT   | wrong    | -
   43 | VIDEO  | correct  | -
   87 | VIDEO  | wrong    | -

7.2.2.3 Graph

The Scenario 1 Evaluation Graph Assessment file is located here:

data/assessment/phase_1_evaluation/graph/AIDA_TA1_graph_2019.tab

This file is a consolidation of 782 graph response files, released as part of LDC2019R30 AIDA Phase 1 Assessment Results, which comprise the complete set of graph assessments produced by LDC annotators for the AIDA Phase 1 evaluation. In total, this file contains 14,984 assessed responses.

The graph assessment results file contains 13 tab-delimited fields. The field definitions are as follows:

Col.# Content
1. query_id
2. response_id -- integer
   • NB: unique within query_id + root_uid + object_justification + predicate_justification
3. predicate -- (e.g. Conflict.Attack_Attacker)
4. root_uid
5. subject_type -- SubjectType [NIL, ignored by LDC]
6. subject_justification -- SubjectJustification (1 span) [NIL, ignored by LDC]
7. object_type -- ObjectType [NIL, ignored by LDC]
8. object_justification -- ObjectJustification - 1 span in the format:
   • [Text] DocElementID:(start,0)-(end,0)
   • [Image] DocElementID:(topleftx,toplefty)-(bottomrightx,bottomrighty)
   • [Video] keyframeID:(topleftx,toplefty)-(bottomrightx,bottomrighty)
9. predicate_justification -- PredicateJustification - 1-2 semicolon-separated spans:
   • [Text] DocElementID:(start,0)-(end,0)
   • [Image] DocElementID:(topleftx,toplefty)-(bottomrightx,bottomrighty)
   • [Video] keyframeID:(topleftx,toplefty)-(bottomrightx,bottomrighty)
10. assessment_1 -- Is PredicateJustification correct? (correct or wrong)
11. assessment_2 -- If column 10 is correct, is ObjectJustification (column 8) linkable to the object in PredicateJustification (column 9)? (yes or no)
12. object_id -- global KB ID or NIL ID for the (correct) object in column 8; for correct NIL singleton objects, this is only "NIL"
13. predicate_id -- global KB ID or NIL ID for the (correct) subject in column 9 if the subject is an event; for correct NIL singleton event subjects, this is only "NIL". Relation subjects have no KB ID or NIL ID, as manual relation coref was not performed by LDC assessors.

The following is a summary of Scenario 1 Evaluation graph assessment results:

COUNT | assessment_1 | assessment_2
 7141 | wrong        | -
 1897 | correct      | no
 5946 | correct      | yes
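Because column 9 may hold either one or two semicolon-separated spans, downstream code should not assume a single predicate justification per response. A minimal sketch, assuming the consolidated file has no header row, that counts single- versus two-span justifications:

import csv
from collections import Counter

# Count how many graph responses carry one vs. two predicate-justification
# spans (column 9 is documented as 1-2 semicolon-separated spans).
# csv.reader preserves empty cells, so field positions stay aligned even in
# rows that contain them (see section 10.3).
span_counts = Counter()
with open("data/assessment/phase_1_evaluation/graph/AIDA_TA1_graph_2019.tab",
          encoding="utf-8", newline="") as f:
    for row in csv.reader(f, delimiter="\t"):
        predicate_justification = row[8]          # column 9, 0-indexed
        spans = [s for s in predicate_justification.split(";") if s.strip()]
        span_counts[len(spans)] += 1

print(span_counts)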
7.2.2.4 Graph Relations

The Scenario 1 Evaluation Graph relation assessment file is located here:

data/assessment/phase_1_evaluation/graph/relation-pool-v2.1.tab

This file contains results of the relation assessment task, wherein assessors judged whether or not two correct relation arguments together comprised a correct and justified relation. In total, this file contains 721 assessed responses.

The relation assessment results file contains 12 tab-delimited fields. The field definitions are as follows:

Col.# Content
1. type -- Relation type
2. arg1_role -- ARG1 role label
3. arg1_uid -- ARG1 document ID
4. arg1_p-justification -- ARG1 predicate justification
   • [Text] DocElementID:(start,0)-(end,0)
   • [Image] DocElementID:(topleftx,toplefty)-(bottomrightx,bottomrighty)
   • [Video] keyframeID:(topleftx,toplefty)-(bottomrightx,bottomrighty)
5. arg1_o-justification -- ARG1 object justification
   • [Text] DocElementID:(start,0)-(end,0)
   • [Image] DocElementID:(topleftx,toplefty)-(bottomrightx,bottomrighty)
   • [Video] keyframeID:(topleftx,toplefty)-(bottomrightx,bottomrighty)
6. arg1_kb_id -- ARG1 KB node ID or NIL ID (assigned by LDC during assessment of correctness)
7. arg2_role -- ARG2 role label
8. arg2_uid -- ARG2 document ID
9. arg2_p-justification -- ARG2 predicate justification
   • [Text] DocElementID:(start,0)-(end,0)
   • [Image] DocElementID:(topleftx,toplefty)-(bottomrightx,bottomrighty)
   • [Video] keyframeID:(topleftx,toplefty)-(bottomrightx,bottomrighty)
10. arg2_o-justification -- ARG2 object justification
   • [Text] DocElementID:(start,0)-(end,0)
   • [Image] DocElementID:(topleftx,toplefty)-(bottomrightx,bottomrighty)
   • [Video] keyframeID:(topleftx,toplefty)-(bottomrightx,bottomrighty)
11. arg2_kb_id -- ARG2 KB node ID or NIL ID (assigned by LDC during assessment of correctness)
12. assessment -- Is ARG1 linkable to ARG2 with respect to the provided relation type and corresponding role labels? (yes or no)

The following is a summary of Scenario 1 Evaluation graph relation assessment results:

COUNT | assessment
  169 | no
  552 | yes

7.2.2.5 Hypothesis

The Scenario 1 Evaluation Hypothesis Assessment file is located here:

data/assessment/phase_1_evaluation/hypothesis/AIDA_hypothesis.tab

This file is a consolidation of 732 hypothesis files, released as part of LDC2019R30 AIDA Phase 1 Assessment Results, which comprise the complete set of TA3 hypotheses assessed by LDC annotators for the AIDA Phase 1 evaluation. In total, this file contains 8,422 assessed responses.

The hypothesis file contains 23 tab-delimited fields. The field definitions are as follows:

Col.# Content
1. HypothesisID
2. Hyp_Importance
3. EvtRelUniqueID
4. EvtRelClusterID
5. EvtRel-Importance
6. EvtRel_EdgeLabel -- (e.g. Conflict.Attack_Attacker)
7. ObjClusterID
8. EdgeID
9. Edge-Importance
10. ObjectType -- (e.g. PER.Combatant.Sniper)
11. ObjectHandle
12. PredicateJustificationConfidence
13. ObjectJustificationConfidence
14. DocID
15. SubjectJustification -- [NULL, ignored by LDC]
16. PredicateJustification
   • [Text] DocElementID:(start,0)-(end,0)
   • [Image] DocElementID:(topleftx,toplefty)-(bottomrightx,bottomrighty)
   • [Video] keyframeID:(topleftx,toplefty)-(bottomrightx,bottomrighty)
17. ArgumentJustification
   • [Text] DocElementID:(start,0)-(end,0)
   • [Image] DocElementID:(topleftx,toplefty)-(bottomrightx,bottomrighty)
   • [Video] keyframeID:(topleftx,toplefty)-(bottomrightx,bottomrighty)
18. EvtRelRelevance -- judgment of the event/relation KE's relevance to the scenario topic (FullyRelevant, PartiallyRelevant, or NotRelevant)
19. EdgeCoherence -- judgment of the edge's semantic coherence (True or False)
20. EvtRelCoherence -- judgment of the event/relation KE's semantic coherence (True or False)
21. CoverageOfBestMatchingPT -- judgment of the hypothesis's overall coverage of the best matching prevailing theory indicated in column 22, if any (FullyCovered, PartiallyCovered, or None)
22. BestMatchingPrevailingTheory -- the hypothesis's best matching prevailing theory, if any (e.g. E102Theory5)
23. PrevailingTheoryMatchingArgID -- the edge's best matching prevailing theory argument or arguments, if any, formatted like E101_Theory3-KE002-evt090arg02victim-80000117, with multiple matching arguments separated by '|'
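Columns 21 and 22 make it possible to get a rough per-hypothesis view of prevailing-theory coverage. The sketch below assumes the file has no header row, that coverage is constant across the rows of a given HypothesisID (so the last row seen wins), and that column 22 may be empty when there is no matching theory; check these assumptions against the file before relying on the numbers.

import csv
from collections import Counter

# Per-hypothesis coverage of the best-matching prevailing theory.
# Columns 1, 21, and 22 are HypothesisID, CoverageOfBestMatchingPT, and
# BestMatchingPrevailingTheory; coverage is a hypothesis-level judgment, so
# one value is kept per HypothesisID rather than counting every edge row.
coverage_by_hyp = {}
with open("data/assessment/phase_1_evaluation/hypothesis/AIDA_hypothesis.tab",
          encoding="utf-8", newline="") as f:
    for row in csv.reader(f, delimiter="\t"):
        hyp_id, coverage, theory = row[0], row[20], row[21]
        coverage_by_hyp[hyp_id] = (coverage, theory or "-")

print(Counter(coverage_by_hyp.values()))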
8.0 Software Tools Included in this Release

8.1 Ltf2txt

A data file in ltf.xml format (as described above) can be conditioned to recreate exactly the "raw source data" text stream (the rsd.txt file) from which the LTF was created. The tools described here can be used to apply that conditioning, either to a directory or to a zip archive file containing ltf.xml data. In either case, the scripts validate each output rsd.txt stream by comparing its MD5 checksum against the reference MD5 checksum of the original rsd.txt file from which the LTF was created. (This reference checksum is stored as an attribute of the "DOC" element in the ltf.xml structure; there is also an attribute that stores the character count of the original rsd.txt file.)

The tools are located here:

tools/ltf2txt

Each script contains user documentation as part of the script content; you can run "perldoc" on a script to view its documentation as a typical unix man page, or simply view the script content directly. Also, running any of the scripts without command-line arguments will cause it to display a one-line synopsis of its usage, and then exit.

ltf2rsd.perl -- convert ltf.xml files to rsd.txt (raw-source-data)
ltf2ma.perl -- convert ltf.xml files to ma_tkn.txt (morpheme-segmented text)
ltfzip2rsd.perl -- extract and convert ltf.xml files from zip archives
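The same consistency check can be approximated outside the Perl tools by recomputing the MD5 of a regenerated rsd.txt and comparing it with the value stored on the DOC element. The attribute names used in the sketch below (raw_text_md5 and raw_text_char_length) are an assumption; confirm them against the DOC element in your own ltf.xml files.

import hashlib
import xml.etree.ElementTree as ET

# Cross-check a regenerated rsd.txt against the reference values stored on the
# DOC element of its ltf.xml companion (see section 8.1). The attribute names
# below are assumptions -- verify them against your files before relying on this.
def check_rsd(ltf_path: str, rsd_path: str) -> bool:
    root = ET.parse(ltf_path).getroot()
    doc = root if root.tag == "DOC" else root.find(".//DOC")
    ref_md5 = doc.get("raw_text_md5")                # assumed attribute name
    ref_len = int(doc.get("raw_text_char_length"))   # assumed attribute name

    with open(rsd_path, "rb") as f:
        raw = f.read()

    # MD5 is assumed to be computed over the raw bytes of rsd.txt, and the
    # character count over its decoded (UTF-8) text.
    return hashlib.md5(raw).hexdigest() == ref_md5 and len(raw.decode("utf-8")) == ref_len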
8.2 Twitter-processing

The executable get_tweet_by_id.rb is located under tools/twitter-processing/bin/ and can be used to download and condition twitter text to match the version used by LDC for annotation. See tools/twitter-processing/README.md for further information.

9.0 Documentation Included in this Release

The ./docs folder (relative to the root directory of this release) contains a set of tab-delimited table files, pdf files, and excel files. They are organized into annotation and assessment subdirectories, each of which is further divided into Month 9 Pilot Evaluation and Phase 1 Evaluation subdirectories. Each file is described in a subsection below.

In the following, the term "asset" refers to any single "primary" data file of any given type. Each asset has a distinct 9-character identifier. If two or more files appear with the same 9-character file-ID, this means that they represent different forms or derivations created from the same, single primary data file (e.g. this is how we mark corresponding LTF.xml and PSM.xml file pairs).

Data scouting, annotation, and related metadata are all managed with regard to a set of "root" HTML pages (harvested by the LDC for a specified set of topics); therefore the tables and annotations make reference to the asset-IDs assigned to those root pages. However, the present release does not include the original HTML text streams, or any derived form of data corresponding to the full HTML content. As a result, the "root" asset-IDs cited in tables and annotations are not to be found among the inventory of data files presented in zip archives in the "./data" directory.

Each root asset is associated with one or more "child" assets (including images, media files, style sheets, text data presented as ltf.xml, etc.); each child asset gets its own distinct 9-character ID. The root-child relations are provided in the "parent_children.tab" table (9.1.1), and as part of the LDCC header content in the various "wrapped" data file formats (as listed in section 2).

9.1 Top-level Documentation

Files in the top-level /docs directory describe data relevant to both the annotation and assessment partitions.

9.1.1 "parent_children.tab" -- Relation of Child Assets to Root HTML Pages

This file is located in the top-level docs directory here:

docs/parent_children.tab

Each data file-ID in the set of zip archives is represented by the combination of child_uid and child_asset_type (columns 2 and 4), along with its root UID in column 1.

Col.# Content
1. parent_uid -- 9-character source document ID string
2. child_uid -- 9-character ID string for media element of source document
3. url -- URL for root document or for child asset
4. child_asset_type -- media type, represented as file type and storage format (e.g., .ltf.xml, .jpg.ldcc)
5. topic -- topic ID for which the document was annotated
6. lang_id -- automatically detected language, "n/a" for non-ltf assets
7. lang_manual -- manually selected language(s)
8. rel_pos -- position of this asset relative to other child assets on the page
9. wrapped_md5 -- md5 checksum of .ldcc formatted asset file
10. unwrapped_md5 -- md5 checksum of original asset data file
11. download_date -- download date of asset
12. content_date -- creation date of asset, or n/a
13. status_in_corpus -- "present" or "diy"; set to "diy" for assets associated with tweets

Notes:
- Because ltf and psm files have the same "child" uid and differ only in the file extension (.ltf.xml or .psm.xml), only the ltf files are listed in the parent_children.tab document.
- The URL provided for each .ltf.xml entry in the table is the "full-page" URL for the root document associated with the "parent_uid" value. (For other types of child data -- images and media -- the "url" field contains the specific url for that piece of content.)
- Some child_uids (for images or videos) appear multiple times in the table, because they were found to occur identically in multiple root web pages.
- "Derived assets" such as ltf and psm do not have a relative position value.
- Topic and manually selected language data were withheld from previous versions of parent_children.tab to protect evaluation-sensitive information. That data is now provided, since the information is no longer evaluation-sensitive.
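A common first step with this table is to group child assets under their root documents and to pull out the tweet-derived ("diy") assets that must be fetched separately. A minimal sketch, which skips a header line if one is present and otherwise assumes the 13 columns listed above:

import csv
from collections import defaultdict

# Group child assets by their root (parent) document and collect the
# tweet-derived assets that are not shipped in ./data ("diy").
children_by_parent = defaultdict(list)
diy_assets = []

with open("docs/parent_children.tab", encoding="utf-8", newline="") as f:
    for row in csv.reader(f, delimiter="\t"):
        if row[0] == "parent_uid":      # skip a header line if one is present
            continue
        parent_uid, child_uid, child_type, status = row[0], row[1], row[3], row[12]
        children_by_parent[parent_uid].append((child_uid, child_type))
        if status == "diy":
            diy_assets.append(child_uid)

print(f"{len(children_by_parent)} root documents, {len(diy_assets)} diy child assets")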
9.1.2 "masterShotBoundary.msb" -- Summary of Shot Boundary Segments

This file is located in the top-level docs directory here:

docs/masterShotBoundary.msb

For each video included in the release, a set of segments was generated with the video shot boundary detector and is listed in this file.

Col.# Content
1. keyframe_id -- Unique ID constructed from the 9-character file-ID of the video from which the frame was extracted and a unique ID for the keyframe (e.g., HC0000SPD_26, which was extracted from HC0000SPD)
2. start_frame -- Shot start frame
3. end_frame -- Shot end frame
4. start_time -- Shot start time in seconds
5. end_time -- Shot end time in seconds

9.2 Annotation Documentation

9.2.1 "twitter_info.tab" -- Summary of Twitter Assets

This file is located in the annotation subdirectory of the docs directory here:

docs/annotation/twitter_info.tab

For each tweet collected, a row listing asset uid, tweet ID, user ID, and topic UID is included:

Col.# Content
1. uid
2. tweet_id -- Twitter-provided tweet ID
3. user_id -- Twitter-provided user ID

This file can be used with the twitter-processing utility provided in the tools/ directory of this package to ensure that the downloaded tweet contents match those retrieved by LDC, so that any annotations can be correctly aligned with the tweet.

9.2.2 Month 9 Annotation Documentation

The following documents are present in the docs/annotation/month_9_pilot_evaluation directory of this package.

9.2.2.1 AIDA_Seedling_Annotation_Guidelines_V2.1.pdf

Version of the annotation guidelines that was used to produce the Month 9 annotations in this package.

9.2.2.2 AIDA_Seedling_Ontology_Info_V7.xlsx

File with information on the Month 9 ontology types and constraints used in annotation.

9.2.2.3 eval_hypothesis_info.tab

Four-column table providing information about each hypothesis. The four columns in this table are as follows:

Col.# Content
1. hypothesis_id -- Unique identifier for each hypothesis; the ID consists of 3 fields separated by underscores (topic_query_hypothesis): P101_Q001_H001 is the first hypothesis for topic P101, Query 1, and P101_Q002_H001 is the first hypothesis for topic P101, Query 2, etc.
2. topic_name -- Name of the topic the hypothesis is relevant to
3. query -- Natural language query the hypothesis is in response to
4. hypothesis -- Natural language hypothesis text

9.2.2.4 eval_topic_description.pdf

Description of each pilot eval topic and the conflicting information types that were expected. Note that the conflicting information contained in the topic description is not intended to be exhaustive or to constrain the annotation in any way. This topic description, in combination with the eval_hypothesis_info.tab file in the same subdirectory and the mini KBs for each topic in the data/ directory, constitutes the "topic model".

9.2.2.5 eval_table_field_descriptions.tab

Description of the structure of each type of annotation table in the data/annotation/month_9_pilot_evaluation/{P101,P102,P103} subdirectories. This table includes information about column headers, the content of each field, and the format of the contents.

9.2.3 Scenario 1 Annotation Documentation

The following documents are present in the docs/annotation/phase_1_evaluation directory of this package.

9.2.3.1 AIDA_Annotation_Guidelines_Quality_Control_and_Informative_Mentions_V1.0.pdf

Version of the annotation guidelines that was used to perform Quality Control of the Scenario 1 Salient Mentions annotations in this package.

9.2.3.2 AIDA_Annotation_Guidelines_Salient_Mentions_V1.0.pdf

Version of the annotation guidelines that was used to produce the Scenario 1 evaluation annotations in this package.

9.2.3.3 AIDA_phase_1_table_field_descriptions_v4.tab

Description of the structure of each type of annotation table. This table includes information about column headers, the content of each field, and the format of the contents.

9.2.3.4 LDC_AIDAAnnotationOntology_V8.xlsx

The Scenario 1 annotation ontology.

9.2.3.5 E101_E102_E103_topic_description.pdf

Descriptions of the E101, E102, and E103 topics with queries and query IDs. Note that the queries are meant to draw annotators' attention to expected points of informational conflict within the topic, but salience to the topic is defined more broadly than simply providing the answer to one of the queries. See the annotation guidelines for the instructions provided to annotators on determining salience.

9.2.3.6 {E101,E102,E103}_prevailing_theories_final.xlsx

These three files contain the prevailing theories for topics E101, E102, and E103, respectively.
9.3 Assessment Documentation

9.3.1 Month 9 Assessment Documentation

The following documents are present in the docs/assessment/month_9_pilot_evaluation directory of this package.

9.3.1.1 AIDA_2018_Assessment_Guidelines_V1.0.pdf

Latest version of the guidelines that were used to produce the Month 9 Evaluation assessments.

9.3.1.2 zerohop_queries.xml

This file contains 269 zero-hop query entry points. Each entry point contains an entity mention, its source document, and a KB node. In total, there are 20 unique KB nodes across the 269 entry points, corresponding to 20 unique entities in the AIDA P103 mini-KB. All zero-hop responses assessed by LDC annotators were responses to one of these 20 entities.

9.3.2 Scenario 1 Assessment Documentation

The following documents are present in the docs/assessment/phase_1_evaluation directory of this package.

9.3.2.1 AIDA_2019_Entity_Assessment_Guidelines_V1.1.pdf

Latest version of the guidelines that were used to produce the class and zero-hop assessments during Scenario 1 Assessment.

9.3.2.2 AIDA_2019_Event_Relation_Assessment_Guidelines_V1.0.pdf

Latest version of the guidelines that were used to produce the assessments of graph responses during Scenario 1 Assessment.

9.3.2.3 AIDA_2019_Hypothesis_Assessment_Guidelines_V1.1.pdf

Latest version of the guidelines that were used to produce the assessments of TA3 hypotheses during Scenario 1 Assessment.

10.0 Known Issues

10.1 Month 9 Annotations

All text entity mentions should have a mention level (nam/nom/pro). However, there are some text entity mentions with missing mention levels.

Relations should have exactly two slots annotated. However, there are several cases of relation mentions with only one slot annotated. There is also a case where a relation has two slots annotated, but one slot is missing a slot_type.

Each hypothesis should have exactly one judgment for each relation and event mention. Some hypotheses were not judged for all relation/event mentions. One instance in which this can occur is when a relation or event has only fillers as arguments.

In some cases, start and end dates are not in the standard format (YYYY-MM-DD).

Each type.subtype combination for events and relations has a specified set of allowable types for the entities/fillers that can occupy its slots. In some cases, an entity with an unexpected type is included as an argument.

Each entity or event that is an argument of a relation mention or event mention should share provenance with that relation or event mention. That is, given the provenance UID for a relation or event mention, and given an entity (or event) that is an argument of that relation or event mention, at least one of the provenance UIDs associated with the mentions of that argument should be the same as the provenance UID for the relation or event mention in question. However, there are many cases of arguments that do not share provenance with their corresponding event or relation.

10.2 Scenario 1 Evaluation Annotations

Duplicate arg mentions -- Some arg mentions may be annotated more than once when they appear as arguments of more than one relation/event; that is, the same type, subtype, and sub-subtype may be applied to the same text extent (or video/image provenance) more than once. Note that duplicate arg mentions each have a unique argmention_id.

Missing mediamention_coordinates -- All mentions tagged in non-text assets are expected to have mediamention_coordinates indicating where in the asset the mention occurs.
There are 876 entity mentions, 199 event mentions, and 142 relation mentions that have "EMPTY_TBD" or "EMPTY_NA" under mediamention_coordinates despite being tagged in non-text assets.

Orphaned base arg mentions -- There are many base arg mentions that are not annotated as a slot in an event or relation.

Keyframe documentation missing -- Shot start frame, shot end frame, shot start time, and shot end time are missing from the document masterShotBoundary.msb for the keyframes IC0019NAV_77 and IC001L2RD_104.

10.3 Empty cells in some Month 9 Annotation and Scenario 1 Assessment files

Some of the Month 9 Annotation files and Scenario 1 Evaluation Assessment files contain empty cells. These appear as a sequence of two tab characters within a given line in a tab file (e.g. /\t\t/), or as a line-final tab character (e.g. /\t$/). There are no line-initial empty cells (e.g. /^\t/). Note that lines can contain multiple empty cells, and even multiple contiguous empty cells. Care should be taken when processing these files to ensure that empty cells are handled appropriately and, in particular, that data from other fields is not shifted into the empty cells.

The files in the package that contain empty cells are:

./data/annotation/month_9_pilot_evaluation/P101/P101_ent_mentions.tab
./data/annotation/month_9_pilot_evaluation/P101/P101_evt_mentions.tab
./data/annotation/month_9_pilot_evaluation/P101/P101_evt_slots.tab
./data/annotation/month_9_pilot_evaluation/P101/P101_rel_mentions.tab
./data/annotation/month_9_pilot_evaluation/P102/P102_ent_mentions.tab
./data/annotation/month_9_pilot_evaluation/P102/P102_evt_mentions.tab
./data/annotation/month_9_pilot_evaluation/P102/P102_evt_slots.tab
./data/annotation/month_9_pilot_evaluation/P102/P102_rel_mentions.tab
./data/annotation/month_9_pilot_evaluation/P102/P102_rel_slots.tab
./data/annotation/month_9_pilot_evaluation/P103/P103_ent_mentions.tab
./data/annotation/month_9_pilot_evaluation/P103/P103_evt_mentions.tab
./data/annotation/month_9_pilot_evaluation/P103/P103_evt_slots.tab
./data/annotation/month_9_pilot_evaluation/P103/P103_rel_mentions.tab
./data/assessment/phase_1_evaluation/graph/AIDA_TA1_graph_2019.tab
./data/assessment/phase_1_evaluation/zero-hop/AIDA_TA1_ZH_2019.tab
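One way to honor the warning above is to split lines only on tab characters, for example with csv.reader, which keeps an empty string for every empty cell so that field positions are never shifted. A minimal sketch:

import csv

# Read a tab-delimited file while preserving empty cells. csv.reader with a
# tab delimiter yields an empty string for each /\t\t/ sequence and for a
# line-final tab, so field positions stay aligned. By contrast, splitting on
# arbitrary whitespace (e.g. str.split() with no argument) would collapse
# empty cells and shift later fields to the left.
def read_tab(path):
    with open(path, encoding="utf-8", newline="") as f:
        for row in csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE):
            yield row

for row in read_tab("data/assessment/phase_1_evaluation/zero-hop/AIDA_TA1_ZH_2019.tab"):
    pass  # each row keeps one entry per field, empty cells included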
11.0 Copyright

Portions © 2003, 2015 2000.ua, © 2015 Arguments and Facts, © 2014 Associated Newspapers Ltd, © 2017 Belarus Today, © 2017 Belarusian Hour, © 2016-2018 Bessarabia INFORM, © 2017-2018 Bird In Flight, © 2015-2016 Cable News Network. Turner Broadcasting System, Inc., © 2016 Censor.NET, © 2011-2015, 2017 Consortiumnews, © 2014-2016 Digital Venture LLC, © 2012, 2017 DirectPress.ru, © 2015 Elisa Group Ltd., © 2018 Elnews.ru, © 2014 EUROMAIDAN PRESS, © 2015-2016 euronews, © 2017 Facts and Comments, © 2016 FAN, © 2014 Forbes Media LLC, © 2014 From-UA, © 2013 gate @ Crimea – news, comments, © 2017 Gazetadaily.ru, © 2018 GLAVRED.INFO, © 2011 Human Rights Watch, © 2013, 2017-2018 IA REGNUM, © 2014-2015 InfoKava.com, © 2017 Information and Analytical Agency, © 2015 InoSMI.ru, © 2014 Interfax-Ukraine, © 2009-2017 JSC Business News Media, © 2012-2014, 2016 KM Online, LLC, © 2014-2015 Lenta.Ru LLC, © 2017 Liga Information and Analytical Center, © 2015, 2017 Lux Television and Radio Company, © 2014-2017 MIA Russia Today, © 2016-2018 mirnews.su, © 2014 Mirror of the week, © 2017 News Front, © 2015-2016 NEWSru.com, © 2018 Obozrevatel, © 2014-2015 PJSC Today Multimedia, © 2017 Public Television, © 2014-2015, 2017 Radio Liberty, © 2014-2015 RFE/RL, © 2014-2015 The Daily Beast Company LLC, © 2014-2017 The Military Review, © 2011-2012 The Power of Truth, © 2014 The Slate Group, © 2014, 2016-2017 TSN.ua, © 2014-2017 TV-Novosti, © 2015, 2017 Ukrainian Media Holding, © 2014, 2016 Ukrainian Media Systems, © 2014-2015, 2017 Ukrainian Pravda, © 2015-2017 Ukrinform, © 2014, 2017 UNIAN.NET, © 2014-2015 Vice News, © 2017 Western Information Corporation, © 2014-2018 Zhitomir-Online, © 2018 Trustees of the University of Pennsylvania

12.0 Contacts

Dana Delgado - AIDA Project Manager
Christopher Caruso - AIDA Tech Lead
Ann Bies - AIDA Coordinator
Kira Griffitt - AIDA Coordinator

------
README created by Chris Caruso on February 3, 2023
  updated by Jeremy Getman on February 9, 2023
  updated by Stephanie Strassel on September 5, 2023
  updated by Jeremy Getman on May 15, 2024
  updated by Summer Ploegman on May 14, 2025
  updated by Kira Griffitt on May 27, 2025