Corpus Title: AIDA Scenario 1 Evaluation Topic Source Data, Annotation, Assessment
LDC Catalog-ID: LDC2025T13
Authors: Jennifer Tracey, Stephanie Strassel, Jeremy Getman, Ann Bies, Kira Griffitt, David Graff, Chris Caruso, Joshua Parry

1.0 Introduction

This corpus was developed by the Linguistic Data Consortium for the DARPA AIDA Program and contains a multi-media collection of 10,522 documents used in the AIDA Month 9 pilot evaluation and the AIDA Final Phase 1 evaluation, annotations for 386 of those documents, and results of assessment of 77,965 responses in 1,525 of those documents.

The AIDA (Active Interpretation of Disparate Alternatives) Program was designed to support development of technology that can assist in cultivating and maintaining understanding of events when there are conflicting accounts of what happened (e.g. who did what to whom and/or where and when events occurred). AIDA systems must extract entities, events, and relations from individual multimedia documents, aggregate that information across documents and languages, and produce multiple knowledge graph hypotheses that characterize the conflicting accounts that are present in the data.

Each phase of the AIDA program focused on a different scenario, or broad topic area. The scenario for Phase 1 was political relations between Russia and Ukraine in the 2010s. This scenario was used for both the AIDA Month 9 pilot evaluation and the AIDA Final Phase 1 evaluation. In addition, each scenario had a set of specific subtopics within the scenario that were designated as either "practice topics" (released for use in system development) or "evaluation topics" (reserved for use in the AIDA program evaluations for each phase).

The annotations and assessments contained in this release include coverage of the following three evaluation topics ('P' IDs used in Month 9 pilot annotations, 'E' IDs used in Scenario 1 evaluation annotations):

  P101/E101 - Suspicious Deaths and Murders in Ukraine (January-April 2015)
  P102/E102 - Odessa Tragedy (May 2, 2014)
  P103/E103 - Siege of Sloviansk and Battle of Kramatorsk (April-July 2014)

2.0 Directory Structure

The directory structure and contents of the package are summarized below -- paths shown are relative to the base (root) directory of the package:

  ./data/source/      -- contains zip files subdivided by data type (see below)
  ./data/annotation/  -- contains subdirectories of annotation organized by
                         evaluation partition and subdivided by topic
  ./data/assessment/  -- contains subdirectories of assessment organized by
                         evaluation partition and subdivided by response type
  ./data/video_shot_boundaries/representative_frames
                      -- contains subdirectories for each video, with any keyframe
                         PNGs referenced in the annotation and assessment tables
  ./docs/             -- contains documentation about the source data, annotation,
                         and assessment
  ./tools/            -- contains software for LTF data manipulation and twitter
                         processing

The "source" subdirectory of the "data" directory has a separate subdirectory for each of the following data types, and each directory contains one or more zip archives with data files of the given type; the list shows the archive-internal directory and file-extension strings used for the data files of each type:

  bmp/*.bmp.zip -- contains "bmp/*.bmp.ldcc" files (image data)
  gif/*.gif.zip -- contains "gif/*.gif.ldcc" files (image data)
  jpg/*.jpg.zip -- contains "jpg/*.jpg.ldcc" files (image data)
  mp3/*.mp3.zip -- contains "mp3/*.mp3.ldcc" files (audio data)
  mp4/*.mp4.zip -- contains "mp4/*.mp4.ldcc" files (typically video)
  png/*.png.zip -- contains "png/*.png.ldcc" files (image data)
  svg/*.svg.zip -- contains "svg/*.svg.ldcc" files (image data)
  ltf/*.ltf.zip -- contains "ltf/*.ltf.xml" files (segmented/tokenized text data)
  psm/*.psm.zip -- contains "psm/*.psm.xml" files (companion to ltf.xml)

Data types in the first group consist of original source materials presented in "ldcc wrapper" file format (see section 4.2 below). The latter group (ltf and psm) are created by LDC from source HTML data, by way of an intermediate XML reduction of the original HTML content for "root" web pages (see section 4.1 for a description of the process, and section 5 for details on the LTF and PSM file formats).

The 6-character file-ID of the zip archive matches the first 6 characters of the 9-character file-IDs of the data files it contains. For example:

  zip archive file ./data/source/png/HC0000.png.zip contains:
    png/HC00000FM.png.ldcc
    png/HC00000FN.png.ldcc
    ...
    png/HC00009L7.png.ldcc
    png/HC00009L8.png.ldcc

(The "ldcc" file format is explained in more detail in section 4.2 below.) Note that the number of data files per zip archive varies, with the largest zip in this package containing over 4,100 files.

The "video_shot_boundaries" directory contains a "representative_frames" subdirectory which contains a directory of .png images corresponding to each detected shot referenced in annotation or assessment tables. These directories are named using the 9-character file-ID of the video from which the included frames were extracted.
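For example, the following Python sketch (illustrative only; the function name and corpus-root argument are not part of the corpus tools) uses this naming convention to locate and read one wrapped asset given its 9-character file-ID and data type:

    import zipfile
    from pathlib import Path

    def read_ldcc_asset(corpus_root, file_id, ext):
        """Return the raw bytes of e.g. png/HC00000FM.png.ldcc from its zip archive."""
        # the zip name is the first 6 characters of the 9-character file-ID,
        # e.g. HC00000FM -> ./data/source/png/HC0000.png.zip
        zip_path = Path(corpus_root) / "data" / "source" / ext / ("%s.%s.zip" % (file_id[:6], ext))
        member = "%s/%s.%s.ldcc" % (ext, file_id, ext)
        with zipfile.ZipFile(zip_path) as zf:
            return zf.read(member)

    # data = read_ldcc_asset(".", "HC00000FM", "png")
    # the returned bytes still begin with the 1024-byte LDCC header (see section 4.2)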
3.0 Content Summary

3.1 Source Data

The source data was manually scouted by annotators searching for relevant material which was then collected (harvested) from various web sources. In the mini-table below, "#RtPgs" refers to the number of root HTML pages that were scouted and harvested; the other columns indicate the total number of data files of the various types extracted from those root pages.

  #RtPgs   #Imgs   #Vids   #Auds
   10522   28572     990       7

Note: The number of root pages in the table above includes Twitter data, which is not present in the data directories and must be downloaded from Twitter by the user. Assets associated with tweets are marked as "diy" in the status_in_corpus field of the parent_children.tab file. The number of image, video, and audio files in the table above does not include Twitter data.

3.2 Annotation Data

The table below provides a summary of the number of HTML pages ("root documents") annotated for each topic and language for the Month 9 pilot evaluation.

  Topic   Lang   Docs
  P101    ENG      24
  P101    RUS      71
  P101    UKR      17
  P102    ENG      41
  P102    RUS      63
  P102    UKR      17
  P103    ENG      42
  P103    RUS      91
  P103    UKR      40

The table below provides a summary of the number of root documents annotated for each topic and language for the Final Phase 1 evaluation.

  Topic   Lang   Docs
  E101    ENG      13
  E101    RUS      31
  E101    UKR      16
  E102    ENG      22
  E102    RUS      33
  E102    UKR      15
  E103    ENG      26
  E103    RUS      38
  E103    UKR      27

3.3 Assessment Data

The table below provides a summary of the number of root documents for each language from which assessment responses were sourced during the Month 9 pilot evaluation. In some cases, the language of the document was determined automatically, and in other cases, the language of the document was set by the language of the annotator who scouted the document.

  Lang   Docs
  ENG      42
  RUS      91
  UKR      40

The table below provides a summary of the number of root documents for each language from which assessment responses were sourced during the Final Phase 1 evaluation.
In some cases, the language of the document was determined automatically, and in other cases, the language of the document matches the language of the annotator who scouted the document.

  Lang   Docs
  ENG     430
  RUS     407
  UKR     636

4.0 Data Processing and Character Normalization

Most of the content was harvested from various web sources. Source documents were collected in two steps. First, a manual scouting process was used to identify specific HTML pages with relevant content for annotation. Then, an automated process was used to harvest additional HTML pages from those same web sources. Some content may have been harvested manually, or by means of ad-hoc scripted methods for sources with unusual attributes.

4.1 Treatment of Original HTML Text Content

All harvested HTML content was initially converted from its original form into a relatively uniform XML format; this stage of conversion eliminated irrelevant content (menus, ads, headers, footers, etc.) and placed the content of interest into a simplified, consistent markup structure. The "homogenized" XML format then served as input for the creation of a reference "raw source data" (rsd) plain text form of the web page content; at this stage, the text was also conditioned to normalize white-space characters and to apply transliteration and/or other character normalization, as appropriate to the given language. This processing created the ltf.xml and psm.xml files for each harvested "root" web page; these file formats are described in more detail in section 5 below.

4.2 Treatment of Non-HTML Data Types: "ldcc" File Format

To the fullest extent possible, all discrete resources referenced by a given "root" HTML page (style sheets, javascript, images, media files, etc.) are stored as separate files of the given data type, and assigned separate 9-character file-IDs (the same form of ID as is used for the "root" HTML page).

In order to present these attached resources in a stable and consistent way, the LDC has developed a "wrapper" or "container" file format, which presents the original data as-is, together with a specialized header block prepended to the data. The header block provides metadata about the file contents, including the MD5 checksum (for self-validation), the data type and byte count, url, and citations of source-ID and parent (HTML) file-ID.

The LDCC header block always begins with a 16-byte ASCII signature, as shown between double-quotes on the following line (where "\n" represents the ASCII "newline" character 0x0A):

  "LDCc \n1024 \n"

Note that the "1024" on the second line of the signature represents the exact byte count of the LDCC header block. (If/when this header design needs to accommodate larger quantities of metadata, the header byte count can be expanded as needed in increments of 1024 bytes. Such expansion does not arise in the present release.)

Immediately after the 16-byte signature, a YAML string presents a data structure comprising the file-specific header content, expressed as a set of "key: value" pairings in UTF-8 encoding. The YAML string is padded at the end with space characters, such that when the following 8-byte string is appended, the full header block size is exactly 1024 bytes (or whatever size is stated in the initial signature):

  "endLDCc\n"

In order to process the content of an LDCC header:

  - read the initial block of 1024 bytes from the *.ldcc data file
  - check that it begins with "LDCc \n1024 \n" and ends with "endLDCc\n"
  - strip off those 16- and 8-byte portions
  - pass the remainder of the block to a YAML parser.

In order to access the original content of the data file, simply skip or remove the initial 1024 bytes.
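As an illustration, these steps can be carried out with a short Python sketch (not part of the corpus tools; it assumes the fixed 1024-byte header described above and requires the third-party PyYAML package):

    import yaml  # PyYAML (third-party)

    HEADER_SIZE = 1024  # exact header byte count, as stated in the signature

    def read_ldcc(path):
        """Return (metadata_dict, payload_bytes) for one *.ldcc file."""
        with open(path, "rb") as f:
            header = f.read(HEADER_SIZE)
            payload = f.read()  # the original content starts at byte 1024
        # sanity checks corresponding to the steps above
        if not header.startswith(b"LDCc") or not header.endswith(b"endLDCc\n"):
            raise ValueError("not an LDCC-wrapped file: %s" % path)
        # drop the 16-byte signature and the trailing 8-byte "endLDCc\n",
        # strip the space padding, and parse the YAML metadata
        yaml_text = header[16:-8].decode("utf-8").rstrip()
        return yaml.safe_load(yaml_text), payload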
5.0 Overview of XML Data Structures

5.1 PSM.xml -- Primary Source Markup Data

The "homogenized" XML format described above preserves the minimum set of tags needed to represent the structure of the relevant text as seen by the human web-page reader. When the text content of the XML file is extracted to create the "rsd" format (which contains no markup at all), the markup structure is preserved in a separate "primary source markup" (psm.xml) file, which enumerates the structural tags in a uniform way, and indicates, by means of character offsets into the rsd.txt file, the spans of text contained within each structural markup element.

For example, in a discussion-forum or web-log page, there would be a division of content into the discrete "posts" that make up the given thread, along with "quote" regions and paragraph breaks within each post. After the HTML has been reduced to uniform XML, and the tags and text of the latter format have been separated, information about each structural tag is kept in a psm.xml file, preserving the type of each relevant structural element, along with its essential attributes ("post_author", "date_time", etc.), and the character offsets of the text span comprising its content in the corresponding rsd.txt file.

5.2 LTF.xml -- Logical Text Format Data

The "ltf.xml" data format is derived from rsd.txt, and contains a fully segmented and tokenized version of the text content for a given web page. Segments (sentences) and the tokens (words) are marked off by XML tags (SEG and TOKEN), with "id" attributes (which are only unique within a given XML file) and character offset attributes relative to the corresponding rsd.txt file; TOKEN tags have additional attributes to describe the nature of the given word token.

The segmentation is intended to partition each text file at sentence boundaries, to the extent that these boundaries are marked explicitly by suitable punctuation in the original source data. To the extent that sentence boundaries cannot be accurately detected (due to variability or ambiguity in the source data), the segmentation process will tend to err more often on the side of missing actual sentence boundaries, and less often on the side of asserting false sentence breaks.

The tokenization is intended to separate punctuation content from word content, and to segregate special categories of "words" that play particular roles in web-based text (e.g. URLs, email addresses and hashtags). To the extent that word boundaries are not explicitly marked in the source text, the LTF tokenization is intended to divide the raw-text character stream into units that correspond to "words" in the linguistic sense (i.e. basic units of lexical meaning).

NB: Due to Twitter's terms of service, no Twitter content is provided in ltf. Users must download the tweets listed in the twitter_info.tab file in the docs/annotation/ directory. The twitter-processing tool provided in the tools/ directory can be used to ensure that the version of the tweet downloaded by users matches the version downloaded by LDC.
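For illustration, the following Python sketch (standard library only) walks the segments and tokens of one ltf.xml file; the offset attribute names start_char and end_char used here are an assumption and should be verified against the actual files and the field documentation in this release:

    import xml.etree.ElementTree as ET

    def read_ltf(ltf_path):
        """Yield (seg_id, [(token_id, token_text, start, end), ...]) for each segment."""
        root = ET.parse(ltf_path).getroot()
        for seg in root.iter("SEG"):
            tokens = [(tok.get("id"),
                       tok.text,
                       int(tok.get("start_char")),  # offsets into the corresponding rsd.txt
                       int(tok.get("end_char")))
                      for tok in seg.iter("TOKEN")]
            yield seg.get("id"), tokens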
6.0 Annotations

6.1 Annotation Overview

AIDA included three primary technology goals: extraction of information elements from multilingual, multimedia documents; aggregation of extracted information elements into a common semantic representation; and generation of multiple hypotheses about that information. Manual annotation of AIDA data supported development and evaluation of each component of AIDA systems. First, within-document annotation labeled scenario-relevant entities, relations, and events in the AIDA corpus. Annotators then conducted coreference annotation across documents, languages, and modalities by linking individual information elements to a shared knowledge base. Finally, annotators indicated the relationship between the set of labeled events/relations and various hypotheses about the scenario, for instance by indicating whether a given event supported the veracity of a particular scenario hypothesis. Each annotation task is described in more detail below.

6.1.1 Within-Document Annotation

Within-document annotation consisted of labeling mentions of entities, relations, and events (including argument structure for events and relations) within individual multimedia documents for each AIDA language.

For each event or relation subject to annotation, AIDA annotators made a number of decisions. First, each event or relation instance and each associated entity argument were anchored in document-level provenance. Annotators provided a brief text description (a word or short phrase) for each event, relation, or argument and assigned it a type from the annotation tagset (a set of labels for different types and subtypes of entities, relations, and events). Arguments were also labeled for the role they play in the event or relation. Annotators then specified any attributes associated with the event, relation, or argument (e.g. two attributes used in both the pilot and phase 1 annotation were "not", indicating negation, and "hedged", indicating uncertainty). Finally, relations and events were labeled for temporal information. Dates are characterized as starting or ending on, before, or after a particular date, and the date is expressed in year-month-day format, with partially populated dates possible.

For the AIDA pilot annotation and Phase 1 annotation contained in this corpus, annotation was limited to events and relations relevant (i.e. salient) to a predetermined set of scenario topics. First, documents were designated as being generally relevant to a particular topic in the scenario. Next, annotators labeled all relations and events within the document associated with the topic, along with the entity mentions acting as arguments for those relations/events (i.e. slots). Events, relations, and entity arguments in the document that were not related to the specified topic were not labeled.

6.1.2 Cross-Document Annotation

Cross-document coreference was necessary to support a whole-corpus understanding of events, relations, and their entities, enabling the generation of corpus-wide hypotheses. Procedurally, coreference was achieved by manually linking individual entity and event instances to a knowledge base (KB), comprising a set of informational entries drawn from GeoNames, the CIA World Leaders List, and the CIA World Factbook, supplemented with manually-created entries developed specifically for AIDA data.
For the pilot annotation effort, we seeded KBs for each topic with events, relations, and entities known to be relevant to the topic and potentially present in the data. Annotators then manually linked individual event, relation, and entity instances from the documents to the KB and flagged any instances that could not be linked, in which case new KB entries were created.

The AIDA Phase 1 evaluation design required a program-wide reference entity knowledge base, so we constructed a new reference KB consisting of entities known to be relevant to scenario topics along with a large number of other entities, drawn from existing KBs, whose relevance to specific AIDA topics was unknown. (The AIDA Scenario 1 and 2 Reference Knowledge Base is available as LDC2023T10.) Phase 1 coreference annotation for entities then consisted of manually linking entity instances to the reference KB. When no match was present in the KB, the entity was marked as NIL; once all KB linking was complete, all NILs were reviewed and clustered, such that multiple mentions of the same NIL entity were assigned the same unique NIL ID. Events were also manually clustered and assigned unique NIL IDs. Finally, relations were automatically clustered and assigned unique NIL IDs based on the results of manual entity clustering: relations with the same type, and whose arguments have the same argument role and contain the same entity (KB or NIL) ID, are considered coreferential.

Note that in the Month 9 Pilot Evaluation annotation, events and relations could be linked to the reference KB, and thus some event and relation mentions from this phase of annotation have non-NIL IDs. However, in the Scenario 1 Evaluation annotation, events and relations were not linked to the reference KB, and thus event and relation mentions in this phase of annotation have NIL IDs only.
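The automatic relation clustering rule described above can be sketched as follows (illustrative Python only; the record layout shown in the comment is a hypothetical stand-in, not the actual column structure of the annotation tables):

    from collections import defaultdict

    def cluster_relations(relations):
        """Group relation mentions sharing a type and identical (role, entity ID) arguments.

        Each input record is assumed to look like:
          {"mention_id": ..., "type": <relation type>,
           "args": [(<argument role>, <KB or NIL ID>), ...]}
        """
        clusters = defaultdict(list)
        for rel in relations:
            key = (rel["type"], frozenset(rel["args"]))
            clusters[key].append(rel["mention_id"])
        # each value is one coreferential cluster, to be assigned a shared NIL ID
        return list(clusters.values())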
6.1.3 Hypotheses

Hypothesis annotation in the AIDA pilot and in Phase 1 involved labeling answers to evaluation queries, where the corpus was expected to contain multiple, sometimes contradictory answers (i.e., hypotheses) to each query, with answers appearing in different documents, modalities, and languages. The AIDA pilot focused on facet-level queries and hypotheses, which were limited to understanding single information elements. For instance, a facet-level query might ask which entity perpetrated a specific attack, with facet-level hypotheses providing all possible answers to that question present in the corpus.

To create the gold standard answer key for hypothesis evaluation, annotators worked with reference hypotheses developed jointly with the program evaluation team, and then labeled the relationship between each labeled information element (i.e. the labeled relations and events) and each hypothesis. Each relation/event was judged as fully supporting, partially supporting, or contradicting the hypothesis, where fully supports means that the information in the hypothesis is fully captured by the labeled event/relation. For example, given a query "Who fired on protesters and/or police at the Maidan protests?" and a hypothesis "Members of the Berkut, a police force loyal to the Yanukovych government, fired on protesters", a labeled event that describes an attack at Maidan with Berkut in the attacker role and protesters in the victim role would be marked as fully supporting the hypothesis. An event that describes an attack at Maidan with Berkut in the attacker role but with no victim role specified would be marked as partially supporting, while an event that describes an attack at Maidan with protesters in both the attacker and victim roles would be marked as contradicting the hypothesis.

For AIDA Phase 1 the evaluation shifted to focus on broader topic-level hypotheses rather than facet-level hypotheses, so annotation of facet-level hypotheses was not required. Instead, we worked with the evaluation team to develop "prevailing theories" for each topic, which describe subject matter expert expectations about the differing accounts or perspectives on the topic that are likely to occur in the corpus based on prior knowledge of the scenario. While facet-level hypotheses address a single information element, prevailing theories are often responsive to multiple queries and represent a coherent perspective within the larger topic narrative. For instance, one of the prevailing theories from the same topic as the above facet-level hypothesis example was, "Snipers affiliated with Ukraine's Berkut riot police, under the direction of Aleksandr Yakimenko (who was at the time head of Ukraine's SSU intelligence service) and in collaboration with Russia's intelligence service the FSB, killed at least 53 antigovernment activists protesting in Kiev's Independence Square (aka Maidan) on February 20, 2014, using AK-47s and sniper rifles."

For each prevailing theory, we created a natural language description characterizing that account of the topic, plus a list of the events and relations, along with their arguments (entities), that would be required as part of a knowledge graph that adequately reflects that theory; this collection of information then constituted the gold standard for system evaluation.

6.2 Annotation Formats and Details

6.2.1 Month 9 Pilot Evaluation Annotations

The formats of Month 9 annotations are described in the eval_table_field_descriptions.tab file in the docs/annotation/month_9_pilot_evaluation/ directory; the sections below provide descriptions of the content of each type of Month 9 Pilot Evaluation annotation file.

6.2.1.1 Mentions

There are three mentions tables for each topic: one for entities and fillers, one for relations, and one for events. These tables are located in the data/annotation/month_9_pilot_eval/{P101,P102,P103} directories and are named as follows:

  Entities and fillers: {P101,P102,P103}_ent_mentions.tab
  Relations:            {P101,P102,P103}_rel_mentions.tab
  Events:               {P101,P102,P103}_evt_mentions.tab

These tables contain information about each annotated mention, including a KB-id linking it to the topic-based mini-KB or a NIL-id for mentions which were not present in the mini-KB.

Mentions are annotated only when they are deemed by the annotator to be salient to the topic. Salience is defined as relevant to one or more of the queries for the topic. Note that the queries referred to here are natural language questions about the topic designed to focus the annotators' attention on areas of the topic with expected informational conflict. These queries are meant to ensure that the annotations result in multiple hypotheses containing different knowledge elements (see eval_hypothesis_info.tab in the docs/annotation/month_9_pilot_evaluation/ directory); they are not meant for machine consumption.

Only one mention per document element is annotated.
So if a root document (the original page seen on the internet) has 1 text, 2 image, and 1 video document elements, an entity that was "mentioned" in all of the document elements would have 4 mentions coming from the annotation of this root document, one in each of the document elements.

The exception to the one mention per document element rule is for relation or event mentions when one or more of the arguments, attributes, or types/subtypes differ between two mentions of the same relation/event. In such cases, one mention is created for each occurrence of the relation/event that differs from the mentions already annotated. For example, if a document element contains both an assertion that MH17 was shot down by a missile and an assertion that it was shot down by a fighter jet, two separate mentions would be created.

Once an entity, filler, relation, or event mention that is salient to the topic has been identified, additional information about the mention is captured. The information captured includes provenance (which document element contains the mention), text extent, character offsets, and NAM/NOM/PRO distinction for text mentions only, type (and subtype for relations and events), a text description (called "justification" in Month 9 Pilot Evaluation annotation) for non-text mentions (optionally present for text mentions), and a KB link. The KB link consists of a node ID from the topic-specific mini-KB, or in the case of a mention that is not present in the mini-KB, a NIL-id. NIL-ids are not clustered across documents in this annotation.

In addition, relation and event mentions can have attributes associated with them. Relations and events can have the belief-type attributes "hedged" and/or "not" associated with them. "Hedged" is used to indicate uncertainty (as reported by the source, not the annotator's certainty), and "not" is used to indicate that the source asserts that the event or relation did not happen. A mention can have both "hedged" and "not", which would indicate that the source asserted that the relation or event possibly/likely did not happen.

Events can have an additional attribute of "deliberate" or "accidental". These are used to capture assertions by the source about whether the event was intentional or not. Annotators use one of these attributes only when the source explicitly conveys an assertion about intentionality, especially where such assertions are crucial to understanding informational conflict.

Event mentions can also have a political_status attribute of "legitpolitstatus" or "illegitpolitstatus". This attribute type is used to capture political legitimacy of elections, and is only provided if the legitimacy of the event (vote, election, ballot, etc.) is salient to the topic.

Finally, relations and events have temporal information associated with them in the form of start and end dates, when that information is present in the document. Annotators supply as much information as possible (minimally year, with month and/or day if available). Start date types can be "Started On", "Started Before", or "Started After", and end date types can be "Ended On", "Ended Before", or "Ended After". If no date information is available annotators can choose "Unknown" for either start or end (or both). For events and relations, information about the arguments is found in the slots table.

6.2.1.2 Slots

There are two slots tables per topic, one for relations and one for events.
Relation and event mentions in the mentions tables must be looked up in the slots tables to find the arguments and fillers involved in the relation/event. These tables are located in the data/annotation/month_9_pilot_eval/{P101,P102,P103} directories and are named as follows:

  Relation slots: {P101,P102,P103}_rel_slots.tab
  Event slots:    {P101,P102,P103}_evt_slots.tab

For each relation or event mention, annotators record which entities or fillers participate in the relation/event. Relation/event arguments/fillers must be present in the same document element as the relation/event mention in order to be annotated for that mention. For example, if the text says that the Russians shot down MH17 with a BUK missile, and a video that is part of the same root document indicates MH17 was shot down by a BUK missile but does not mention "Russians", then the text mention would include the "Russians" argument, while the video mention would not.

In addition to the entity/filler id, annotators choose a "slot type" which corresponds to something like a role in the relation or event (e.g. a Conflict.Demonstrate event has possible slot types of Person or Organization, Place, and Date).

For event slots only, each argument can have an attribute of "hedged" and/or "not". The meaning of the attributes is the same as described above in the Mentions section, but in this case its scope is at the slot level. So if a document asserts that "it wasn't a BUK missile that shot down MH17", the Conflict.Attack mention itself would not have "hedged"/"not" attributes assigned to it, but the BUK missile filler would have a "not" attribute shown in the slots table.
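As an illustration of that lookup, the Python sketch below joins a relation or event mentions table to its slots table on the mention ID. The column indices are left as parameters (and any header line would need to be skipped), because the authoritative field layout is given in eval_table_field_descriptions.tab, not here:

    import csv
    from collections import defaultdict

    def join_mentions_to_slots(mentions_path, slots_path, mention_id_col, slot_mention_col):
        """Yield (mention_row, [slot_row, ...]) pairs for one topic's tables."""
        slots_by_mention = defaultdict(list)
        with open(slots_path, encoding="utf-8", newline="") as f:
            for row in csv.reader(f, delimiter="\t"):
                slots_by_mention[row[slot_mention_col]].append(row)
        with open(mentions_path, encoding="utf-8", newline="") as f:
            for row in csv.reader(f, delimiter="\t"):
                yield row, slots_by_mention.get(row[mention_id_col], [])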
6.2.1.3 Hypotheses

There is a single hypothesis table for each topic, located here:

  data/annotation/month_9_pilot_eval/{P101,P102,P103}/{P101,P102,P103}_hypotheses.tab

For each hypothesis, the table provides a judgement for each relation or event KE (knowledge element) in the mentions tables as to whether it supports the hypothesis. The mention IDs for each event or relation shown in the hypothesis table must be looked up in the mentions table and the slots table to find the details for the event or relation. Thus, the full set of KEs that support a hypothesis will include the event and relation mentions that were judged as fully or partially relevant, plus the entities and fillers included in the slots table for those events/relations.

During annotation, the annotator views all relations and events associated with each entity in the current document (root document). For each relation and event mention, the annotator indicates whether the hypothesis is fully supported, partially supported, or contradicted by the relation/event mention. If the relation/event mention is irrelevant to the hypothesis, they choose "not relevant". Annotators are instructed to use the following criteria to choose a relevance value for each relation/event-hypothesis pair:

  Fully supported:     Given this relation/event mention, this hypothesis must be true.
  Partially supported: Given this relation/event mention, this hypothesis could be true.
  Contradicted:        Given this relation/event mention, this hypothesis cannot be true.
  Not relevant:        This relation/event mention neither supports nor contradicts this hypothesis.

NB: The hypotheses judgments above have the following values in field 4 of the *_hypotheses.tab files under the data/annotation/month_9_pilot_evaluation directory of this release:

  Fully supported     = "fully-relevant"
  Partially supported = "partially-relevant"
  Contradicted        = "partially-relevant"
  Not relevant        = "n/a"

6.2.1.4 Mini-KBs

Each topic has a "mini-KB" which includes KEs that were expected to be salient (based on information discovered during topic development and data scouting). Mini-KBs are located here:

  data/annotation/month_9_pilot_eval/{P101,P102,P103}/{P101,P102,P103}_mini-KB.tab

The KBs for the topics may have overlapping content; no attempt was made to resolve "coreference" across the KBs. The KBs have the following format:

  Col.#  Content
  1.     node_id -- unique identifier for each entry in the KB
  2.     topic_id
  3.     category -- base category of Entity, Relation, Event, or Filler
  4.     handle -- name or brief phrase to identify the entry
  5.     description -- additional information describing the entry

The node_id is the value used in linking annotated mentions in the mentions tables to the KB.
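For example, a mini-KB table with the five columns above can be loaded into a node_id lookup with a few lines of Python (a sketch only; it assumes a plain tab-delimited file and leaves any header line for the caller to skip):

    import csv

    def load_mini_kb(path):
        """Map node_id -> (topic_id, category, handle, description)."""
        kb = {}
        with open(path, encoding="utf-8", newline="") as f:
            for row in csv.reader(f, delimiter="\t"):
                node_id, topic_id, category, handle, description = row[:5]
                kb[node_id] = (topic_id, category, handle, description)
        return kb

    # kb = load_mini_kb("data/annotation/month_9_pilot_eval/P101/P101_mini-KB.tab")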
6.2.1.5 Canonical Mentions

Two files have been provided in the canonical_mentions/ subdirectory of this release:

  data/annotation/month_9_pilot_evaluation/canonical_mentions/P101_P102_P103_canonical_mentions.tsv

This file contains a list of all mentions of type PER (person), ORG (organization), GPE (geo-political entity), LOC (location), FAC (facility), WEA (weapon), or VEH (vehicle) that annotators judged were canonical. Canonical mentions in text are full, complete references, usually named, including alternate names or transliterations. Canonical image mentions are images that contain the entity/filler and no other entities/fillers of the same type. Canonical shot-level video mentions are keyframes that contain the entity/filler and no other entities/fillers of the same type. This file contains four tab-delimited fields: KB ID, mention ID, keyframe or image filename (or n/a for text), and topic ID.

  data/annotation/month_9_pilot_evaluation/canonical_mentions/P101_P102_P103_named_WEA_VEH_mentions.lst

This file contains a list of all mentions (e.g. AK-100) of weapons or vehicles that were linked to a node in a topic's mini-KB and that annotators judged were named mentions. Note that, although all weapon and vehicle mentions were reviewed, no vehicle mentions were judged to be names.

6.2.2 Scenario 1 Evaluation Annotations

The formats of Scenario 1 evaluation annotations are described in the AIDA_phase_1_table_field_descriptions_v4.tab file in the docs/annotation/phase_1_evaluation/ directory; the sections below provide descriptions of the content of each type of Scenario 1 annotation file, with some notes about differences from the Month 9 annotations.

6.2.2.1 Mentions

There are three mentions tables for each topic: one for entities and fillers, one for relations, and one for events. These tables are located in the data/annotation/phase_1_evaluation/{E101,E102,E103} directories and are named as follows:

  Entities and fillers: {E101,E102,E103}_arg_mentions.tab
  Relations:            {E101,E102,E103}_rel_mentions.tab
  Events:               {E101,E102,E103}_evt_mentions.tab

These tables contain information about each annotated mention. Note that a KB-id is no longer included in the mentions.tab files, as the KB linking information is now contained in a separate linking tab file (see below).

Differences between the mentions.tab files in the Scenario 1 format and the Month 9 format include:

  - Entity and filler mentions are now in a file called TOPICID_arg_mentions.tab (rather than the ent_mentions.tab files found in the seedling).
  - All mentions.tab files now include subtype and subsubtype fields.
  - Video mentions now specify the signal type (picture or sound), and video and audio mentions include start and end time stamps for the mentions.
  - Video "picture" mentions now include keyframe id; images and video "picture" mentions now include bounding box coordinates. NB: some keyframe id and bounding box coordinates have the value "EMPTY_TBD", as keyframe and bounding box information was planned to be added at a later stage of annotation.
  - Arg mentions include an arg_status field with "base" or "informative" indicating whether the entity/filler mention is the local mention that occupies an arg slot in a relation or event mention ("base") or whether it is an additional mention of an entity that is not local to the event/relation mention ("informative").
  - Relation and event mentions can have the attributes "hedged" and/or "not". Other attribute types have been eliminated as they are now covered by relation types.

6.2.2.2 Slots

There are two slots tables per topic, one for relations and one for events. Relation and event mentions in the mentions tables must be looked up in the slots tables to find the arguments and fillers involved in the relation/event. These tables are located in the data/annotation/phase_1_evaluation/{E101,E102,E103} directories and are named as follows:

  Relation slots: {E101,E102,E103}_rel_slots.tab
  Event slots:    {E101,E102,E103}_evt_slots.tab

Differences between the slots.tab files in the Scenario 1 format and the Month 9 format include:

  - Slot type labels use the new role labels from the AIDA annotation ontology, prefaced by indicators of the relation/event type and arg number. For example the slot type "rel022arg02sponsor" refers to the arg 2 sponsor role in the relation that has index number ldc_rel_022 in the annotation ontology. To strip the slot_type to the bare role label, the first 11 characters can be removed, as this is a fixed-width preface (see the short example after this list).
  - Argument mention ids have replaced the entity-level argument ids from the seedling annotation. The argmention_ids in the slots table correspond to "base" mentions in the arg_mentions table. Note that events which serve as arguments of sponsorship relations appear in the event mentions table, not the arg mentions table.
  - There are also two argmention_ids in the Scenario 1 slots tables whose value is the string "author". These are references to the author of the current source document, who was not annotated as an entity mention.
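The slot_type stripping referred to in the list above, shown in Python (illustrative only):

    def bare_role(slot_type):
        """Strip the fixed 11-character relation/event type and arg-number preface."""
        return slot_type[11:]

    # bare_role("rel022arg02sponsor")  ->  "sponsor"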
6.2.2.3 KB Linking

The KB linking tables provide a KB ID or NIL ID for each entity, relation, and event mention. The KB IDs refer to the AIDA Scenario 1 and 2 Reference Knowledge Base (LDC2023T10). The KB linking tables are located here:

  data/annotation/phase_1_evaluation/{E101,E102,E103}/{E101,E102,E103}_kb_linking.tab

Note that this separate linking table means that KB IDs are not present in the mentions.tab files. Also note that in the case where annotators cannot disambiguate between two or more possible KB links, multiple IDs are presented, separated by a pipe ("|") symbol.

6.2.2.4 Prevailing Theories

The prevailing theories files provide a handful of natural language prevailing theories about "what happened" for each topic and indicate which KEs are required for each theory. Note that prevailing theories are *NOT* intended to exhaustively cover the possible topic-level hypotheses that might emerge from the data. Prevailing theories files are located here:

  docs/annotation/phase_1_evaluation/{E101,E102,E103}_prevailing_theories_final.xlsx

Prevailing theories are in excel files, one file per topic, with one prevailing theory per tab. Each KE within a prevailing theory has either a KB ID or a PT clustering ID. Each tab contains information at the top with the topic and natural language version of the theory. Below the natural language version is a matrix of KEs that are required to fully support the theory, where a KE is an event or relation with all its arguments. The first column assigns an ID number to each of the KEs, the purpose of which is to make it easy to sort and tell which arguments go together under a particular relation or event. For each of the KEs, one line represents the event or relation itself, and each argument is listed on a separate line under the event/relation.

There are two columns containing KB IDs:

  - Column C (Event/Relation KB ID) contains the KB ID or clustering ID for the event or relation
  - Column I (Item KE) contains the KB ID or clustering ID for the argument populating the given event or relation slot.

Entity and relation KEs that do not appear in the AIDA eval topic KB (LDC2019E43) have PT clustering IDs formatted like PTE_E10#_### (for prevailing theory entities) or PTR_E10#_### (for prevailing theory relations). These IDs provide clustering information for the prevailing theories of the given topic. These are not NIL IDs, in that they do not correspond to any annotations in ./data, and only indicate which KEs within a topic's prevailing theories are coreferent. Event KEs within the prevailing theories all have NIL IDs. These IDs may also be present in the kb_linking.tab files in ./data, meaning they may have corresponding mention-level annotations.

In addition to the KB IDs, each line has information about the type, subtype, and sub-subtype of each event/relation/argument as well as expected date, start date range, end date range, and attribute information where known.

6.2.2.5 Eval Tracer Docs

A subset of documents underwent exhaustive annotation of salient entities. The following table lists, by root uid, all documents that received this treatment, along with the language of the annotator who performed annotation on the document (some elements of the document may not match this designation) and the topic for which the document was annotated.

  root_uid    language   topic_id
  IC0015YD8   ENG        E103
  IC0015PZ4   RUS        E103
  IC0015OEQ   RUS        E103
  IC00169X6   UKR        E103
  IC0016AE1   UKR        E103
  IC001657N   ENG        E102
  IC0015LZK   ENG        E102
  IC0015PV3   RUS        E102
  IC0015YEU   RUS        E102
  IC00160V0   RUS        E102
  IC001L4JT   UKR        E102
  IC001L4L5   UKR        E102
  IC0015Y8W   RUS        E101
  IC001L32V   UKR        E101
  IC001L2BF   UKR        E101
  IC001L3MS   UKR        E101

7.0 Assessment

7.1 Assessment Overview

The system response files contained in this package were pooled by NIST, then reviewed and judged by LDC annotators for the purpose of providing NIST with a means to score submissions to the AIDA Month 9 and Phase 1 evaluations. LDC annotators performed 2 assessment tasks in support of the AIDA Month 9 evaluation and 4 tasks in support of the AIDA Phase 1 evaluation. Annotators performed class-based and zero-hop assessment in both Month 9 and Phase 1, and additionally performed graph and hypothesis assessment during Phase 1. Each of these tasks is described in the following sections.
7.1.1 Zero-Hop Assessment

In this assessment task, annotators reviewed text mentions, images, and videos containing entities, and decided whether or not those responses were coreferent with a particular entity in one of the AIDA mini-KBs. For text responses marked correct, annotators also decided if the entity mention was a name, nominal phrase, or pronoun.

For each response they judged, annotators first answered the question, "Does this contain a mention of the reference entity?" For instance, if a response was linked to the KB entry for Vladimir Putin, the first question could be thought of as "Does this response contain a mention of Vladimir Putin?" Annotators reviewed the mention in context, and decided if the answer to this question was yes or no. A response was assessed as correct if an entity was identifiable within the text span, image, or video keyframe as a positive instance of the indicated KB entity. A response was assessed as wrong if it did not contain any part of a mention/instance of the indicated entity.

Assessors were instructed to be lenient during zero-hop assessment. Entity mentions were not required to be exact or complete in order to be considered correct. For instance, if a text response contained an excessive amount of extraneous text, it was still marked correct as long as a mention of the correct entity occurred somewhere within the span of text. Similarly, if an image or keyframe showed only a small part of an entity (e.g., tank treads, or the side of a person's face), it was marked correct as long as the annotator was able to reasonably identify that part of the image as a positive instance of the indicated KB entity.

7.1.2 Class-Based Assessment

In this assessment task, annotators reviewed text mentions, images, and videos containing entities, and decided whether or not those responses contained references to a particular entity type. For text responses marked correct, annotators also decided if the entity mention was a name, nominal phrase, or pronoun.

For each response they judged, annotators first answered the question, "Does this contain a mention of the specified entity type?" For instance, if a response was marked as containing entity type PER, the first question could be thought of as "Does this response contain a mention of a person?" Annotators reviewed the mention in context, and decided if the answer to this question was yes or no. A response was assessed as correct if an entity of the specified type was identifiable within the text span, image, or video keyframe. A response was assessed as wrong if it did not contain any part of a mention/instance of the indicated entity type.

As in zero-hop assessment, assessors were instructed to be lenient during class-based assessment (see 7.1.1 above for more details).

7.1.3 Graph Assessment

In this assessment task, annotators reviewed text mentions, images, and videos containing events and relations, and decided whether or not those responses contained references to particular entities participating in particular events or relations in particular roles. For events and entities marked correct, annotators also linked those mentions to corresponding entries in a knowledge base.

During assessment, annotators were shown a snippet of a document element (text, image, or video) that contained a mention of an event or relation. That snippet was called a justification.
For each response they judged, annotators first answered the question, "Does this justification contain an entity whose role is [role] in a [relation/event type] relation or event?" For instance, if the event type and role being assessed were Movement.TransportArtifact.Hide and Transporter, the annotator would answer the question, "Does this justification contain an entity whose role is the Transporter in a Movement.TransportArtifact.Hide event?" Annotators reviewed the mention in context, and decided if the answer to this question was yes or no.

The justification was assessed as correct if the event or relation, as well as the argument that fills the given role, were clearly identifiable in the text, image, or video. Further, to be assessed as correct, the justification must have contained a mention of the entity participating in the given event/relation as the given argument. The justification was assessed as wrong if the event or relation and/or the argument role were not clearly identifiable in the justification.

For correct justifications, annotators were then provided with an entity or filler mention that may or may not have been the same mention that appeared in the justification. Annotators then answered the question "Is this the same entity/filler as the event/relation argument in the justification?" If the provided entity/filler mention referred to the same entity/filler as the event/relation argument in the justification, then the mention was assessed as correct. If the provided entity/filler mention did not refer to the same entity/filler as the event/relation argument in the justification, then the mention was assessed as wrong.

Finally, annotators linked entity/filler and event mentions assessed as correct to corresponding entries in a knowledge base, or indicated that those entities/fillers or events did not have entries in the knowledge base.

Annotators were instructed to be lenient when assessing whether a correct event/relation type or a correct argument occurs in the justification. If some, but not all, of the information needed to justify the response was contained in the justification, annotators could check the immediate context of the justification (e.g., a few sentences around a text mention, the caption of an image, parts of a video immediately before or after a video justification) to confirm whether the event/relation type or the argument was correct. Even if none of the information needed to justify the response was contained in the justification, annotators could still assess the response as correct if a correct response was present in the immediate context of the justification.

7.1.4 Hypothesis Assessment

In this assessment task, annotators reviewed hypotheses, which were system-produced groupings of events and relations and their respective arguments, that were intended to tell a consistent story about some aspect of one of the scenario's topics. Annotators decided whether or not the hypotheses were relevant to a given topic, whether or not the hypotheses were coherent, and whether or not the hypotheses were a good representation of specific predominant theories about a given topic. There was no filtering of hypotheses through the tasks based on their assessment in previous tasks. All hypotheses underwent all three kinds of assessment.
In the first hypothesis assessment task, Relevance Assessment, annotators reviewed the events and relations that made up a hypothesis and decided whether each event and relation was fully relevant, partially relevant, or not relevant to a given topic. An event or relation was assessed as fully relevant if all of the arguments of the event or relation were relevant to the topic, i.e., all of the arguments provided information about one of the topic's queries. An event or relation was assessed as partially relevant if some but not all of the event or relation's arguments pertained in some way to the topic. An event or relation was assessed as not relevant if the event or relation had nothing to do with the topic at all.

In the second hypothesis assessment task, Semantic Coherence Assessment, annotators reviewed the events and relations that made up a hypothesis as well as the arguments of those events and relations, and judged whether they made a coherent hypothesis. There were three steps in Semantic Coherence Assessment. In the first step, annotators reviewed each argument in a single event or relation within the hypothesis and answered the question, "Are the arguments of this event or relation coherent with all the other arguments of this event or relation?" That is, do the arguments form a logical event or relation that doesn't contradict itself? In the next step, annotators reviewed each argument in each event or relation within the hypothesis and answered the question, "Are the arguments of each event or relation coherent with the arguments of every other event or relation that make up the hypothesis?" That is, can these arguments logically exist at the same time as the arguments of the other events and relations in the hypothesis? In the final step, annotators reviewed each event or relation within the hypothesis and answered the question, "Are the events and relations coherent as a single hypothesis?" That is, can these events and relations logically exist at the same time as each other? The event or relation was assessed as True if it was coherent in all three steps. The event or relation was assessed as False if it was not coherent in any of the three steps.

In the third and final hypothesis assessment task, Coverage Assessment, annotators reviewed the events and relations that made up a hypothesis as well as the arguments of those events and relations, and judged how well the hypothesis matched a topic's prevailing theories. A prevailing theory was a collection of events, relations, and their arguments produced by LDC that together represented a particular aspect of a topic in the scenario based on source data about the topic. For example, a prevailing theory would be all the events, relations, and arguments needed to represent a natural-language description of a theory like, "Riot police shot and killed protesters in Maidan Square in Kiev on February 20, 2014."

Coverage comprised two types of matching between a hypothesis and a theory, as well as an assessment of the extent of the hypothesis's coverage of a theory. In the first type of matching, annotators compared the arguments of a hypothesis to the arguments of a theory, and matched the hypothesis arguments to theory arguments. Two arguments matched if the identity and role of the hypothesis argument matched the identity and role of the theory argument. In the second type of matching, annotators matched the hypothesis to the theory that presented the same basic narrative as the hypothesis.
After matching, annotators decided if the hypothesis fully or partially covered the theory, or did not cover it. A hypothesis was assessed as Fully Covered if most or all of the hypothesis's parts were represented in the theory. A hypothesis was assessed as Partially Covered if it presented nearly the same information without containing conflicting information, though partial coverage necessarily entailed that the hypothesis did not represent most or all of the theory. If the hypothesis did not make sense, was unrelated to the topic, or presented a new theory not listed among the prevailing theories, it was assessed as No Coverage.

7.2 Assessment Formats and Details

7.2.1 Month 9 Pilot Evaluation Assessments

The sections below provide descriptions of the content of each type of Month 9 Pilot Evaluation assessment file.

7.2.1.1 Class

The Month 9 Pilot Evaluation Class Assessment file is located here:

  data/assessment/month_9_pilot_evaluation/class/AIDA_2018_CL_KIT.tab

This file is a consolidation of 50 class-based response files, released as part of LDC2019R05 AIDA Month 9 Pilot Eval Assessment Results, which comprise the complete set of class-based assessments produced by LDC annotators for the AIDA M9 evaluation. In total, this file contains 7,707 assessed responses.

The class-based assessment results file contains 8 tab-delimited fields. The field definitions are as follows:

  Col.#  Content
  1.     query_id -- Class-based query ID
  2.     type -- entity type
  3.     mention_id -- integer
         • NB: unique within query_id
  4.     source -- mention source (TEXT, VIDEO, or IMAGE)
  5.     root_uid
  6.     mention_span -- mention span in the format:
         • [Text]  DocElementID:(start,0)-(end,0)
         • [Image] DocElementID:(topleftx,toplefty)-(bottomrightx,bottomrighty)
         • [Video] keyframeID:(topleftx,toplefty)-(bottomrightx,bottomrighty)
  7.     assessment -- assessment of link between columns 6 & 1 (correct or wrong)
  8.     level -- mention type (nam, nom, or pro) for text mentions

The following is a summary of Month 9 Pilot Evaluation class-based assessment results:

  COUNT | SOURCE | JUDGMENT | MENTION TYPE
   1563 | IMAGE  | correct  | -
    619 | IMAGE  | wrong    | -
   1380 | TEXT   | correct  | nam
   1737 | TEXT   | correct  | nom
     78 | TEXT   | correct  | pro
   1783 | TEXT   | wrong    | -
    304 | VIDEO  | correct  | -
    243 | VIDEO  | wrong    | -
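A tally like the summary above can be reproduced from the consolidated file with a short Python sketch (illustrative; it assumes the file has no header line and uses the 1-indexed column numbers listed above):

    import csv
    from collections import Counter

    def summarize(path):
        """Tally (source, judgment, level) triples from a consolidated assessment file."""
        counts = Counter()
        with open(path, encoding="utf-8", newline="") as f:
            for row in csv.reader(f, delimiter="\t"):
                source = row[3]                           # column 4: TEXT, VIDEO, or IMAGE
                judgment = row[6]                         # column 7: correct or wrong
                level = row[7] if len(row) > 7 else ""    # column 8: nam/nom/pro for text mentions
                counts[(source, judgment, level)] += 1
        return counts

    # summarize("data/assessment/month_9_pilot_evaluation/class/AIDA_2018_CL_KIT.tab")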
7.2.1.2 Zero-Hop

The Month 9 Pilot Evaluation Zero-Hop Assessment response file is located here:

  data/assessment/month_9_pilot_evaluation/zero-hop/AIDA_2018_ZH_KIT.tab

This file is a consolidation of 197 zero-hop response files, released as part of LDC2019R05 AIDA Month 9 Pilot Eval Assessment Results, which comprise the complete set of zero-hop assessments produced by LDC annotators for the AIDA M9 evaluation. In total, this file contains 34,488 assessed responses.

The zero-hop assessment results file contains 8 tab-delimited fields. The field definitions are as follows:

  Col.#  Content
  1.     kb_id -- KB node ID
  2.     type -- entity type [always NIL for zero-hop files]
  3.     mention_id -- integer
         • NB: unique within kb_id
  4.     source -- mention source (TEXT, VIDEO, or IMAGE)
  5.     root_uid
  6.     mention_span -- mention span in the format:
         • [Text]  DocElementID:(start,0)-(end,0)
         • [Image] DocElementID:(topleftx,toplefty)-(bottomrightx,bottomrighty)
         • [Video] keyframeID:(topleftx,toplefty)-(bottomrightx,bottomrighty)
  7.     assessment -- assessment of link between columns 6 & 1 (correct or wrong)
  8.     level -- mention type (nam, nom, or pro) for text mentions

The following is a summary of Month 9 Pilot Evaluation zero-hop assessment results:

  COUNT | SOURCE | JUDGMENT | MENTION TYPE
    299 | IMAGE  | correct  | -
    843 | IMAGE  | wrong    | -
   7536 | TEXT   | correct  | nam
   1034 | TEXT   | correct  | nom
     57 | TEXT   | correct  | pro
  23732 | TEXT   | wrong    | -
    132 | VIDEO  | correct  | -
    855 | VIDEO  | wrong    | -

7.2.2 Scenario 1 Evaluation Assessments

The sections below provide descriptions of the content of each type of Scenario 1 Evaluation assessment file.

7.2.2.1 Class

The Scenario 1 Evaluation Class Assessment file is located here:

  data/assessment/phase_1_evaluation/class/AIDA_TA1_CL_2019.txt

This file is a consolidation of 115 class-based response files, released as part of LDC2019R30 AIDA Phase 1 Assessment Results, which comprise the complete set of class-based assessments produced by LDC annotators for the AIDA Phase 1 evaluation. In total, this file contains 5,884 assessed responses.

The class-based assessment results file contains 9 tab-delimited fields. The field definitions are as follows:

  Col.#  Content
  1.     query_id -- Class-based query ID
  2.     type -- entity type/sub-type/sub-subtype
  3.     response_id -- integer
         • NB: unique within query_id
  4.     source -- mention source (TEXT, VIDEO, or IMAGE)
  5.     root_uid
  6.     mention_span -- mention span in the format:
         • [Text]  DocElementID:(start,0)-(end,0)
         • [Image] DocElementID:(topleftx,toplefty)-(bottomrightx,bottomrighty)
         • [Video] keyframeID:(topleftx,toplefty)-(bottomrightx,bottomrighty)
  7.     assessment -- assessment of link between columns 6 & 2 (correct or wrong)
  8.     level -- mention type (nam, nom, or pro) for TEXT responses
  9.     kb_id -- KB ID or NIL ID of correct responses; for correct NIL singletons, this is just "NIL"

The following is a summary of Scenario 1 Evaluation class-based assessment results:

  COUNT | SOURCE | JUDGMENT | MENTION TYPE
    164 | IMAGE  | correct  | -
     75 | IMAGE  | wrong    | -
    873 | TEXT   | correct  | nam
    860 | TEXT   | correct  | nom
    108 | TEXT   | correct  | pro
   2622 | TEXT   | wrong    | -
   1003 | VIDEO  | correct  | -
    179 | VIDEO  | wrong    | -
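All of the mention_span variants listed in these field definitions share the shape ID:(x1,y1)-(x2,y2); for text spans the character offsets occupy the first position of each pair and the second position is 0. A single parser therefore covers all three formats (an illustrative Python sketch):

    import re

    SPAN_RE = re.compile(r"(.+):\((\d+),(\d+)\)-\((\d+),(\d+)\)$")

    def parse_span(mention_span):
        """Split a mention_span into its document-element/keyframe ID and two coordinate pairs."""
        m = SPAN_RE.match(mention_span)
        if m is None:
            raise ValueError("unexpected span format: %r" % mention_span)
        doc_id = m.group(1)
        x1, y1, x2, y2 = map(int, m.groups()[1:])
        return doc_id, (x1, y1), (x2, y2)

    # parse_span("<DocElementID>:(10,0)-(42,0)")  ->  ("<DocElementID>", (10, 0), (42, 0))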
7.2.2.2 Zero-Hop

The Scenario 1 Evaluation Zero-Hop Assessment response file is located here:

data/assessment/phase_1_evaluation/zero-hop/AIDA_TA1_ZH_2019.tab

This file is a consolidation of 102 zero-hop response files, released as part of LDC2019R30 AIDA Phase 1 Assessment Results, which comprise the complete set of zero-hop assessments produced by LDC annotators for the AIDA Phase 1 evaluation. In total, this file contains 5,759 assessed responses.

The zero-hop assessment results file contains 8 tab-delimited fields. The field definitions are as follows:

Col.# Content
1. kb_id
2. type -- entity type [always NIL for zero-hop files]
3. response_id -- integer
   • NB: unique within kb_id
4. source -- mention source (TEXT, VIDEO, or IMAGE)
5. root_uid
6. mention_span -- mention span in the format:
   • [Text] DocElementID:(start,0)-(end,0)
   • [Image] DocElementID:(topleftx,toplefty)-(bottomrightx,bottomrighty)
   • [Video] keyframeID:(topleftx,toplefty)-(bottomrightx,bottomrighty)
7. assessment -- assessment of link between columns 6 & 1 (correct or wrong)
8. level -- mention type (nam, nom, or pro) for TEXT responses

The following is a summary of Scenario 1 Evaluation zero-hop assessment results:

COUNT | SOURCE | JUDGMENT | MENTION TYPE
    3 | IMAGE  | correct  | -
    5 | IMAGE  | wrong    | -
 4308 | TEXT   | correct  | nam
  159 | TEXT   | correct  | nom
    1 | TEXT   | correct  | pro
 1153 | TEXT   | wrong    | -
   43 | VIDEO  | correct  | -
   87 | VIDEO  | wrong    | -

7.2.2.3 Graph

The Scenario 1 Evaluation Graph Assessment file is located here:

data/assessment/phase_1_evaluation/graph/AIDA_TA1_graph_2019.tab

This file is a consolidation of 782 graph response files, released as part of LDC2019R30 AIDA Phase 1 Assessment Results, which comprise the complete set of graph assessments produced by LDC annotators for the AIDA Phase 1 evaluation. In total, this file contains 14,984 assessed responses.

The graph assessment results file contains 13 tab-delimited fields. The field definitions are as follows:

Col.# Content
1. query_id
2. response_id -- integer
   • NB: unique within query_id + root_uid + object_justification + predicate_justification
3. predicate -- (e.g. Conflict.Attack_Attacker)
4. root_uid
5. subject_type -- SubjectType [NIL, ignored by LDC]
6. subject_justification -- SubjectJustification (1 span) [NIL, ignored by LDC]
7. object_type -- ObjectType [NIL, ignored by LDC]
8. object_justification -- ObjectJustification - 1 span in the format:
   • [Text] DocElementID:(start,0)-(end,0)
   • [Image] DocElementID:(topleftx,toplefty)-(bottomrightx,bottomrighty)
   • [Video] keyframeID:(topleftx,toplefty)-(bottomrightx,bottomrighty)
9. predicate_justification -- PredicateJustification - 1-2 semicolon-separated spans:
   • [Text] DocElementID:(start,0)-(end,0)
   • [Image] DocElementID:(topleftx,toplefty)-(bottomrightx,bottomrighty)
   • [Video] keyframeID:(topleftx,toplefty)-(bottomrightx,bottomrighty)
10. assessment_1 -- Is PredicateJustification correct? (correct or wrong)
11. assessment_2 -- If column 10 is correct, is ObjectJustification (column 8) linkable to the object in PredicateJustification (column 9)? (yes or no)
12. object_id -- global KB ID or NIL ID for the (correct) object in column 8; for correct NIL singleton objects, this is only "NIL"
13. predicate_id -- global KB ID or NIL ID for the (correct) subject in column 9 if the subject is an event; for correct NIL singleton event subjects, this is only "NIL". Relation subjects have no KB ID or NIL ID, as manual relation coref was not performed by LDC assessors.

The following is a summary of Scenario 1 Evaluation graph assessment results:

COUNT | assessment_1 | assessment_2
 7141 | wrong        | -
 1897 | correct      | no
 5946 | correct      | yes
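Because column 9 may hold either one or two semicolon-separated spans, downstream code should not assume a single predicate justification per response. A minimal sketch, assuming the consolidated file has no header row, that counts single- versus two-span justifications:

import csv
from collections import Counter

# Count how many graph responses carry one vs. two predicate-justification
# spans (column 9 is documented as 1-2 semicolon-separated spans).
# csv.reader preserves empty cells, so field positions stay aligned even in
# rows that contain them (see section 10.3).
span_counts = Counter()
with open("data/assessment/phase_1_evaluation/graph/AIDA_TA1_graph_2019.tab",
          encoding="utf-8", newline="") as f:
    for row in csv.reader(f, delimiter="\t"):
        predicate_justification = row[8]          # column 9, 0-indexed
        spans = [s for s in predicate_justification.split(";") if s.strip()]
        span_counts[len(spans)] += 1

print(span_counts)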
7.2.2.4 Graph Relations

The Scenario 1 Evaluation Graph relation assessment file is located here:

data/assessment/phase_1_evaluation/graph/relation-pool-v2.1.tab

This file contains results of the relation assessment task, wherein assessors judged whether or not two correct relation arguments together comprised a correct and justified relation. In total, this file contains 721 assessed responses.

The relation assessment results file contains 12 tab-delimited fields. The field definitions are as follows:

Col.# Content
1. type -- Relation type
2. arg1_role -- ARG1 role label
3. arg1_uid -- ARG1 document ID
4. arg1_p-justification -- ARG1 predicate justification
   • [Text] DocElementID:(start,0)-(end,0)
   • [Image] DocElementID:(topleftx,toplefty)-(bottomrightx,bottomrighty)
   • [Video] keyframeID:(topleftx,toplefty)-(bottomrightx,bottomrighty)
5. arg1_o-justification -- ARG1 object justification
   • [Text] DocElementID:(start,0)-(end,0)
   • [Image] DocElementID:(topleftx,toplefty)-(bottomrightx,bottomrighty)
   • [Video] keyframeID:(topleftx,toplefty)-(bottomrightx,bottomrighty)
6. arg1_kb_id -- ARG1 KB node ID or NIL ID (assigned by LDC during assessment of correctness)
7. arg2_role -- ARG2 role label
8. arg2_uid -- ARG2 document ID
9. arg2_p-justification -- ARG2 predicate justification
   • [Text] DocElementID:(start,0)-(end,0)
   • [Image] DocElementID:(topleftx,toplefty)-(bottomrightx,bottomrighty)
   • [Video] keyframeID:(topleftx,toplefty)-(bottomrightx,bottomrighty)
10. arg2_o-justification -- ARG2 object justification
   • [Text] DocElementID:(start,0)-(end,0)
   • [Image] DocElementID:(topleftx,toplefty)-(bottomrightx,bottomrighty)
   • [Video] keyframeID:(topleftx,toplefty)-(bottomrightx,bottomrighty)
11. arg2_kb_id -- ARG2 KB node ID or NIL ID (assigned by LDC during assessment of correctness)
12. assessment -- Is ARG1 linkable to ARG2 with respect to the provided relation type and corresponding role labels? (yes or no)

The following is a summary of Scenario 1 Evaluation graph relation assessment results:

COUNT | assessment
  169 | no
  552 | yes

7.2.2.5 Hypothesis

The Scenario 1 Evaluation Hypothesis Assessment file is located here:

data/assessment/phase_1_evaluation/hypothesis/AIDA_hypothesis.tab

This file is a consolidation of 732 hypothesis files, released as part of LDC2019R30 AIDA Phase 1 Assessment Results, which comprise the complete set of TA3 hypotheses assessed by LDC annotators for the AIDA Phase 1 evaluation. In total, this file contains 8,422 assessed responses.

The hypothesis file contains 23 tab-delimited fields. The field definitions are as follows:

Col.# Content
1. HypothesisID
2. Hyp_Importance
3. EvtRelUniqueID
4. EvtRelClusterID
5. EvtRel-Importance
6. EvtRel_EdgeLabel -- (e.g. Conflict.Attack_Attacker)
7. ObjClusterID
8. EdgeID
9. Edge-Importance
10. ObjectType -- (e.g. PER.Combatant.Sniper)
11. ObjectHandle
12. PredicateJustificationConfidence
13. ObjectJustificationConfidence
14. DocID
15. SubjectJustification -- [NULL, ignored by LDC]
16. PredicateJustification
   • [Text] DocElementID:(start,0)-(end,0)
   • [Image] DocElementID:(topleftx,toplefty)-(bottomrightx,bottomrighty)
   • [Video] keyframeID:(topleftx,toplefty)-(bottomrightx,bottomrighty)
17. ArgumentJustification
   • [Text] DocElementID:(start,0)-(end,0)
   • [Image] DocElementID:(topleftx,toplefty)-(bottomrightx,bottomrighty)
   • [Video] keyframeID:(topleftx,toplefty)-(bottomrightx,bottomrighty)
18. EvtRelRelevance -- judgment of the event/relation KE's relevance to the scenario topic (FullyRelevant, PartiallyRelevant, or NotRelevant)
19. EdgeCoherence -- judgment of the edge's semantic coherence (True or False)
20. EvtRelCoherence -- judgment of the event/relation KE's semantic coherence (True or False)
21. CoverageOfBestMatchingPT -- judgment of the hypothesis's overall coverage of the best matching prevailing theory indicated in column 22, if any (FullyCovered, PartiallyCovered, or None)
22. BestMatchingPrevailingTheory -- the hypothesis's best matching prevailing theory, if any (e.g. E102Theory5)
23. PrevailingTheoryMatchingArgID -- the edge's best matching prevailing theory argument or arguments, if any, formatted like E101_Theory3-KE002-evt090arg02victim-80000117, with multiple matching arguments separated by '|'
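Columns 21 and 22 make it possible to get a rough per-hypothesis view of prevailing-theory coverage. The sketch below assumes the file has no header row, that coverage is constant across the rows of a given HypothesisID (so the last row seen wins), and that column 22 may be empty when there is no matching theory; check these assumptions against the file before relying on the numbers.

import csv
from collections import Counter

# Per-hypothesis coverage of the best-matching prevailing theory.
# Columns 1, 21, and 22 are HypothesisID, CoverageOfBestMatchingPT, and
# BestMatchingPrevailingTheory; coverage is a hypothesis-level judgment, so
# one value is kept per HypothesisID rather than counting every edge row.
coverage_by_hyp = {}
with open("data/assessment/phase_1_evaluation/hypothesis/AIDA_hypothesis.tab",
          encoding="utf-8", newline="") as f:
    for row in csv.reader(f, delimiter="\t"):
        hyp_id, coverage, theory = row[0], row[20], row[21]
        coverage_by_hyp[hyp_id] = (coverage, theory or "-")

print(Counter(coverage_by_hyp.values()))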
8.0 Software Tools Included in this Release

8.1 Ltf2txt

A data file in ltf.xml format (as described above) can be conditioned to recreate exactly the "raw source data" text stream (the rsd.txt file) from which the LTF was created. The tools described here can be used to apply that conditioning, either to a directory or to a zip archive file containing ltf.xml data. In either case, the scripts validate each output rsd.txt stream by comparing its MD5 checksum against the reference MD5 checksum of the original rsd.txt file from which the LTF was created. (This reference checksum is stored as an attribute of the "DOC" element in the ltf.xml structure; there is also an attribute that stores the character count of the original rsd.txt file.)

The tools are located here:

tools/ltf2txt

Each script contains user documentation as part of the script content; you can run "perldoc" on a script to view its documentation as a typical unix man page, or simply view the script content directly. Also, running any of the scripts without command-line arguments will cause it to display a one-line synopsis of its usage, and then exit.

ltf2rsd.perl -- convert ltf.xml files to rsd.txt (raw-source-data)
ltf2ma.perl -- convert ltf.xml files to ma_tkn.txt (morpheme-segmented text)
ltfzip2rsd.perl -- extract and convert ltf.xml files from zip archives
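The same consistency check can be approximated outside the Perl tools by recomputing the MD5 of a regenerated rsd.txt and comparing it with the value stored on the DOC element. The attribute names used in the sketch below (raw_text_md5 and raw_text_char_length) are an assumption; confirm them against the DOC element in your own ltf.xml files.

import hashlib
import xml.etree.ElementTree as ET

# Cross-check a regenerated rsd.txt against the reference values stored on the
# DOC element of its ltf.xml companion (see section 8.1). The attribute names
# below are assumptions -- verify them against your files before relying on this.
def check_rsd(ltf_path: str, rsd_path: str) -> bool:
    root = ET.parse(ltf_path).getroot()
    doc = root if root.tag == "DOC" else root.find(".//DOC")
    ref_md5 = doc.get("raw_text_md5")                # assumed attribute name
    ref_len = int(doc.get("raw_text_char_length"))   # assumed attribute name

    with open(rsd_path, "rb") as f:
        raw = f.read()

    # MD5 is assumed to be computed over the raw bytes of rsd.txt, and the
    # character count over its decoded (UTF-8) text.
    return hashlib.md5(raw).hexdigest() == ref_md5 and len(raw.decode("utf-8")) == ref_len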
8.2 Twitter-processing

The executable get_tweet_by_id.rb is located under tools/twitter-processing/bin/ and can be used to download and condition twitter text to match the version used by LDC for annotation. See tools/twitter-processing/README.md for further information.

9.0 Documentation Included in this Release

The ./docs folder (relative to the root directory of this release) contains a set of tab-delimited table files, pdf files, and excel files. They are organized into annotation and assessment subdirectories, each of which is further divided into Month 9 Pilot Evaluation and Phase 1 Evaluation subdirectories. Each file is described in a subsection below.

In the following, the term "asset" refers to any single "primary" data file of any given type. Each asset has a distinct 9-character identifier. If two or more files appear with the same 9-character file-ID, this means that they represent different forms or derivations created from the same, single primary data file (e.g. this is how we mark corresponding LTF.xml and PSM.xml file pairs).

Data scouting, annotation, and related metadata are all managed with regard to a set of "root" HTML pages (harvested by the LDC for a specified set of topics); therefore the tables and annotations make reference to the asset-IDs assigned to those root pages. However, the present release does not include the original HTML text streams, or any derived form of data corresponding to the full HTML content. As a result, the "root" asset-IDs cited in tables and annotations are not to be found among the inventory of data files presented in zip archives in the "./data" directory.

Each root asset is associated with one or more "child" assets (including images, media files, style sheets, text data presented as ltf.xml, etc.); each child asset gets its own distinct 9-character ID. The root-child relations are provided in the "parent_children.tab" table (9.1.1), and as part of the LDCC header content in the various "wrapped" data file formats (as listed in section 2).

9.1 Top-level Documentation

Files in the top-level /docs directory describe data relevant to both the annotation and assessment partitions.

9.1.1 "parent_children.tab" -- Relation of Child Assets to Root HTML Pages

This file is located in the top-level docs directory here:

docs/parent_children.tab

Each data file-ID in the set of zip archives is represented by the combination of child_uid and child_asset_type (columns 2 and 4), along with its root UID in column 1.

Col.# Content
1. parent_uid -- 9-character source document ID string
2. child_uid -- 9-character ID string for media element of source document
3. url -- URL for root document or for child asset
4. child_asset_type -- media type, represented as file type and storage format (e.g., .ltf.xml, .jpg.ldcc)
5. topic -- topic ID for which the document was annotated
6. lang_id -- automatically detected language, "n/a" for non-ltf assets
7. lang_manual -- manually selected language(s)
8. rel_pos -- position of this asset relative to other child assets on the page
9. wrapped_md5 -- md5 checksum of .ldcc formatted asset file
10. unwrapped_md5 -- md5 checksum of original asset data file
11. download_date -- download date of asset
12. content_date -- creation date of asset, or n/a
13. status_in_corpus -- "present" or "diy"; set to "diy" for assets associated with tweets

Notes:
- Because ltf and psm files have the same "child" uid and differ only in the file extension (.ltf.xml or .psm.xml), only the ltf files are listed in the parent_children.tab document.
- The URL provided for each .ltf.xml entry in the table is the "full-page" URL for the root document associated with the "parent_uid" value. (For other types of child data -- images and media -- the "url" field contains the specific url for that piece of content.)
- Some child_uids (for images or videos) appear multiple times in the table, because they were found to occur identically in multiple root web pages.
- "Derived assets" such as ltf and psm do not have a relative position value.
- Topic and manually selected language data were withheld from previous versions of parent_children.tab to protect evaluation-sensitive information. That data is now provided, since the information is no longer evaluation-sensitive.
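A common first step with this table is to group child assets under their root documents and to pull out the tweet-derived ("diy") assets that must be fetched separately. A minimal sketch, which skips a header line if one is present and otherwise assumes the 13 columns listed above:

import csv
from collections import defaultdict

# Group child assets by their root (parent) document and collect the
# tweet-derived assets that are not shipped in ./data ("diy").
children_by_parent = defaultdict(list)
diy_assets = []

with open("docs/parent_children.tab", encoding="utf-8", newline="") as f:
    for row in csv.reader(f, delimiter="\t"):
        if row[0] == "parent_uid":      # skip a header line if one is present
            continue
        parent_uid, child_uid, child_type, status = row[0], row[1], row[3], row[12]
        children_by_parent[parent_uid].append((child_uid, child_type))
        if status == "diy":
            diy_assets.append(child_uid)

print(f"{len(children_by_parent)} root documents, {len(diy_assets)} diy child assets")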
9.1.2 "masterShotBoundary.msb" -- Summary of Shot Boundary Segments

This file is located in the top-level docs directory here:

docs/masterShotBoundary.msb

For each video included in the release, a set of segments was generated with the video shot boundary detector and is listed in this file.

Col.# Content
1. keyframe_id -- Unique ID constructed from the 9-character file-ID of the video from which the frame was extracted and a unique ID for the keyframe (e.g., HC0000SPD_26, which was extracted from HC0000SPD)
2. start_frame -- Shot start frame
3. end_frame -- Shot end frame
4. start_time -- Shot start time in seconds
5. end_time -- Shot end time in seconds

9.2 Annotation Documentation

9.2.1 "twitter_info.tab" -- Summary of Twitter Assets

This file is located in the annotation subdirectory of the docs directory here:

docs/annotation/twitter_info.tab

For each tweet collected, a row listing asset uid, tweet ID, user ID, and topic UID is included:

Col.# Content
1. uid
2. tweet_id -- Twitter-provided tweet ID
3. user_id -- Twitter-provided user ID

This file can be used with the twitter-processing utility provided in the tools/ directory of this package to ensure that the downloaded tweet contents match those retrieved by LDC, so that any annotations can be correctly aligned with the tweet.

9.2.2 Month 9 Annotation Documentation

The following documents are present in the docs/annotation/month_9_pilot_evaluation directory of this package.

9.2.2.1 AIDA_Seedling_Annotation_Guidelines_V2.1.pdf

Version of the annotation guidelines that was used to produce the Month 9 annotations in this package.

9.2.2.2 AIDA_Seedling_Ontology_Info_V7.xlsx

File with information on the Month 9 ontology types and constraints used in annotation.

9.2.2.3 eval_hypothesis_info.tab

Four-column table providing information about each hypothesis. The four columns in this table are as follows:

Col.# Content
1. hypothesis_id -- Unique identifier for each hypothesis; the ID consists of 3 fields separated by underscores (topic_query_hypothesis): P101_Q001_H001 is the first hypothesis for topic P101, Query 1, and P101_Q002_H001 is the first hypothesis for topic P101, Query 2, etc.
2. topic_name -- Name of the topic the hypothesis is relevant to
3. query -- Natural language query the hypothesis is in response to
4. hypothesis -- Natural language hypothesis text

9.2.2.4 eval_topic_description.pdf

Description of each pilot eval topic and the conflicting information types that were expected. Note that the conflicting information contained in the topic description is not intended to be exhaustive or to constrain the annotation in any way. This topic description, in combination with the eval_hypothesis_info.tab file in the same subdirectory and the mini KBs for each topic in the data/ directory, constitutes the "topic model".

9.2.2.5 eval_table_field_descriptions.tab

Description of the structure of each type of annotation table in the data/annotation/month_9_pilot_evaluation/{P101,P102,P103} subdirectories. This table includes information about column headers, the content of each field, and the format of the contents.

9.2.3 Scenario 1 Annotation Documentation

The following documents are present in the docs/annotation/phase_1_evaluation directory of this package.

9.2.3.1 AIDA_Annotation_Guidelines_Quality_Control_and_Informative_Mentions_V1.0.pdf

Version of the annotation guidelines that was used to perform Quality Control of the Scenario 1 Salient Mentions annotations in this package.

9.2.3.2 AIDA_Annotation_Guidelines_Salient_Mentions_V1.0.pdf

Version of the annotation guidelines that was used to produce the Scenario 1 evaluation annotations in this package.

9.2.3.3 AIDA_phase_1_table_field_descriptions_v4.tab

Description of the structure of each type of annotation table. This table includes information about column headers, the content of each field, and the format of the contents.

9.2.3.4 LDC_AIDAAnnotationOntology_V8.xlsx

The Scenario 1 annotation ontology.

9.2.3.5 E101_E102_E103_topic_description.pdf

Descriptions of the E101, E102, and E103 topics with queries and query IDs. Note that the queries are meant to draw annotators' attention to expected points of informational conflict within the topic, but salience to the topic is defined more broadly than simply providing the answer to one of the queries. See the annotation guidelines for the instructions provided to annotators on determining salience.

9.2.3.6 {E101,E102,E103}_prevailing_theories_final.xlsx

These three files contain the prevailing theories for topics E101, E102, and E103, respectively.
9.3 Assessment Documentation

9.3.1 Month 9 Assessment Documentation

The following documents are present in the docs/assessment/month_9_pilot_evaluation directory of this package.

9.3.1.1 AIDA_2018_Assessment_Guidelines_V1.0.pdf

Latest version of the guidelines that were used to produce the Month 9 Evaluation assessments.

9.3.1.2 zerohop_queries.xml

This file contains 269 zero-hop query entry points. Each entry point contains an entity mention, its source document, and a KB node. In total, there are 20 unique KB nodes across the 269 entry points, corresponding to 20 unique entities in the AIDA P103 mini-KB. All zero-hop responses assessed by LDC annotators were responses to one of these 20 entities.

9.3.2 Scenario 1 Assessment Documentation

The following documents are present in the docs/assessment/phase_1_evaluation directory of this package.

9.3.2.1 AIDA_2019_Entity_Assessment_Guidelines_V1.1.pdf

Latest version of the guidelines that were used to produce the class and zero-hop assessments during Scenario 1 Assessment.

9.3.2.2 AIDA_2019_Event_Relation_Assessment_Guidelines_V1.0.pdf

Latest version of the guidelines that were used to produce the assessments of graph responses during Scenario 1 Assessment.

9.3.2.3 AIDA_2019_Hypothesis_Assessment_Guidelines_V1.1.pdf

Latest version of the guidelines that were used to produce the assessments of TA3 hypotheses during Scenario 1 Assessment.

10.0 Known Issues

10.1 Month 9 Annotations

All text entity mentions should have a mention level (nam/nom/pro). However, there are some text entity mentions with missing mention levels.

Relations should have exactly two slots annotated. However, there are several cases of relation mentions with only one slot annotated. There is also a case where a relation has two slots annotated, but one slot is missing a slot_type.

Each hypothesis should have exactly one judgment for each relation and event mention. Some hypotheses were not judged for all relation/event mentions. One instance in which this can occur is when a relation or event has only fillers as arguments.

In some cases, start and end dates are not in the standard format (YYYY-MM-DD).

Each type.subtype combination for events and relations has a specified set of allowable types for the entities/fillers that can occupy its slots. In some cases, an entity with an unexpected type is included as an argument.

Each entity or event that is an argument of a relation mention or event mention should share provenance with that relation or event mention. That is, given the provenance UID for a relation or event mention, and given an entity (or event) that is an argument of that relation or event mention, at least one of the provenance UIDs associated with the mentions of that argument should be the same as the provenance UID for the relation or event mention in question. However, there are many cases of arguments that do not share provenance with their corresponding event or relation.

10.2 Scenario 1 Evaluation Annotations

Duplicate arg mentions -- Some arg mentions may be annotated more than once when they appear as arguments of more than one relation/event; that is, the same type, subtype, and sub-subtype may be applied to the same text extent (or video/image provenance) more than once. Note that duplicate arg mentions each have a unique argmention_id.

Missing mediamention_coordinates -- All mentions tagged in non-text assets are expected to have mediamention_coordinates indicating where in the asset the mention occurs.
There are 876 entity mentions, 199 event mentions, and 142 relation mentions that have "EMPTY_TBD" or "EMPTY_NA" under mediamention_coordinates despite being tagged in non-text assets.

Orphaned base arg mentions -- There are many base arg mentions that are not annotated as a slot in an event or relation.

Keyframe documentation missing -- Shot start frame, shot end frame, shot start time, and shot end time are missing from the document masterShotBoundary.msb for the keyframes IC0019NAV_77 and IC001L2RD_104.

10.3 Empty cells in some Month 9 Annotation and Scenario 1 Assessment files

Some of the Month 9 Annotation files and Scenario 1 Evaluation Assessment files contain empty cells. These appear as a sequence of two tab characters within a given line in a tab file (e.g. /\t\t/), or as a line-final tab character (e.g. /\t$/). There are no line-initial empty cells (e.g. /^\t/). Note that lines can contain multiple empty cells, and even multiple contiguous empty cells. Care should be taken when processing these files to ensure that empty cells are handled appropriately and, in particular, that data from other fields is not shifted into the empty cells.

The files in the package that contain empty cells are:

./data/annotation/month_9_pilot_evaluation/P101/P101_ent_mentions.tab
./data/annotation/month_9_pilot_evaluation/P101/P101_evt_mentions.tab
./data/annotation/month_9_pilot_evaluation/P101/P101_evt_slots.tab
./data/annotation/month_9_pilot_evaluation/P101/P101_rel_mentions.tab
./data/annotation/month_9_pilot_evaluation/P102/P102_ent_mentions.tab
./data/annotation/month_9_pilot_evaluation/P102/P102_evt_mentions.tab
./data/annotation/month_9_pilot_evaluation/P102/P102_evt_slots.tab
./data/annotation/month_9_pilot_evaluation/P102/P102_rel_mentions.tab
./data/annotation/month_9_pilot_evaluation/P102/P102_rel_slots.tab
./data/annotation/month_9_pilot_evaluation/P103/P103_ent_mentions.tab
./data/annotation/month_9_pilot_evaluation/P103/P103_evt_mentions.tab
./data/annotation/month_9_pilot_evaluation/P103/P103_evt_slots.tab
./data/annotation/month_9_pilot_evaluation/P103/P103_rel_mentions.tab
./data/assessment/phase_1_evaluation/graph/AIDA_TA1_graph_2019.tab
./data/assessment/phase_1_evaluation/zero-hop/AIDA_TA1_ZH_2019.tab
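One way to honor the warning above is to split lines only on tab characters, for example with csv.reader, which keeps an empty string for every empty cell so that field positions are never shifted. A minimal sketch:

import csv

# Read a tab-delimited file while preserving empty cells. csv.reader with a
# tab delimiter yields an empty string for each /\t\t/ sequence and for a
# line-final tab, so field positions stay aligned. By contrast, splitting on
# arbitrary whitespace (e.g. str.split() with no argument) would collapse
# empty cells and shift later fields to the left.
def read_tab(path):
    with open(path, encoding="utf-8", newline="") as f:
        for row in csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE):
            yield row

for row in read_tab("data/assessment/phase_1_evaluation/zero-hop/AIDA_TA1_ZH_2019.tab"):
    pass  # each row keeps one entry per field, empty cells included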
11.0 Copyright

Portions © 2003, 2015 2000.ua, © 2015 Arguments and Facts, © 2014 Associated Newspapers Ltd, © 2017 Belarus Today, © 2017 Belarusian Hour, © 2016-2018 Bessarabia INFORM, © 2017-2018 Bird In Flight, © 2015-2016 Cable News Network. Turner Broadcasting System, Inc., © 2016 Censor.NET, © 2011-2015, 2017 Consortiumnews, © 2014-2016 Digital Venture LLC, © 2012, 2017 DirectPress.ru, © 2015 Elisa Group Ltd., © 2018 Elnews.ru, © 2014 EUROMAIDAN PRESS, © 2015-2016 euronews, © 2017 Facts and Comments, © 2016 FAN, © 2014 Forbes Media LLC, © 2014 From-UA, © 2013 gate @ Crimea – news, comments, © 2017 Gazetadaily.ru, © 2018 GLAVRED.INFO, © 2011 Human Rights Watch, © 2013, 2017-2018 IA REGNUM, © 2014-2015 InfoKava.com, © 2017 Information and Analytical Agency, © 2015 InoSMI.ru, © 2014 Interfax-Ukraine, © 2009-2017 JSC Business News Media, © 2012-2014, 2016 KM Online, LLC, © 2014-2015 Lenta.Ru LLC, © 2017 Liga Information and Analytical Center, © 2015, 2017 Lux Television and Radio Company, © 2014-2017 MIA Russia Today, © 2016-2018 mirnews.su, © 2014 Mirror of the week, © 2017 News Front, © 2015-2016 NEWSru.com, © 2018 Obozrevatel, © 2014-2015 PJSC Today Multimedia, © 2017 Public Television, © 2014-2015, 2017 Radio Liberty, © 2014-2015 RFE/RL, © 2014-2015 The Daily Beast Company LLC, © 2014-2017 The Military Review, © 2011-2012 The Power of Truth, © 2014 The Slate Group, © 2014, 2016-2017 TSN.ua, © 2014-2017 TV-Novosti, © 2015, 2017 Ukrainian Media Holding, © 2014, 2016 Ukrainian Media Systems, © 2014-2015, 2017 Ukrainian Pravda, © 2015-2017 Ukrinform, © 2014, 2017 UNIAN.NET, © 2014-2015 Vice News, © 2017 Western Information Corporation, © 2014-2018 Zhitomir-Online, © 2018 Trustees of the University of Pennsylvania

12.0 Contacts

Dana Delgado - AIDA Project Manager
Christopher Caruso - AIDA Tech Lead
Ann Bies - AIDA Coordinator
Kira Griffitt - AIDA Coordinator

------
README created by Chris Caruso on February 3, 2023
  updated by Jeremy Getman on February 9, 2023
  updated by Stephanie Strassel on September 5, 2023
  updated by Jeremy Getman on May 15, 2024
  updated by Summer Ploegman on May 14, 2025
  updated by Kira Griffitt on May 27, 2025