AIDA Scenario 1 Evaluation Topic Source Data, Annotation, and Assessment

Item Name: AIDA Scenario 1 Evaluation Topic Source Data, Annotation, and Assessment
Author(s): Jennifer Tracey, Stephanie Strassel, Jeremy Getman, Ann Bies, Kira Griffitt, David Graff, Christopher Caruso
LDC Catalog No.: LDC2025T13
ISLRN: 620-348-369-491-1
DOI: https://doi.org/10.35111/n4ac-3012
Release Date: September 15, 2025
Member Year(s): 2025
DCMI Type(s): Image, MovingImage, StillImage, Text
Sample Type: mpeg
Sample Rate: 44100 Hz (with some variations)
Data Source(s): web collection
Project(s): AIDA
Application(s): entity extraction, event detection, information extraction
Language(s): Ukrainian, Russian, English
Language ID(s): ukr, rus, eng
License(s): LDC User Agreement for Non-Members
Online Documentation: LDC2025T13 Documents
Licensing Instructions: Subscription & Standard Members, and Non-Members
Citation: Tracey, Jennifer, et al. AIDA Scenario 1 Evaluation Topic Source Data, Annotation, and Assessment LDC2025T13. Web Download. Philadelphia: Linguistic Data Consortium, 2025.

Introduction

AIDA Scenario 1 Evaluation Topic Source Data, Annotation, and Assessment was developed by the Linguistic Data Consortium (LDC) and consists of English, Russian, and Ukrainian web documents (text, video, image), annotations and assessments used in the AIDA Phase 1 pilot and final evaluations.

Each phase of the AIDA program centered on a specific scenario, or broad topic area, with related subtopics designated as either practice topics or evaluation topics. The Phase 1 scenario focused on political relations between Russia and Ukraine in the 2010s. The documents, annotations and assessments contained in this corpus include coverage of the following events: Suspicious Deaths and Murders in Ukraine (January-April 2015); Odessa Tragedy (May 2, 2014); and Siege of Sloviansk and Battle of Kramatorsk (April-July 2014).

The AIDA (Active Interpretation of Disparate Alternatives) Program was designed to support development of technology to assist in cultivating and maintaining understanding of events when there are conflicting accounts of what happened (e.g., who did what to whom and/or where and when events occurred). AIDA systems must extract entities, events, and relations from individual multimedia documents, aggregate that information across documents and languages, and produce multiple knowledge graph hypotheses that characterize the conflicting accounts that are present in the data.

Data

The corpus contains a multimedia collection of 10,522 documents, annotations for 386 of those documents, and assessment results covering 77,965 responses in 1,525 of those documents.

Source material was collected from the web by a combination of automatic and manual processes. HTML content was converted from its original form into XML. To the extent possible, all resources referenced by a given "root" HTML page (style sheets, javascript, images, media files, etc.) were stored as separate files of the given data type and assigned separate 9-character file-IDs (the same form of ID used for the "root" HTML page).

Annotations were performed in three steps: (1) within-document labels for scenario-related entities, relations and events; (2) coreference annotation across documents by linking information elements to a knowledge base; and (3) indications of any relationship between labeled events/relations and hypotheses about the scenario.

In the assessment phase, LDC annotators reviewed and judged system response files to provide evaluation organizers with a means for scoring submissions. Assessment tasks included zero-hop assessment, class-based assessment, graph assessment and hypothesis assessment.

Further information about annotation and assessment processes is contained in the documentation accompanying this release.

Annotations and assessments are presented as tab-separated files.
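
As a rough illustration of working with tab-separated files of this kind, the following Python sketch reads rows into dictionaries and counts them by one column. The column names (doc_id, mention_id, type, text), the sample rows, and the presence of a header line are all assumptions made for the example; the actual table layouts are defined in the documentation accompanying this release.

```python
import csv
import io
from collections import Counter


def read_tsv(handle):
    """Yield one dict per data row, keyed by the column names in the first line.

    Assumes the file starts with a header line; check the release
    documentation before relying on that.
    """
    # QUOTE_NONE keeps any quotation marks in the text as literal characters.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield row


# Hypothetical sample data; these columns and values are NOT from the corpus.
SAMPLE = (
    "doc_id\tmention_id\ttype\ttext\n"
    "XXX000001\tM1\tPER\tsample person\n"
    "XXX000001\tM2\tGPE\tsample place\n"
    "XXX000002\tM3\tPER\tanother \"quoted\" person\n"
)

rows = list(read_tsv(io.StringIO(SAMPLE)))
counts = Counter(row["type"] for row in rows)

for label, n in sorted(counts.items()):
    print(label, n)
```

To read a real file instead of the in-memory sample, open it with open(path, encoding="utf-8", newline="") and pass the handle to read_tsv; UTF-8 is likewise an assumption to verify against the release documentation.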

The knowledge base for entity detection and linking annotation for all AIDA Scenario 1 and 2 corpora is available separately as AIDA Scenario 1 and 2 Reference Knowledge Base (LDC2023T10).

Sponsorship

This material is based upon work supported by Air Force Research Laboratory (AFRL) and the Defense Advanced Research Projects Agency (DARPA) under Contract No. FA8750-18-C-0013.

Updates

No updates at this time.
