KAIROS Phase 1 Evaluation Source Data, Annotation, and Assessment

Item Name: KAIROS Phase 1 Evaluation Source Data, Annotation, and Assessment
Author(s): Song Chen, Jennifer Tracey, Justin Mott, Ann Bies, Michael Arrigo, Christopher Caruso, David Graff, Stephanie Strassel
LDC Catalog No.: LDC2026T07
ISLRN: 558-102-578-740-1
DOI: https://doi.org/10.35111/rfam-2766
Release Date: June 15, 2026
Member Year(s): 2026
DCMI Type(s): Image, MovingImage, Software, Sound, StillImage, Text
Data Source(s): web collection
Project(s): KAIROS
Application(s): entity extraction, event detection, information extraction, knowledge representation
Language(s): Spanish, English
Language ID(s): spa, eng
License(s): LDC User Agreement for Non-Members
Online Documentation: LDC2026T07 Documents
Licensing Instructions: Subscription & Standard Members, and Non-Members
Citation: Chen, Song, et al. KAIROS Phase 1 Evaluation Source Data, Annotation, and Assessment LDC2026T07. Web Download. Philadelphia: Linguistic Data Consortium, 2026.

Introduction

KAIROS Phase 1 Evaluation Source Data, Annotation, and Assessment was developed by the Linguistic Data Consortium (LDC). It contains the English and Spanish source data (text, video and images), manual annotations, reference knowledge graphs, the system output assessed during the evaluation, and human assessment results from the Phase 1 evaluation of the DARPA KAIROS Program.

The Phase 1 evaluation focused on nine complex events (CEs) in the improvised explosive device bombing scenario, plus two surprise complex events in the mass shooting scenario:

  • ce1005: Sydney Aeroplane Bomb Plot, Australia, 2017
  • ce1006: Stockholm Bombings, Sweden, 2010
  • ce1007: Manchester Arena Bombing, England, 2017
  • ce1008: Taxi Detonation, Canada, 2016
  • ce1009: Spokane Bombing Attempt, Washington, 2011
  • ce1010: Derry Bombing, Northern Ireland, 2019
  • ce1011: Bogotá Police Academy Car Bombing, Colombia, January 2019
  • ce1012: Kansas City Hospital Bombing, Missouri, 2020
  • ce1013: Attempted Bombing in Moses Lake, Washington, 2018
  • ce1020: El Paso Walmart Shooting, Texas, 2019
  • ce1021: Orlando Nightclub Shooting, Florida, 2016

The DARPA KAIROS (Knowledge-directed Artificial Intelligence Reasoning Over Schemas) program aimed to build technology capable of understanding and reasoning about complex real-world events in order to provide actionable insights to end users. KAIROS systems utilized formal event representations in the form of schema libraries that specified the steps, preconditions and constraints for an open set of complex events; schemas were then used in combination with event extraction to characterize and make predictions about real-world events in a large multilingual, multimedia corpus. Each KAIROS evaluation focused on a real-world scenario and several real-world complex events within that scenario, along with the possibility of surprise complex events in different but related scenarios.

Data

Source data was collected from the web by LDC. A total of 139 root web pages were collected and processed, yielding 131 text data files, 1176 image files, and 27 video files. The evaluation source data for each complex event was an input data set consisting of 10-15 documents that included multimodal English and Spanish event-relevant and off-topic distractor documents. Manual annotation and assessment of event-relevant documents for 10 complex events are included in this release.

Scenario-relevant events and relations were labeled in each document to build a structured representation of the temporally ordered events, relations and arguments that expressed the scenario-relevant events in each complex event. A reference knowledge graph (Graph G) was developed for each complex event; systems were expected to match Graph G against a given schema library. Assessment data includes human assessment judgments and the system output that was manually assessed for the end-to-end evaluation task.

Source data is presented in various formats: .gif, .jpg, .ltf, .mp4, .png, .psm, and .svg. Annotations are presented as tab-separated files (.tab). Graph G data is presented in JSON format and in human-readable Excel (.xlsx) files. System output is presented in JSON format and as tab-separated files. A software tool is also included in this release to recreate the original source data from the processed XML material.
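The included software tool is the supported way to recreate original source data. For quick inspection of the processed text, the following is a minimal Python sketch that lists the segments of one .ltf file. It assumes the LTF XML layout commonly used in LDC releases (LCTL_TEXT/DOC/TEXT/SEG, with start_char and end_char attributes on each SEG and an ORIGINAL_TEXT child). These element and attribute names should be checked against the release documentation. The names read_ltf_segments and Segment, and the sample document, are hypothetical illustrations.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Segment:
    seg_id: str
    start_char: int
    end_char: int
    text: str


def read_ltf_segments(xml_text: str) -> list[Segment]:
    """Return the SEG elements of an LTF document in document order.

    Assumption: each SEG carries id/start_char/end_char attributes and an
    ORIGINAL_TEXT child holding the untokenized segment text.
    """
    root = ET.fromstring(xml_text)
    segments = []
    for seg in root.iter("SEG"):
        original = seg.find("ORIGINAL_TEXT")
        # Fall back to an empty string if ORIGINAL_TEXT is missing or empty.
        text = (original.text or "") if original is not None else ""
        segments.append(
            Segment(
                seg_id=seg.get("id", ""),
                start_char=int(seg.get("start_char", "0")),
                end_char=int(seg.get("end_char", "0")),
                text=text,
            )
        )
    return segments


# Hypothetical sample document, not taken from this release.
SAMPLE = """<LCTL_TEXT>
<DOC id="EXAMPLE_DOC" lang="eng">
<TEXT>
<SEG id="segment-0" start_char="0" end_char="15">
<ORIGINAL_TEXT>A bomb went off.</ORIGINAL_TEXT>
</SEG>
<SEG id="segment-1" start_char="17" end_char="33">
<ORIGINAL_TEXT>Police responded.</ORIGINAL_TEXT>
</SEG>
</TEXT>
</DOC>
</LCTL_TEXT>"""

if __name__ == "__main__":
    for seg in read_ltf_segments(SAMPLE):
        print(seg.seg_id, seg.start_char, seg.end_char, seg.text)
```

The character offsets are kept so that segments can be aligned with the span references in the annotation files. How those files encode offsets is not described here and would need to be confirmed from the documentation.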

Sponsorship

KAIROS was sponsored by the Air Force Research Laboratory (AFRL) and the Defense Advanced Research Projects Agency (DARPA) under Contract No. HR0011-19-S-0014.

Updates

No updates at this time.
