KAIROS Phase 2 Quizlet

Item Name: KAIROS Phase 2 Quizlet
Author(s): Song Chen, Ann Bies, Christopher Caruso, Jennifer Tracey, Stephanie Strassel
LDC Catalog No.: LDC2025T15
ISLRN: 655-957-812-959-9
DOI: https://doi.org/10.35111/n6td-nn51
Release Date: October 15, 2025
Member Year(s): 2025
DCMI Type(s): Image, MovingImage, Software, Sound, StillImage, Text
Data Source(s): web collection
Project(s): KAIROS
Application(s): entity extraction, event detection, information extraction, knowledge representation
Language(s): Spanish, English
Language ID(s): spa, eng
License(s): LDC User Agreement for Non-Members
Online Documentation: LDC2025T15 Documents
Licensing Instructions: Subscription & Standard Members, and Non-Members
Citation: Chen, Song, et al. KAIROS Phase 2 Quizlet LDC2025T15. Web Download. Philadelphia: Linguistic Data Consortium, 2025.

Introduction

KAIROS Phase 2 Quizlet was developed by the Linguistic Data Consortium (LDC). It contains English and Spanish text, video and image data and annotations used for pre-evaluation research and system development during Phase 2 of the DARPA KAIROS program.

KAIROS Quizlets were a series of narrowly defined tasks designed to explore specific evaluation objectives, enabling KAIROS system developers to exercise individual system components on a small data set prior to the full program evaluation. This corpus contains the complete set of Quizlet data used in Phase 2, which focused on five real-world complex events (CEs) within the Disease Outbreak (DO) scenario:

  • CE2002: Clostridium perfringens, Chipotle restaurant, Ohio, 2018
  • CE2004: Salmonella, from peanut butter, originated from Georgia peanut factory 2008
  • CE2011: 2011 E. coli linked to contact with livestock at fair, North Carolina
  • CE2019: 2017 Botulism from nacho cheese sauce, California
  • CE2039: 1976 Philadelphia Legionnaires' disease outbreak

The DARPA KAIROS (Knowledge-directed Artificial Intelligence Reasoning Over Schemas) program aimed to build technology capable of understanding and reasoning about complex real-world events in order to provide actionable insights to end users. KAIROS systems utilized formal event representations in the form of schema libraries that specified the steps, preconditions and constraints for an open set of complex events; schemas were then used in combination with event extraction to characterize and make predictions about real-world events in a large multilingual, multimedia corpus.

Data

Five quizlets were developed in Phase 2 (Quizlets 5-9). In addition to the source documents, this release contains the contents of Quizlet 6 (source documents and manual annotation), Quizlet 7 (source documents, updated annotation, and graph G), Quizlet 8 (source documents, updated annotation, and graph G), and Quizlet 9 (source documents, manual annotation, and graph G). Quizlet 5 (schema representation development) did not require data or annotation and is not included in this release.

Source data was collected from the web by LDC; 66 root web pages were collected and processed, yielding 65 text data files, 890 image files, and 10 video files. Annotation steps included labeling scenario-relevant events and relations for each document to develop a structured representation of temporally ordered events, relations, and arguments; generating a reference knowledge graph; and linking labeled entries to a knowledge base derived from a Wikidata-based ontology.

Source data is presented in various formats: .gif, .jpg, .ltf, .mp4, .png, .psm, and .svg. Annotations are presented as tab-separated files (.tab) for temporal ordering, relations, events, and arguments.
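
As an illustration of working with the tab-separated annotation files, the following Python sketch loads a .tab file into memory. It is not part of the release. The function name read_tab and the has_header parameter are hypothetical, and whether a given file starts with a header row is an assumption that should be checked against the corpus documentation.

```python
import csv


def read_tab(path, has_header=True):
    """Load a tab-separated file.

    Returns a list of dicts keyed by the header row when has_header is True,
    otherwise a list of lists. The presence of a header row is an assumption;
    check the corpus documentation for the actual column layout.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        if has_header:
            reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        else:
            reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        return list(reader)
```

Quoting is disabled so that quotation marks inside annotation fields are kept as literal characters rather than being interpreted as CSV quoting.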

Software tools are also included in this release. The tools recreate original source data from the processed XML material.

  • ltf2rsd.perl -- convert ltf.xml files to rsd.txt (raw-source-data)
  • ltfzip2rsd.perl -- extract and convert ltf.xml files from zip archives
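
The conversion idea behind these tools can be sketched in Python. This is not the logic of the included Perl scripts. It assumes that each LTF file contains SEG elements with a start_char attribute and an ORIGINAL_TEXT child, and that gaps between segments can be padded with newlines so that character offsets stay aligned; both points should be verified against the release documentation. The function name ltf_to_rsd is hypothetical.

```python
import xml.etree.ElementTree as ET


def ltf_to_rsd(ltf_xml):
    """Rebuild raw text from an LTF XML string.

    Assumed layout: SEG elements carrying a start_char attribute and an
    ORIGINAL_TEXT child. Gaps between segments are padded with newlines
    (an assumption) so that offsets in the output match start_char values.
    """
    root = ET.fromstring(ltf_xml)
    pieces = []
    cursor = 0  # number of characters emitted so far
    for seg in root.iter("SEG"):
        start = int(seg.get("start_char"))
        text = seg.findtext("ORIGINAL_TEXT") or ""
        if start > cursor:
            pieces.append("\n" * (start - cursor))
            cursor = start
        pieces.append(text)
        cursor += len(text)
    return "".join(pieces)
```

Keeping the offsets aligned matters because annotation files that point into the source text by character position would otherwise no longer match the reconstructed text.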

Sponsorship

KAIROS was sponsored by the Air Force Research Laboratory (AFRL) and the Defense Advanced Research Projects Agency (DARPA) under Contract No. HR0011-19-S-0014.

Updates

No updates at this time.
