Annotation Included in this Release

1.0 Annotation Overview

Gold standard reference annotations for each CE were created by trained annotators, who labeled the scenario-relevant events and relations in each CE's document set using the pre-defined KAIROS program ontology. The result is a structured representation of the temporally ordered events, relations and arguments needed to fully express the scenario-relevant events in each CE.

2.0 Source Data Document Sets

The Phase 1 evaluation focused on nine real-world incidents, or Complex Events (CEs), in the Improvised Explosive Device (IED) bombing scenario and one incident in the mass shooting scenario. The Phase 1 evaluation CE names and IDs are as follows:

- ce1005: Sydney Aeroplane Bomb Plot, Australia, 2017
- ce1006: Stockholm Bombings, Sweden, 2010
- ce1007: Manchester Arena Bombing, England, 2017
- ce1008: Taxi Detonation, Canada, 2016
- ce1009: Spokane Bombing Attempt, Washington, 2011
- ce1010: Derry Bombing, Northern Ireland, 2019
- ce1011: Bogotá Police Academy Car Bombing, Colombia, January 2019
- ce1012: Kansas City Hospital Bombing, Missouri, 2020
- ce1013: Attempted bombing in Moses Lake, Washington, 2018
- ce1020: El Paso Walmart Shooting, Texas, 2019
- ce1021: Orlando nightclub shooting, Florida, 2016

Each CE has a separate source data set in this corpus. On-topic documents in a data set are relevant to the CE and suitable for both human annotation and system evaluation. Off-topic documents are not relevant to the CE but cover events that occurred at a similar time or place; they were included in each CE data set as distractor documents for the Phase 1 evaluation. The data sets are multilingual (English and Spanish) and multimedia (text, video, image). Source data format and usage are described in ./docs/README.txt section 3.
3.0 Annotation Approach

The goal of annotating the evaluation data was to produce a temporally ordered list of the unique events and relations (plus their arguments) needed to fully account for the scenario-relevant information the data conveys about the CE, to serve as ground truth for system evaluation. The source documents for each CE were manually labeled for the scenario-relevant events they contain, resulting in a structured representation of the temporally ordered events, relations and arguments needed to fully express the CE.

KAIROS evaluation used the concept of Event Primitives (EPs), which defined the minimum level of event granularity relevant to the KAIROS program. EPs can be understood simply as manually labeled or system-extracted events (or relations) within the KAIROS program.

To support holistic CE understanding across a multilingual and multimedia data set, we did not label every mention of every event, relation and entity discussed in each CE's document set. Instead, we created exactly one annotation frame for each unique event or relation present in the data. This frame, which gathers all the relevant information for a given event or relation, is not a cluster of individual mentions; it is an abstract structured representation of the event or relation as a whole, with no individual mentions or links to specific spans of document provenance. Each event annotation also includes timestamp temporal information. Inference and logical reasoning are permitted when determining argument participation in events and when determining temporal information, but events, relations and entities must be explicitly mentioned in the data to be labeled in the reference annotation.
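As an illustration of the spanless approach, the following Python sketch keeps exactly one frame per unique event and lets information from any document or medium enrich that single frame rather than creating a per-mention record. It is a hypothetical model, not the format of the released tables: the class names (EventFrame, CEAnnotation), the field names, IDs such as "E1" and "ENT7", and the event type and role labels are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class EventFrame:
    """One spanless frame per unique event (hypothetical field names).

    There are deliberately no fields for document IDs, text offsets,
    or video timestamps: the frame describes the event as a whole.
    """
    event_id: str
    event_type: str
    # role label -> set of entity IDs filling that role
    arguments: dict[str, set[str]] = field(default_factory=dict)


class CEAnnotation:
    """Holds exactly one frame per unique event in a CE's document set."""

    def __init__(self) -> None:
        self.frames: dict[str, EventFrame] = {}

    def add_event(self, event_id: str, event_type: str) -> EventFrame:
        # A second frame for the same event is rejected: one frame per event.
        if event_id in self.frames:
            raise ValueError(f"{event_id} already has a frame")
        frame = EventFrame(event_id, event_type)
        self.frames[event_id] = frame
        return frame

    def add_argument(self, event_id: str, role: str, entity_id: str) -> None:
        # Evidence from any document or medium enriches the single existing
        # frame; repeating an already-known filler adds nothing new.
        self.frames[event_id].arguments.setdefault(role, set()).add(entity_id)


# Hypothetical IDs and labels for illustration only.
ce = CEAnnotation()
ce.add_event("E1", "hypothetical.Detonation")
ce.add_argument("E1", "Place", "ENT7")     # e.g. learned from an English article
ce.add_argument("E1", "Attacker", "ENT2")  # e.g. learned from a video
ce.add_argument("E1", "Place", "ENT7")     # repeated in another document
print(len(ce.frames), ce.frames["E1"].arguments)
# 1 {'Place': {'ENT7'}, 'Attacker': {'ENT2'}}
```

Using sets for argument fillers is one simple way to make a repeated mention of the same entity a no-op, mirroring the idea that the frame is not a collection of mentions.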
Events and relations (including their arguments) that make up each CE are annotated across the CE's data set as a whole, and therefore do not include document provenance such as text offsets or video timestamps. An event or relation mentioned in multiple documents or in multiple media types (e.g., text and video) is listed only once, with its arguments and attributes reflecting all the information available in the data set as a whole.

Temporal ordering of all labeled events within each CE is given by each event's start order relative to the other events in the CE. An event's start may be specified as "exactly," "before" or "after" a numbered order, or as unknown; "before" and "after" may be combined. More than one event can share the same start order.

Argument entities with different annotation tags that refer to the same entity are coreferenced. For example, "University of Pennsylvania" may be tagged FAC (facility) in one argument and ORG (organization) in another; both arguments receive the same entity_id. Thus, within each CE, arguments that refer to the same real-world entity share an entity_id.

Detailed annotation guidelines are included in ./docs/KAIROS_Phase1_Eval_AnnotationGuidelines_v1.11.pdf. The annotation tag set is included in ./docs/KAIROS_Annotation_Tagset_Phase_1_V3.0.xlsx. See ./docs/README_docs.txt section 5 and ./docs/annotation_tagset_description.pdf for a description of the tabs and fields in the tag set Excel file. For further information about annotation approaches in the KAIROS program, refer to Bies et al. (2024).
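The start-order scheme for temporal ordering can be modeled as a set of constraints on integer positions. The Python sketch below is one possible reading, assuming that "before n" and "after n" are strict bounds, that "exactly" is never combined with them, and that an unknown start imposes no constraint; the authoritative semantics are in the annotation guidelines. The names StartOrder, admits and consistent, and the event IDs, are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StartOrder:
    """Hypothetical encoding of one event's start-order annotation.

    exactly: the event starts at this order position.
    after / before: exclusive lower / upper bounds; either or both may be set.
    All fields None means the start order is unknown.
    """
    exactly: Optional[int] = None
    after: Optional[int] = None
    before: Optional[int] = None

    def __post_init__(self) -> None:
        # Assumption: "exactly" is not combined with "before"/"after".
        if self.exactly is not None and (self.after is not None or self.before is not None):
            raise ValueError("'exactly' cannot be combined with 'before'/'after'")

    def admits(self, position: int) -> bool:
        """Return True if an event starting at `position` satisfies this annotation."""
        if self.exactly is not None:
            return position == self.exactly
        if self.after is not None and position <= self.after:
            return False
        if self.before is not None and position >= self.before:
            return False
        return True  # unknown start, or inside the open interval


def consistent(annotations: dict[str, StartOrder], positions: dict[str, int]) -> bool:
    """Check a proposed start position for every annotated event."""
    return all(order.admits(positions[event_id])
               for event_id, order in annotations.items())


# Hypothetical event IDs; not taken from any released table.
annotations = {
    "E1": StartOrder(exactly=1),
    "E2": StartOrder(exactly=2),
    "E3": StartOrder(exactly=2),          # same start order as E2
    "E4": StartOrder(after=2, before=5),  # "after" and "before" combined
    "E5": StartOrder(),                   # unknown
}

print(consistent(annotations, {"E1": 1, "E2": 2, "E3": 2, "E4": 3, "E5": 9}))  # True
print(consistent(annotations, {"E1": 1, "E2": 2, "E3": 2, "E4": 5, "E5": 0}))  # False
```

Because each event is checked only against its own constraint, assigning the same position to E2 and E3 is accepted, matching the rule that several events may share a start order.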
4.0 Annotation Results

The table below summarizes the total number of documents and the amount of annotation included for each CE in the corpus:

total_doc_src - number of "root" web pages collected and processed
total_evt     - number of events annotated
total_rel     - number of relations annotated
total_arg     - number of arguments annotated

ce_id  | total_doc_src | total_evt | total_rel | total_arg
-------+---------------+-----------+-----------+----------
ce1005 |            14 |        77 |        49 |       444
ce1006 |            13 |        82 |        50 |       409
ce1007 |            12 |        67 |        57 |       370
ce1008 |            11 |        49 |        36 |       267
ce1009 |            12 |        64 |        56 |       335
ce1010 |            13 |        37 |        33 |       194
ce1011 |            12 |        70 |        71 |       417
ce1012 |            11 |        72 |        36 |       334
ce1013 |            12 |        53 |        57 |       364
ce1020 |            14 |        68 |        66 |       419
ce1021 |            15 |        NA |        NA |        NA

5.0 Annotation Tables

The reference annotation itself can be found in the ./data/annotation/ce10xx directories. The annotation subdirectory for each CE ID contains the following four annotation tables for that CE:

_events.tab    - annotation table of Event Primitives
_relations.tab - annotation table of Relations
_arguments.tab - annotation table of Arguments of Event Primitives and Relations
_temporal.tab  - annotation table of Temporal Ordering of Event Primitives

Detailed descriptions of the column labels and fields for each annotation table can be found in ./docs/annotation_table_description.pdf.

6.0 References

Ann Bies, Jennifer Tracey, Ann O'Brien, Song Chen, Stephanie Strassel. 2024. Spanless Event Annotation for Corpus-Wide Complex Event Understanding. LREC-COLING 2024: The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation. Turin, May 20-24.