
AIDA Scenario 3 Practice Topic Source Data and Annotation

Item Name:	AIDA Scenario 3 Practice Topic Source Data and Annotation
Author(s):	Jennifer Tracey, Stephanie Strassel, Jeremy Getman, Ann Bies, Kira Griffitt, David Graff, Christopher Caruso
LDC Catalog No.:	LDC2025T02
ISLRN:	141-368-488-003-3
DOI:	https://doi.org/10.35111/a9kv-ct74
Release Date:	February 17, 2025
Member Year(s):	2025
DCMI Type(s):	MovingImage, Software, StillImage, Text
Data Source(s):	discussion forum, newswire, web collection, weblogs
Project(s):	AIDA
Application(s):	entity extraction, information extraction
Language(s):	English, Russian, Spanish
Language ID(s):	eng, rus, spa
License(s):	LDC User Agreement for Non-Members
Online Documentation:	LDC2025T02 Documents
Licensing Instructions:	Subscription & Standard Members, and Non-Members
Citation:	Tracey, Jennifer, et al. AIDA Scenario 3 Practice Topic Source Data and Annotation LDC2025T02. Web Download. Philadelphia: Linguistic Data Consortium, 2025.
Related Works:
isAnnotationOf:	LDC2023T10 AIDA Scenario 1 and 2 Reference Knowledge Base
isSimilarWith:	LDC2023T11 AIDA Scenario 1 Practice Topic Source Data
isSimilarWith:	LDC2024T04 AIDA Scenario 2 Practice Topic Source Data
isSimilarWith:	LDC2024T02 AIDA Scenario 1 Practice Topic Annotation
isSimilarWith:	LDC2024T06 AIDA Scenario 2 Practice Topic Annotation
isSimilarWith:	LDC2025T13 AIDA Scenario 1 Evaluation Topic Source Data, Annotation, and Assessment

Introduction

AIDA Scenario 3 Practice Topic Source Data and Annotation was developed by the Linguistic Data Consortium (LDC) and comprises English, Russian and Spanish web documents (text, video, image) and annotations.

The DARPA AIDA (Active Interpretation of Disparate Alternatives) program aimed to develop a multi-hypothesis semantic engine to generate explicit alternative interpretations of events, situations and trends from a variety of unstructured sources. LDC supported AIDA by collecting, creating and annotating multimodal linguistic resources in multiple languages.

Each phase of the AIDA program centered on a specific scenario, or broad topic area, with related subtopics designated as either practice subtopics or evaluation subtopics. The Phase 3 scenario focused on the COVID-19 global pandemic. This corpus contains source documents and annotations for the Scenario 3 practice topics.

Data

Source documents were collected from the web by a combination of automatic and manual processes. HTML content was converted from its original form into XML. To the extent possible, all resources referenced by a given "root" HTML page (style sheets, javascript, images, media files, etc.) were stored as separate files of the given data type and assigned separate 9-character file-IDs (the same form of ID used for the "root" HTML page).

The corpus contains 1417 root documents; 279 documents were annotated. Annotations include:

Event, relation and entity annotation (64 documents)
Claim frame annotation: claims (true or not) relating to the COVID-19 pandemic (203 documents)
Practice topic query claim frames: example claim frames intended to be used by systems as queries to extract similar claims from additional documents (30 documents)

Claim frame annotations were produced by LDC; University of Colorado Boulder; Johns Hopkins University; Language Technologies Institute, Carnegie Mellon University; and University of Illinois Urbana-Champaign.

Annotations are presented as tab-separated files.
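
Tab-delimited annotation tables of this kind can usually be loaded with a standard delimited-text reader. The following is a minimal Python sketch, not a description of the corpus layout: the column names (doc_id, claim_id, topic) and the sample row are hypothetical and chosen only for illustration, so the actual headers should be taken from the corpus documentation.

```python
import csv
import io


def read_annotation_table(handle):
    """Read a tab-delimited table with a header row into a list of dicts."""
    # QUOTE_NONE treats quote characters as ordinary text, which is a
    # common choice for annotation tables that contain quoted strings.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


# Hypothetical sample data: these column names and values are assumptions
# made for this example and are not taken from the corpus.
SAMPLE = "doc_id\tclaim_id\ttopic\nDOC000001\tCL001\texample topic\n"

rows = read_annotation_table(io.StringIO(SAMPLE))
print(rows[0]["doc_id"], rows[0]["topic"])
```

For a file on disk, the same function can be given a handle opened with open(path, newline="", encoding="utf-8"), where path is whichever annotation file is being read.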

Sponsorship

This material is based upon work supported by Air Force Research Laboratory (AFRL) and the Defense Advanced Research Projects Agency (DARPA) under Contract No. FA8750-18-C-0013.

Samples

Please view the following samples:

Updates

None at this time.

Copyright

Portions © 2020 20 Minutos Editora, SL, © 2021 24Hours, © 2021 64 parallel online, © 2020 102Neuve.com, © 2020 ABC News, © 2020 Adepa, © 2020 Advance Local Media LLC, © 2020 AFP, © 2021 AirTV Production LLC, © 2021 Al Jazeera Media Network, © 2021 Allen Media Broadcasting, © 2021 American Association for the Advancement of Science, © 2020 Amnesty International, © 2021 AMX Content SA de CV, © 2020 Atresmedia Corporación de Medios de Comunicación, SA, © 2020-2021 Autonomous Nonprofit Organization “TV-Novosti”, © 2020-2021 BBC, © 2021 BGR Media, LLC, © 2020 Bloomberg L.P., © 2021 BMJ Publishing Group Ltd, © 2020 BotaShqip, © 2020 Bulletin of the Atomic Scientists, © 2020 Cable News Network. A Warner Bros. Discovery Company., © 2021 CARACOL TELEVISIÓN SA, © 2021 CBS Interactive Inc., © 2020 Charter ’97 www.charter97.org, © 2021 China Digital Times, © 2020 CNBC LLC, © 2020 Coba Media LTD, © 2020-2021 Condé Nast, © 2021 Consumer Reports, Inc., © 2021 Daily Herald, © 2021 DIARIO AS, S.L., © 2020 DIARIO EL CORREO, S.A., © 2021 Diario Libre, © 2021 DIARIO NORTE, © 2021 Digital Alert, © 2020 EatingWell.com, © 2021 EDICIONES EL PAÍS, © 2021 Editorial Ecoprensa, S.A., © 2021 EDITORIAL UNIT INFORMACIÓN GENERAL, SLU, © 2020 Elcomercio.pe, © 2021 EL HERALDO SA, © 2021 El Independiente, © 2021 Elsevier Ltd., © 2020 euronews, © 2021 European Journalism Training Association, © 2020 FactCheck.org, © 2021 FAKEOFF, © 2021 Federal State Budgetary Institution "Editing Office of Rossiyskaya Gazeta", © 2020 First republican information and analytical portal “SakhaNews” (“News of Yakutia”), © 2021 FMNervion, SA, © 2020 Forbes Media LLC, © 2021 FOX News Network, LLC, © 2021 France 24, © 2021 Galvis Ramirez & Cia SA, © 2021 GlobalResearch.ca, © 2021 Global Times News Agency Co., Ltd., © 2021 GORDON, © 2021 Grupo La República Publicaciones SA, © 2020 Guardian News and Media Limited or its affiliated companies, © 2021 Healthline Media LLC, © 2021 Hearst Communications, Inc., © 2021 Hearst Magazine Media, Inc., © 2021 Hearst Television Inc. on behalf of KOCO-TV, © 2020 HindustanTimes, © 2021 iHeartMedia, Inc., © 2021 Imagen y Comunicación, © 2021 Independent.co.uk, © 2020 Information Agency "Znak", © 2021 Insider Inc., © 2021 InoSMI.ru, © 2020 Institut Pasteur, © 2020-2021 Interfax-Ukraine, © 2021 JSC Business News Media, © 2021 JSC Editorial office of the newspaper "Moskovsky Komsomolets" Electronic periodical "MK.ru", © 2021 JSC Kommersant, © 2020 Kenosha News, © 2021 KFF, © 2021 KQED Inc., © 2021 Kursk.com, © 2021 La Prensa, © 2021 LATINOAMÉRICA21.COM, © 2020 La Vanguardia Ediciones, SLU, © 2021 LA VOZ DE GALICIA SA, © 2020 Lead Stories LLC, © 2020 Lenta.Ru LLC, © 2021 LIVE24 LLC, © 2020-2021 Living Media India Limited, © 2020 LLC "BFM.RU", © 2021 LLC "Kurs", © 2020-2021 LLC "Network of city portals", © 2021 Los Angeles Times, © 2020 Martin’s Wellness, © 2021 Mayo Foundation for Medical Education and Research (MFMER), © 2021 Media Matters for America, © 2021 MediaNews Group, © 2021 Medical Xpress, © 2021 MedicoPlus, © 2021 MIA "DKNews", © 2021 Natural News Network, © 2021 NBC UNIVERSAL, © 2021 Network publication "Vesti.Ru", © 2020 Newsweek Digital LLC, © 2021 Nexstar Media Inc., © 2020 North-West Broadcasting LLC, TV-21 TV Company, Murmansk, © 2021 npr, © 2020 Observer Media Group, © 2021 OK!. A DIVISION OF EMPIRE MEDIA GROUP INC., © 2021 Omnia.com.mx, © 2020 Online publication "CentralAsia.news", © 2021 Online publication "Information Agency "RosBalt"", © 2021 People's Daily Online, © 2020 Poynter Institute, © 2021 Publicaciones Semana S.A., © 2021 Public Broadcasting Service (PBS), © 2021 Public Television, © 2021 Publishing House <Komsomolskaya Pravda> JSC, © 2021 Radio Free Asia, © 2021 Rambler, © 2021 RBA Revistas, S.L., © 2021 RealClearHoldings, LLC, © 2020 “REN TV Channel”, © 2020 Reuters, © 2021 RFE/RL, Inc., © 2020 Royal Pharmaceutical Society, © 2021 SA LA NACION, © 2021 SCIENTIFIC AMERICAN, A DIVISION OF SPRINGER NATURE AMERICA, INC., © 2020 SI “GazetaDaily.ru”, © 2021 Sierra Club, © 2021 Sinclair, Inc., © 2021 Snopes Media Group Inc., © 2020 Southern Baptist Convention, © 2021 Spanish Radio and Television Corporation, © 2021 SPH Media Limited, © 2020 Springer Nature Limited, © 2021 Sputnik, © 2021 Stars and Stripes, © 2020 STAT, © 2021 TASS News Agency, © 2021 Television news service, © 2020-2021 The Associated Press, © 2021 The Colorado Sun, © 2020 The Conversation US, © 2021 The Dallas Morning News, © 2020 The Indian Express [P] Ltd., © 2020 The News, © 2021 The New York Times Company, © 2021 The Northside Sun, © 2021 The Philadelphia Inquirer, LLC, © 2021 The Printers (Mysore) Private Limited, © 2021 The San Diego Union-Tribune, © 2020 The Sun, US, Inc, © 2021 The University of Texas MD Anderson Cancer Center, © 2021 The voice of the interior, © 2020 The Washington Post, © 2020 The Washington Times, LLC, © 2021 TIME USA, LLC, © 2020 Tododisca, © 2020 Toronto Star Newspapers Ltd., © 2020 TV Azteca, S.A.B. de C.V., © 2020 UKRAINIAN MEDIA HOUSE PUBLISHING LLC, © 2021 Ukrainian Truth, © 2021 UKRI, © 2020-2021 Ukrinform, © 2021 Univision Communications Inc., © 2020 USA TODAY, a division of Gannett Satellite Information Network, LLC, © 2020 Vera Files, © 2021 Vice Media Group, © 2021 vozpopuli.com, © 2021 WHYY, © 2021 WUSA-TV, © 2021 WWB Holdings, LLC, © 2021 XINHUANET.com, © 2021 ZDNET, A Red Ventures company, © 2020, 2021, 2025 Trustees of the University of Pennsylvania