AIDA Scenario 2 Practice Topic Source Data

Item Name:	AIDA Scenario 2 Practice Topic Source Data
Author(s):	Jennifer Tracey, Stephanie Strassel, Jeremy Getman, Ann Bies, Kira Griffitt, David Graff, Christopher Caruso
LDC Catalog No.:	LDC2024T04
ISLRN:	484-106-854-383-0
DOI:	https://doi.org/10.35111/0hze-0459
Release Date:	April 15, 2024
Member Year(s):	2024
DCMI Type(s):	MovingImage, Software, Sound, StillImage, Text
Sample Type:	mpeg
Sample Rate:	44100 Hz
Data Source(s):	discussion forum, newswire, web collection, weblogs
Project(s):	AIDA
Application(s):	entity extraction, information extraction
Language(s):	English, Spanish, Russian
Language ID(s):	eng, spa, rus
License(s):	LDC User Agreement for Non-Members
Online Documentation:	LDC2024T04 Documents
Licensing Instructions:	Subscription & Standard Members, and Non-Members
Citation:	Tracey, Jennifer, et al. AIDA Scenario 2 Practice Topic Source Data LDC2024T04. Web Download. Philadelphia: Linguistic Data Consortium, 2024.
Related Works:	isAnnotationOf LDC2023T10 AIDA Scenario 1 and 2 Reference Knowledge Base; hasAnnotation LDC2024T06 AIDA Scenario 2 Practice Topic Annotation; isSimilarWith LDC2023T11 AIDA Scenario 1 Practice Topic Source Data, LDC2025T02 AIDA Scenario 3 Practice Topic Source Data and Annotation, LDC2025T13 AIDA Scenario 1 Evaluation Topic Source Data, Annotation, and Assessment

Introduction

AIDA Scenario 2 Practice Topic Source Data was developed by the Linguistic Data Consortium (LDC) and comprises 1,500 root documents, including text, image, and video, from English, Russian, and Spanish web sources.

The DARPA AIDA (Active Interpretation of Disparate Alternatives) program aimed to develop a multi-hypothesis semantic engine to generate explicit alternative interpretations of events, situations and trends from a variety of unstructured sources. LDC supported AIDA by collecting, creating and annotating multimodal linguistic resources in multiple languages.

Each phase of the AIDA program centered on a specific scenario, or broad topic area, with related subtopics designated as either practice subtopics or evaluation subtopics. The Phase 2 scenario focused on the socioeconomic and political crisis in Venezuela since 2010. This corpus constitutes the full set of topic-focused documents for Phase 2 practice subtopics.

Data

Data was collected from web sources by a combination of automatic and manual processes. HTML content was converted from its original form into XML. To the extent possible, all resources referenced by a given "root" HTML page (style sheets, JavaScript, images, media files, etc.) were stored as separate files of the given data type and assigned separate 9-character file-IDs (the same form of ID used for the "root" HTML page).

The knowledge base for entity detection and linking annotation for all AIDA Scenario 1 and 2 corpora is available separately as AIDA Scenario 1 and 2 Reference Knowledge Base (LDC2023T10).

Sponsorship

This material is based upon work supported by the Air Force Research Laboratory (AFRL) and the Defense Advanced Research Projects Agency (DARPA) under Contract No. FA8750-18-C-0013.

Updates

None at this time.

Copyright

Portions © 2015 21st Century Wire, © 2020 ABC, © 2013 ABC News Internet Ventures, © 2014, 2017-2018 Alba Ciudad 96.3 FM, © 2017 AL DÍA NEWS Media, © 2017-2018 Al Jazeera Media Network, © 2018 AméricaEconomía, © 2019 American Association for the Advancement of Science, © 2019 Americas Society/Council of the Americas, © 2020 AMX Content SA de CV, © 2014, 2017 Arguments and Facts JSC, © 2014 ARMENPRESS, © 2018 Authorized by the Chief Agent, CPC, © 2014, 2017-2018 Autonomous Nonprofit Organization “TV-Novosti”, © 2013-2014, 2018-2019 BBC, © 2015, 2017-2018 Bellingcat, © 2019 Breitbart, © 2018 Business capital, © 2020 business/media bureau ekonomika, © 2019-2020 C.A. IBERONEWS LIMITED, © 2018-2020 C.A. The Universe, © 2013, 2017 Cable News Network. Turner Broadcasting System, Inc., © 2017 Caracas Chronicles, © 2018 Caracol SA, © 2018 CARACOL TELEVISIÓN SA, © 2013, 2017 CBC/Radio-Canada, © 2013 CBS Interactive Inc., © 2020 CDN, © 2017 Center for Democracy in the Americas, © 2014-2015 Channel One, © 2017 Chicago Tribune, © 2020 China Daily Information Co, © 2014 CJSC Editorial office of the newspaper Moskovsky Komsomolets, © 2014 CNBC LLC, © 2020 COHA, © 2014 Colombia Reports, © 2015, 2012 Comments, © 2018 COMUNICAN SA, © 2018 Condé Nast, © 2019-2020 CounterPunch, © 2020 Crisis Group, © 2019 Dailymotion, © 2020 Daily News of Vladivostok, © 2018 DiarioContraste.com, © 2017 Diariocorreo.pe, © 2014 Diario La Voz, © 2018, 2020 Diario las Americas, © 2018 Dicasterium pro Communicatione, © 2019 Dixi Media Digital, SL, © 2014 DolarToday.com, © 2014, 2017 Dow Jones & Company, Inc., © 2020 EADaily, © 2014, 2017-2018 EDICIONES EL PAÍS SL, © 2018 Ediciones Prensa Libre SL, © 2019 Editions CDR, © 2020 Editorial Ecoprensa, S.A., © 2017-2018 Editorial Office of Rossiyskaya Gazeta, © 2018 Editorial Prensa Alicantina SAU, © 2018 Efecto Cocuyo CA, © 2020 EL COLOMBIANO S.A.S, © 2014 Elcomercio.pe, © 2018, 2020 EL HERALDO S.A., © 2019 El Impulso, © 2018 El Nuevo Herald, © 2019 EL PERIÓDICO DE CATALUNYA, SLU, © 2019-2020 el Popular, © 2020 EL TERRITORIO, © 2017 EL TIEMPO Casa Editorial, © 2017 El Tiempo Latino, © 2020 elucabista, © 2018-2019 El Universal, © 2020 Encyclopedia Britannica, Inc., © 2019 Entravision, © 2019 Epoch Times Russia, © 2019 euronews, © 2018-2019 Europa Press, © 2018 Euroradio, © 2020 Excelsior, © 2014 FAN, © 2018 First News Media, © 2014 Forbes.com LLC, © 2018 France 24, © 2017 Future Publishing Limited, Quay House, The Ambury, Bath BA1 1UA, © 2020 GardaWorld, © 2020 GlobalResearch.ca, © 2020 G/O Media Inc., © 2014-2015, 2017 Golden Middle LLC, © 2018-2019 Google LLC, © 2014 GORDON, © 2014 Graham Digital Holding Company, © 2018 Grupo La República Publicaciones SA, © 2014 Guardian News and Media Limited or its affiliated companies, © 2014 Haaretz Daily Newspaper Ltd., © 2020 Havana Times, © 2019 HindustanTimes, © 2018, 2020 HispanTV, © 2020 Houston Public Media, A Service of the University of Houston, © 2020 HSB Group, © 2018 ID "Interlocutor", © 2014 Image and Communication, © 2018 Impremedia Operating Company LLC, © 2017 Independent.co.uk, © 2018-2020 Infobae, © 2017 Information agency "Ukrainian National News", © 2014 Informe21.com, © 2018 Innova and Comunica Media SL, © 2014 InoSMI.ru, © 2017 Interfax-Ukraine, © 2017 iPress.ua, © 2020 IT Plus, © 2018 Izvestia MIC, © 2017 Journal Media Ltd., © 2018 Journalistic Society El Ciudadano Ltda, © 2015-2018 JSC Business News Media, © 2014-2015, 2017 JSC Kommersant, © 2014, 2017-2018 JSC Gazeta.Ru, © 2018 JSC NTV Television Company, © 2019 JSC TRK Armed Forces “ZVEZDA”, © 2013, 2017 JSC TV and Radio Company Petersburg, © 2014-2015, 2017 Korrespondent.net, © 2014-2019 Latin Post, © 2017 LLC Business Newspaper "Vzglyad", © 2017 LLC RTVIA Production, © 2014 Los Angeles Times, © 2018 Media Corporation of Extremadura SA, © 2015-2016 Meduza, © 2015-2016 MIA Russia Today, © 2018 Miami Herald, © 2018 Miami New Times, LLC, © 2018 Microsoft, © 2018 MintPress News, © 2019 Natural News Network, © 2015, 2017 NBC Universal, © 2017 News24Today, © 2019 NEWS.am, © 2018 NEWSONE.UA, © 2018-2019 Newspaper First Edition, © 2017 News up to date, © 2016 Newsweek Digital LLC, © 2018 Nextstar Media Inc., © 2018-2020 Nezavisimaya Gazeta, © 2014 Nine Digital Network, © 2020 NOTICIAS AL DIA Y A LA HORA, © 2019 Novaya Gazeta, © 2017-2018 npr, © 2020 OAS, © 2020 Orlando Sentinel, © 2020 Our newspaper, © 2018 PJmedia.com/Salem Media, © 2018 Polit.ru, © 2017 PolitRussia, © 2013-2019 Pravda.Ru LLC, © 2015-2016 Present Time, © 2013, 2018 Publishing House JSC, © 2019 Radio Havana Cuba, © 2020 Radio Televisión Martí, © 2018 Radio Vesti, © 2018 Relrus.ru, © 2014-2015, 2017-2018 Reuters, © 2020 RFE/RL, © 2020 RFI, © 2017, 2019 Russian information and analytical agency "SM News", © 2020 Russian International Affairs Council, © 2018 Rutube, © 2014, 2017 ROSBUSINESSCONSULTING JSC, © 2020 SA THE NATION, © 2018 SA Week Publications, © 2019 SIA "TVNET GRUPA", © 2018 SIA "TV Rain", © 2017 Sky UK, © 2020 South Mail Newspaper, © 2020 Spanish Radio and Television Corporation, © 2013, 2017, 2019 Sputnik, © 2014 SVIT24.NET, © 2017, 2019 TASS, Russian news agency, © 2017 Telegraph Media Group Limited, © 2018 Television and Radio Company Lux, TV Channel 24, © 2014, 2016 Television news service, © 2018 The American Conservative, a publication of The American Ideas Institute, © 2017 The Associated Press, © 2013, 2017 The Atlantic Monthly Group, © 2014 The Christian Science Monitor, © 2019 The Cooperator, © 2017-2018 The Daily Beast Company LLC, © 2020 The Daily Left, © 2019 The Dallas Morning News, © 2014 The Economist Intelligence Unit Limited, © 2015, 2017 THE IBEROSPHERE GAZETTE, © 2015 The Irish Times, © 2020 The Jordan News, © 2020 The Lion of El Español Publications SA, © 2020 The New Journal, © 2018 The New Republic, © 2014-2015 The New York Times Company, © 2020 The Press, © 2019 The Region Newspaper, © 2018-2019 The San Diego Union-Tribune, © 2019 The Star of Panama, © 2017-2018 The Stimulus, © 2018 The Venezuelan News, © 2013, 2017 The Washington Post, © 2020 The Washington Times, LLC, © 2014, 2017 The World from PRX, © 2018 ThinkProgress, © 2017 TIME USA, LLC, © 2018 Titania Editorial Company SL, © 2020 Tritón Comunicaciones S.A de C.V., © 2018, 2020 TRT World, © 2020 Turkuvaz Haberleşme ve Yayıncılık, © 2017 TV Center JSC, © 2017 UA.NEWS, © 2014 uapress, © 2013-2019 UDF.BY, © 2013-2019 Ukrinform, © 2014-2017 UNIAN.NET, © 2014, 2018 Unidad Editorial Informacion General, © 2019 United Press International, Inc., © 2017-2018 Univision Communications Inc., © 2017 USA TODAY, a division of Gannett Satellite Information Network, LLC, © 2017 Verizon Media, © 2014-2015, 2017 Vesti.Ru online edition, © 2019 VK LLC, © 2014, 2018 Vox Media, LLC, © 2020 Workers World, © 2013 World and Politics, © 2014 worldnewsage.com, © 2017 www.charter97.org, © 2015 XINHUANET.com, © 2018 Yahoo, © 2018-2020, 2023, 2024 Trustees of the University of Pennsylvania