
Abstract Meaning Representation 2.0 - Four Translations

Item Name:	Abstract Meaning Representation 2.0 - Four Translations
Author(s):	Marco Damonte, Shay Cohen
LDC Catalog No.:	LDC2020T07
ISBN:	1-58563-924-9
ISLRN:	359-968-732-813-3
DOI:	https://doi.org/10.35111/fr89-3285
Release Date:	April 15, 2020
Member Year(s):	2020
DCMI Type(s):	Text
Data Source(s):	discussion forum, weblogs, newswire
Project(s):	BOLT, DEFT
Application(s):	machine translation
Language(s):	Italian, Spanish, German, Mandarin Chinese
Language ID(s):	ita, spa, deu, cmn
License(s):	LDC User Agreement for Non-Members
Online Documentation:	LDC2020T07 Documents
Licensing Instructions:	Subscription & Standard Members, and Non-Members
Citation:	Damonte, Marco, and Shay Cohen. Abstract Meaning Representation 2.0 - Four Translations LDC2020T07. Web Download. Philadelphia: Linguistic Data Consortium, 2020.
Related Works:	isPartOf LDC2025T10 Abstract Meaning Representation 2.0 - Machine Translations; isPartWith LDC2017T10 Abstract Meaning Representation (AMR) Annotation Release 2.0; isOutcomeOf LDC2017T10 Abstract Meaning Representation (AMR) Annotation Release 2.0; isSimilarWith LDC2024T11 Abstract Meaning Representation 3.0 - Machine Translations

Introduction

Abstract Meaning Representation 2.0 - Four Translations was developed by researchers at the School of Informatics, University of Edinburgh. It consists of Spanish, German, Italian and Mandarin Chinese translations of a subset of sentences from Abstract Meaning Representation (AMR) Annotation Release 2.0 (LDC2017T10).

AMR Annotation Release 2.0 is a semantic treebank of over 39,000 English natural language sentences from broadcast conversations, newswire and web text. The translated data in this release was designed for use in cross-lingual AMR parsing.

Data

This corpus contains translations of the sentences in the test split of LDC2017T10: 1,371 sentences per language, or 5,484 sentences in total. The source sentences were drawn from material collected by the Linguistic Data Consortium, specifically discussion forum text from the DARPA BOLT and DARPA DEFT programs, transcripts and English translations of Mandarin Chinese broadcast news programming, Wall Street Journal text, translated Xinhua news texts, various newswire texts from NIST OpenMT evaluations, and weblog data from the DARPA GALE program.

All data are presented as UTF-8 encoded plain text.
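The following is a minimal Python sketch of one way to load the UTF-8 text and check it against the per-language count stated above (1,371 sentences for each of the four languages). It is not part of the release. The directory layout is a hypothetical assumption: one file per language, named by its ISO 639-3 code (ita.txt, spa.txt, deu.txt, cmn.txt), with one sentence per line. The actual file names and format are described in the LDC2020T07 documentation and may differ.

```python
from pathlib import Path

# HYPOTHETICAL layout: one file per language named by its ISO 639-3 code
# (e.g. "ita.txt"), one sentence per line. Check the LDC2020T07 documentation
# for the real file names and format before relying on this.
LANGUAGES = ("ita", "spa", "deu", "cmn")
EXPECTED_PER_LANGUAGE = 1371  # per-language count stated on the catalog page


def load_sentences(path: Path) -> list[str]:
    """Read a UTF-8 text file and return its non-empty lines, stripped."""
    with path.open(encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def load_release(root: Path) -> dict[str, list[str]]:
    """Load each (hypothetically named) language file under root into {code: sentences}."""
    return {code: load_sentences(root / f"{code}.txt") for code in LANGUAGES}


def check_counts(corpus: dict[str, list[str]],
                 expected: int = EXPECTED_PER_LANGUAGE) -> dict[str, bool]:
    """Report, per language, whether the sentence count equals the expected size."""
    return {code: len(sentences) == expected for code, sentences in corpus.items()}
```

For example, `check_counts(load_release(Path("data")))` would return `True` for every language if the assumed layout matched and each file held 1,371 sentences. Reading with an explicit `encoding="utf-8"` avoids depending on the platform's default encoding, which matters for the Mandarin Chinese file.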

Samples

Please view this Italian text sample (TXT).

Updates

None at this time.

Copyright

Portions © 2020 Shay Cohen, © 2020 Marco Damonte, © 2002-2005 Agence France Presse, © 2007 Al Ahram, © 2007 Al Hayat, © 2007 Al-Quds Al-Arabi, © 2007 An Nahar, © 2007 Assabah, © 2002-2008 The Associated Press, © 2003-2004, 2007-2008 Central News Agency (Taiwan), © 1997, 2004-2007 China Central TV, © 2007 China Military Online, © 2007 Chinanews.com, © 1987-1989 Dow Jones & Company, Inc., © 2007 Guangming Daily, © 1995, 2003, 2007-2008 Los Angeles Times-Washington Post News Service, Inc., © 2002, 2004-2005, 2007-2008 New York Times, © 1994-1998, 2001-2008 Xinhua News Agency, © 2014, 2017 Language Weaver, Inc., © 2014, 2017 University of Colorado, © 2014, 2017 University of Southern California, © 2003, 2005, 2006, 2007, 2009, 2011, 2013, 2014, 2017, 2020 Trustees of the University of Pennsylvania