Abstract Meaning Representation 2.0 - Four Translations
Item Name: | Abstract Meaning Representation 2.0 - Four Translations |
Author(s): | Marco Damonte, Shay Cohen |
LDC Catalog No.: | LDC2020T07 |
ISBN: | 1-58563-924-9 |
ISLRN: | 359-968-732-813-3 |
DOI: | https://doi.org/10.35111/fr89-3285 |
Release Date: | April 15, 2020 |
Member Year(s): | 2020 |
DCMI Type(s): | Text |
Data Source(s): | discussion forum, weblogs, newswire |
Project(s): | BOLT, DEFT |
Application(s): | machine translation |
Language(s): | Italian, Spanish, German, Mandarin Chinese |
Language ID(s): | ita, spa, deu, cmn |
License(s): | LDC User Agreement for Non-Members |
Online Documentation: | LDC2020T07 Documents |
Licensing Instructions: | Subscription & Standard Members, and Non-Members |
Citation: | Damonte, Marco, and Shay Cohen. Abstract Meaning Representation 2.0 - Four Translations LDC2020T07. Web Download. Philadelphia: Linguistic Data Consortium, 2020. |
Introduction
Abstract Meaning Representation 2.0 - Four Translations was developed by researchers at the School of Informatics, University of Edinburgh. It consists of Spanish, German, Italian and Mandarin Chinese translations of a subset of sentences from Abstract Meaning Representation (AMR) Annotation Release 2.0 (LDC2017T10).
AMR Annotation Release 2.0 is a semantic treebank of over 39,000 English natural language sentences from broadcast conversations, newswire and web text. The translated data in this release was designed for use in cross-lingual parsing.
Data
This corpus contains translations of the test split sentences from LDC2017T10: 1,371 sentences per language, for a total of 5,484 translated sentences across the four languages. The source sentences were drawn from material collected by the Linguistic Data Consortium, specifically discussion forum text from the DARPA BOLT and DARPA DEFT programs, transcripts and English translations of Mandarin Chinese broadcast news programming, Wall Street Journal text, translated Xinhua news texts, various newswire texts from NIST OpenMT evaluations, and weblog data from the DARPA GALE program.
All data are presented as UTF-8 encoded plain text.
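As a rough illustration of how the stated counts might be checked after download, the Python sketch below reads UTF-8 text files and compares the number of non-empty lines against the 1,371 sentences per language given above. The file naming pattern (`ita.txt`, `spa.txt`, and so on) and the one-sentence-per-line layout are assumptions made for this example only; the actual file names and layout are described in the corpus documentation and may differ.

```python
from pathlib import Path

# Figures taken from the corpus description above.
EXPECTED_PER_LANGUAGE = 1371
LANGUAGES = ("ita", "spa", "deu", "cmn")  # language IDs listed in the catalog entry


def count_sentences(path):
    """Count non-empty lines in a UTF-8 text file.

    Assumption: one sentence per line. Adjust if the release uses another layout.
    """
    with open(path, encoding="utf-8") as handle:
        return sum(1 for line in handle if line.strip())


def check_corpus(data_dir, pattern="{lang}.txt"):
    """Return {lang: (count, matches_expected)} for each language file found.

    `pattern` is a hypothetical naming scheme, not the documented one.
    Languages whose file is missing are simply left out of the report.
    """
    report = {}
    for lang in LANGUAGES:
        path = Path(data_dir) / pattern.format(lang=lang)
        if not path.exists():
            continue
        n = count_sentences(path)
        report[lang] = (n, n == EXPECTED_PER_LANGUAGE)
    return report
```

Calling `check_corpus("path/to/data")` returns a dictionary keyed by language ID, so a mismatch for one language is visible without stopping the check for the others. Pass a different `pattern` once the real file names are known.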
Samples
An Italian text sample (TXT) is available.
Updates
None at this time.