Abstract Meaning Representation 3.0 - Machine Translations
Item Name: | Abstract Meaning Representation 3.0 - Machine Translations |
Author(s): | Bram Vanroy |
LDC Catalog No.: | LDC2024T11 |
ISLRN: | 737-010-881-982-1 |
DOI: | https://doi.org/10.35111/b94n-1y25 |
Release Date: | December 16, 2024 |
Member Year(s): | 2024 |
DCMI Type(s): | Text |
Data Source(s): | broadcast conversation, discussion forum, newswire, web collection, weblogs |
Project(s): | ACE, BOLT, DEFT, GALE, LORELEI |
Application(s): | machine translation |
Language(s): | Dutch, Spanish, Irish |
Language ID(s): | nld, spa, gle |
License(s): | LDC User Agreement for Non-Members |
Online Documentation: | LDC2024T11 Documents |
Licensing Instructions: | Subscription & Standard Members, and Non-Members |
Citation: | Vanroy, Bram. Abstract Meaning Representation 3.0 - Machine Translations LDC2024T11. Web Download. Philadelphia: Linguistic Data Consortium, 2024. |
Introduction
Abstract Meaning Representation 3.0 - Machine Translations was developed by the Center for Computational Linguistics at KU Leuven as part of the Horizon 2020 project SignON. It contains automatic translations of a subset of the sentences in Abstract Meaning Representation (AMR) Annotation Release 3.0 (LDC2020T02) into Spanish, Irish Gaelic, and Dutch.
AMR 3.0 is a semantic treebank of 59,255 English natural language sentences from broadcast conversations, newswire, weblogs, web discussion forums, fiction and web text.
Data
The source sentences were drawn from material collected by the Linguistic Data Consortium: discussion forum text from the DARPA BOLT and DARPA DEFT programs, transcripts and English translations of Mandarin Chinese broadcast news programming, Wall Street Journal text, translated Xinhua news texts, various newswire texts from NIST OpenMT evaluations, and weblog data from the DARPA GALE program.
The AMR 3.0 training, development and test splits were translated into Spanish, Irish Gaelic, and Dutch using Google Translate. The "unsplit" directories were not translated and are not included in this release. The translations were not manually verified, but formal issues (such as unexpected line breaks) were corrected, and special tokens and encoding issues were fixed with the fix_text function of the Python library ftfy.
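The cleanup step described above can be sketched roughly as follows. This is an illustration only, not the script used to build the release: the function name clean_translation is hypothetical, and collapsing all whitespace runs into a single space is an assumed way of handling unexpected line breaks. The only ftfy call used is ftfy.fix_text, and it is applied only if the library is installed.

```python
import re

try:
    import ftfy  # third-party library: pip install ftfy
except ImportError:
    ftfy = None  # the sketch still runs, but skips encoding repair


def clean_translation(text: str) -> str:
    """Hypothetical helper: tidy one machine-translated sentence."""
    if ftfy is not None:
        # Repair mojibake and similar encoding debris.
        text = ftfy.fix_text(text)
    # Assumption: a translation should occupy a single line, so any run
    # of whitespace (including line breaks) becomes one space.
    return re.sub(r"\s+", " ", text).strip()


cleaned = clean_translation("De jongen\nwil  gaan.\n")
```

Keeping each translation on one line matters because a sentence stored in a single comment line of an AMR file would otherwise spill into the graph that follows it.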
Data is presented in UTF-8 encoded txt files in PENMAN format.
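As a minimal sketch of how such files might be read, the snippet below splits AMR-style PENMAN text into entries using only the Python standard library. It assumes the layout conventionally used in AMR releases: entries separated by blank lines, comment lines beginning with "#" that carry "::key value" metadata, and the graph on the remaining lines. The sample entry, its id and date values, and the function name iter_amr_entries are made up for illustration and are not taken from this release.

```python
import re
from typing import Dict, Iterator, Tuple

# Hypothetical entry in the conventional AMR layout (not from the corpus).
SAMPLE = """# ::id example.1 ::date 2024-12-16
# ::snt De jongen wil gaan.
(w / want-01
   :ARG0 (b / boy)
   :ARG1 (g / go-02
            :ARG0 b))
"""


def iter_amr_entries(text: str) -> Iterator[Tuple[Dict[str, str], str]]:
    """Yield (metadata, graph_string) pairs from AMR-style PENMAN text."""
    # Entries are assumed to be separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        metadata: Dict[str, str] = {}
        graph_lines = []
        for line in block.splitlines():
            if line.startswith("#"):
                # One comment line may hold several "::key value" fields.
                fields = re.split(r"(?:^|\s)::", line.lstrip("# "))[1:]
                for field in fields:
                    key, _, value = field.partition(" ")
                    metadata[key] = value.strip()
            elif line.strip():
                graph_lines.append(line)
        if graph_lines:
            yield metadata, "\n".join(graph_lines)


entries = list(iter_amr_entries(SAMPLE))
```

To read a file from disk, open it with encoding="utf-8" and pass its contents to the function. The sketch does not parse the graph itself, and a sentence that happens to contain " ::" would be split incorrectly; a dedicated PENMAN parser is the better choice for real processing.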
Samples
Please view this text sample (TXT).
Updates
None at this time.