Chinese Abstract Meaning Representation 2.0
Item Name: | Chinese Abstract Meaning Representation 2.0 |
Author(s): | Bin Li, Liming Xiao, Yihuan Liu, Yuan Wen, Li Song, Jayeol Chun, Minxuan Feng, Junsheng Zhou, Weiguang Qu, Nianwen Xue |
LDC Catalog No.: | LDC2021T13 |
ISBN: | 1-58563-970-2 |
ISLRN: | 483-739-101-185-5 |
DOI: | https://doi.org/10.35111/x61v-0p46 |
Release Date: | July 15, 2021 |
Member Year(s): | 2021 |
DCMI Type(s): | Text |
Data Source(s): | discussion forum, newswire, weblogs |
Application(s): | parsing, semantic role labelling, syntactic parsing |
Language(s): | Mandarin Chinese |
Language ID(s): | cmn |
License(s): | LDC User Agreement for Non-Members |
Online Documentation: | LDC2021T13 Documents |
Licensing Instructions: | Subscription & Standard Members, and Non-Members |
Citation: | Li, Bin, et al. Chinese Abstract Meaning Representation 2.0 LDC2021T13. Web Download. Philadelphia: Linguistic Data Consortium, 2021. |
Introduction
Chinese Abstract Meaning Representation (CAMR) 2.0 was developed by Brandeis University and Nanjing Normal University. It consists of semantic representations of approximately 20,000 Chinese sentences from Chinese Treebank (CTB) 8.0 (LDC2013T21). CAMR 2.0 includes the content of Chinese Abstract Meaning Representation 1.0 (LDC2019T07), which covers the CTB 8.0 weblog and discussion forum sentences, plus an additional 9,933 sentences from the newswire portion of CTB 8.0.
Abstract Meaning Representation (AMR) captures "who is doing what to whom" in a sentence. Each sentence is paired with a rooted, directed graph that represents its whole-sentence meaning. LDC has released the following English AMR data sets: Abstract Meaning Representation (AMR) Annotation Release 1.0 (LDC2014T12), Abstract Meaning Representation (AMR) Annotation Release 2.0 (LDC2017T10) and Abstract Meaning Representation (AMR) Annotation Release 3.0 (LDC2020T02).
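To make "who is doing what to whom" concrete, the sketch below encodes the AMR commonly used to introduce the formalism, for the English sentence "The boy wants to go". It is an illustration only and is not drawn from this corpus; the triple layout and the helper name roles_of are assumptions made for this example.

```python
# Illustrative AMR for "The boy wants to go" (not taken from CAMR 2.0).
# Each variable is an instance of a concept; variable names are arbitrary.
INSTANCES = {"w": "want-01", "b": "boy", "g": "go-01"}

# Relations as (source, role, target) triples.
EDGES = [
    ("w", ":ARG0", "b"),  # the boy is the one who wants
    ("w", ":ARG1", "g"),  # what is wanted is the going
    ("g", ":ARG0", "b"),  # the boy is also the one who goes
]

def roles_of(variable):
    """Return (concept, role) pairs for every relation that points at `variable`."""
    return [(INSTANCES[src], role) for src, role, tgt in EDGES if tgt == variable]

print(roles_of("b"))  # [('want-01', ':ARG0'), ('go-01', ':ARG0')]
```

The variable b appears once but fills a role in two predicates, which is how the representation records that the same boy both wants and goes.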
Chinese AMR follows the basic principles developed for English AMR (a compact, readable, whole-sentence semantic representation) and makes adaptations where necessary to handle Chinese-specific phenomena. For more information about the project, see the Chinese AMR homepage.
Data
The text contains 20,078 sentences from the weblog, discussion forum, and newswire portions of CTB 8.0. Three sets of files are included: the original Chinese AMR data with concept-to-word and relation-to-word alignments, a converted English AMR format, and a Chinese syntactic dependency tree format. Each set is divided into training, development and test sets, and all files are presented as plain text in UTF-8 encoding.
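Because the files are plain UTF-8 text, a first pass over them needs only the standard library. The following is a minimal sketch that assumes the layout common to AMR releases: one sentence per block, blocks separated by blank lines, and metadata lines beginning with "#". That layout is an assumption, not something stated here, so check it against the release documentation. The function name split_amr_blocks and the sample text are hypothetical.

```python
def split_amr_blocks(text):
    """Split AMR-style plain text into (metadata_lines, graph_text) pairs.

    Assumes blank-line-separated blocks and '#'-prefixed metadata lines.
    """
    blocks, meta, graph = [], [], []
    for line in text.splitlines() + [""]:  # trailing "" flushes the last block
        if line.strip():
            target = meta if line.lstrip().startswith("#") else graph
            target.append(line.rstrip())
        elif meta or graph:
            blocks.append((meta, "\n".join(graph)))
            meta, graph = [], []
    return blocks

# Hypothetical two-sentence sample, invented for this sketch.
SAMPLE = (
    "# ::id demo.1\n"
    "(w / want-01\n"
    "   :ARG0 (b / boy))\n"
    "\n"
    "# ::id demo.2\n"
    "(g / go-01)\n"
)

blocks = split_amr_blocks(SAMPLE)
print(len(blocks))  # 2
```

For a real file, the text can be read with open(path, encoding="utf-8").read(), where path is whichever training, development or test file is being inspected.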
Samples
Please view this sample (TXT).
Updates
None at this time.