Chinese Abstract Meaning Representation 1.0

Item Name:	Chinese Abstract Meaning Representation 1.0
Author(s):	Bin Li, Yuan Wen, Li Song, Rubing Dai, Weiguang Qu, Nianwen Xue
LDC Catalog No.:	LDC2019T07
ISBN:	1-58563-880-3
ISLRN:	376-537-072-369-4
DOI:	https://doi.org/10.35111/8ddt-ze77
Release Date:	April 15, 2019
Member Year(s):	2019
DCMI Type(s):	Text
Data Source(s):	weblogs, discussion forum
Project(s):	ACE
Application(s):	parsing, syntactic parsing, semantic role labelling
Language(s):	Mandarin Chinese
Language ID(s):	cmn
License(s):	LDC User Agreement for Non-Members
Online Documentation:	LDC2019T07 Documents
Licensing Instructions:	Subscription & Standard Members, and Non-Members
Citation:	Li, Bin, et al. Chinese Abstract Meaning Representation 1.0 LDC2019T07. Web Download. Philadelphia: Linguistic Data Consortium, 2019.
Related Works:
	hasVersion:	LDC2021T13 Chinese Abstract Meaning Representation 2.0
	isAnnotationOf:	LDC2013T21 Chinese Treebank 8.0
	isSimilarWith:	LDC2014T12 Abstract Meaning Representation (AMR) Annotation Release 1.0; LDC2017T10 Abstract Meaning Representation (AMR) Annotation Release 2.0; LDC2020T02 Abstract Meaning Representation (AMR) Annotation Release 3.0

Introduction

Chinese Abstract Meaning Representation 1.0 was developed by Brandeis University and Nanjing Normal University. It consists of semantic representations of a set of Chinese sentences from Chinese Treebank 8.0 (LDC2013T21).

Abstract Meaning Representation (AMR) captures "who is doing what to whom" in a sentence. Each sentence is paired with a single rooted graph that represents the meaning of the whole sentence. LDC has released the following English AMR data sets: Abstract Meaning Representation (AMR) Annotation Release 1.0 (LDC2014T12) and Abstract Meaning Representation (AMR) Annotation Release 2.0 (LDC2017T10).
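
To make the "who is doing what to whom" idea concrete, the following Python sketch encodes the standard English AMR example "The boy wants to go" as a list of triples. The sentence, the variable names, and the helper functions are illustrative assumptions and are not drawn from this corpus. The sketch shows that the node `b` (the boy) is the agent of both `want-01` and `go-01`, so it has two incoming edges, which is why AMR is described as a graph.

```python
from collections import Counter

# Illustrative AMR for "The boy wants to go" (standard English example,
# not taken from the Chinese AMR corpus).
# PENMAN form: (w / want-01 :ARG0 (b / boy) :ARG1 (g / go-01 :ARG0 b))
INSTANCES = {"w": "want-01", "b": "boy", "g": "go-01"}
EDGES = [
    ("w", ":ARG0", "b"),  # the boy is the one who wants
    ("w", ":ARG1", "g"),  # what is wanted is the going
    ("g", ":ARG0", "b"),  # the boy is also the one who goes
]


def incoming_edges(edges):
    """Count how many edges point at each variable."""
    return Counter(target for _, _, target in edges)


def reentrant_nodes(edges):
    """Return variables with more than one incoming edge, sorted."""
    return sorted(v for v, n in incoming_edges(edges).items() if n > 1)


if __name__ == "__main__":
    for var in reentrant_nodes(EDGES):
        print(var, INSTANCES[var])
```

A node that is shared in this way is usually called a reentrancy; here the only one is `b`.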

Chinese AMR is based on the annotation methodology developed for English with adaptations for handling specific Chinese phenomena. The goal of the Chinese AMR project is to create a large aligned AMR corpus, of which this data set is the first release. For more information about the project, see the Chinese AMR homepage.

Data

The text is extracted from the 10,325 sentences of the weblog and discussion forum portions of Chinese Treebank 8.0. Annotations were applied to 10,149 sentences, with 176 sentences unannotated.

The data is divided into training, development and test sets. These three files are presented as plain text in UTF-8 encoding.
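
A minimal sketch of reading one of these UTF-8 files in Python follows. The file name `camr_train.txt` is hypothetical, and the record layout inside each file is not described here, so the sketch only shows decoding the text explicitly as UTF-8 and collecting its non-blank lines.

```python
from pathlib import Path


def read_nonblank_lines(path):
    """Decode a file as UTF-8 and return its non-blank lines, stripped."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    # Hypothetical file name; substitute the path from the actual release.
    sample = Path("camr_train.txt")
    if sample.exists():
        lines = read_nonblank_lines(sample)
        print(f"{len(lines)} non-blank lines")
```

Passing the encoding explicitly avoids depending on the platform default, which may not be UTF-8.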
