Chinese Discourse Treebank 0.5
Item Name: | Chinese Discourse Treebank 0.5 |
Author(s): | Yuping Zhou, Jill Lu, Jennifer Zhang, Nianwen Xue |
LDC Catalog No.: | LDC2014T21 |
ISBN: | 1-58563-692-4 |
ISLRN: | 492-150-006-320-6 |
DOI: | https://doi.org/10.35111/njb6-wb02 |
Release Date: | October 15, 2014 |
Member Year(s): | 2014 |
DCMI Type(s): | Text |
Data Source(s): | newswire |
Application(s): | linguistic analysis, discourse analysis, discourse parsing, information extraction, information retrieval, language generation, subjectivity analysis, summarization |
Language(s): | Mandarin Chinese, Chinese |
Language ID(s): | cmn, zho |
License(s): | LDC User Agreement for Non-Members |
Online Documentation: | LDC2014T21 Documents |
Licensing Instructions: | Subscription & Standard Members, and Non-Members |
Citation: | Zhou, Yuping, et al. Chinese Discourse Treebank 0.5 LDC2014T21. Web Download. Philadelphia: Linguistic Data Consortium, 2014. |
Introduction
Chinese Discourse Treebank 0.5 was developed at Brandeis University as part of the Chinese Treebank Project and consists of approximately 73,000 words of Chinese newswire text annotated for discourse relations. It follows the lexically grounded approach of the Penn Discourse Treebank (PDTB) (LDC2008T05) with adaptations based on the linguistic and statistical characteristics of Chinese text. Discourse relations are lexically anchored by discourse connectives (e.g., because, but, therefore), which are viewed as predicates that take abstract objects such as propositions, events and states as their arguments. Along with PDTB-style schemes for English, Turkish, Hindi and Czech, Chinese Discourse Treebank provides an additional perspective on how the PDTB approach can be extended for cross-lingual annotation of discourse relations.
Data
Data was selected from the newswire material in Chinese Treebank 8.0 (LDC2013T21), specifically from Xinhua News Agency stories. There are approximately 5,500 annotation instances. Following the PDTB format, each annotation instance consists of 27 fields delimited by vertical bars. The fields specify the attributes of the discourse relation as a whole, as well as the attributes of its two arguments. Not all fields are filled in this release. Filled fields are indicated by a pair of angle brackets; the remaining fields are placeholders for future releases.
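As a rough illustration of the format described above, the following Python sketch splits one annotation instance into its 27 fields and separates filled fields from placeholders. It rests on several assumptions that the description does not confirm: that each instance occupies one line with no trailing delimiter, that a filled field is wrapped in angle brackets as `<value>`, and that field values never contain a vertical bar. The function names and the sample line are hypothetical and are not taken from the corpus; consult the release documentation for the actual field layout.

```python
import re

EXPECTED_FIELDS = 27  # field count stated in the corpus description

# Assumption: a filled field has the form "<value>"; anything else is
# treated as a placeholder reserved for future releases.
_FILLED = re.compile(r"^<(.*)>$", re.DOTALL)


def parse_instance(line):
    """Split one annotation instance into 27 values (placeholders -> None)."""
    fields = line.rstrip("\n").split("|")
    if len(fields) != EXPECTED_FIELDS:
        raise ValueError(
            f"expected {EXPECTED_FIELDS} fields, got {len(fields)}"
        )
    parsed = []
    for field in fields:
        match = _FILLED.match(field.strip())
        # Keep the text inside the brackets for filled fields.
        parsed.append(match.group(1) if match else None)
    return parsed


def filled_positions(parsed):
    """Return the zero-based indices of the fields that carry a value."""
    return [i for i, value in enumerate(parsed) if value is not None]


# Made-up instance for illustration only: first and last fields filled.
sample = "|".join(["<Explicit>"] + [""] * 25 + ["<text>"])
parsed_sample = parse_instance(sample)
sample_positions = filled_positions(parsed_sample)
```

Indexing fields by position rather than by name keeps the sketch independent of the actual field inventory, which is defined in the corpus documentation rather than here.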
Samples
Please view this annotation sample and raw sample.
Updates
None at this time.