Chinese Proposition Bank 2.0

Item Name: Chinese Proposition Bank 2.0
Author(s): Nianwen Xue, Martha Palmer, Meiyu Chang, Zixin Jiang
LDC Catalog No.: LDC2008T07
ISBN: 1-58563-451-4
ISLRN: 794-819-316-121-4
Release Date: May 19, 2008
Member Year(s): 2008
DCMI Type(s): Text
Data Source(s): newswire
Application(s): parsing, machine translation, linguistic analysis, information extraction
Language(s): Mandarin Chinese
Language ID(s): cmn
License(s): LDC User Agreement for Non-Members
Online Documentation: LDC2008T07 Documents
Licensing Instructions: Subscription & Standard Members, and Non-Members
Citation: Xue, Nianwen, et al. Chinese Proposition Bank 2.0 LDC2008T07. Web Download. Philadelphia: Linguistic Data Consortium, 2008.

Chinese Proposition Bank 2.0 continues the Chinese Proposition Bank project, which aims to create a corpus of Chinese text annotated with information about basic semantic propositions. Chinese Proposition Bank 1.0 consists of predicate-argument annotation on 250,000 words from Chinese Treebank 5.0. Chinese Proposition Bank 2.0 adds predicate-argument annotation on 500,000 words from Chinese Treebank 6.0. The data sources include newswire from Xinhua News Agency, articles from Sinorama Magazine, news from the website of the Hong Kong Special Administrative Region, and transcripts from various Chinese broadcast news programs.

Data

This release contains the predicate-argument annotation of 81,009 verb instances (11,171 unique verbs) and 14,525 noun instances (1,421 unique nouns). The annotation of nouns is limited to nominalizations that have a corresponding verb. The general annotation guidelines and the lexical guidelines (called frame files) for each verbal and nominal predicate are included in this release.
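Each annotated instance links one predicate, disambiguated by a frameset from its frame file, to its labeled arguments in a Treebank parse. The sketch below shows one way to hold such a proposition in memory. It assumes a whitespace-separated line layout modeled on English PropBank files (file, sentence index, predicate terminal index, tagger, frameset ID, an inflection field, then `pointer-LABEL` arguments); this layout, the class names, and the sample line are hypothetical, so check the documentation included in this release for the actual format.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Argument:
    pointer: str  # tree node pointer such as "3:1" (assumed terminal:height form)
    label: str    # argument label, e.g. "ARG0", "rel", "ARGM-TMP"


@dataclass
class Proposition:
    file: str
    sentence: int
    terminal: int
    tagger: str
    frameset: str          # predicate lemma plus frameset number, e.g. "X.01"
    args: list[Argument]


def parse_prop_line(line: str) -> Proposition:
    """Parse one proposition line in the ASSUMED PropBank-style layout."""
    fields = line.split()
    file, sentence, terminal, tagger, frameset = fields[:5]
    args = []
    # fields[5] is assumed to be an inflection placeholder such as "-----";
    # every later token is "pointer-LABEL". Pointers contain no hyphens,
    # while labels like "ARGM-TMP" do, so split only on the first hyphen.
    for token in fields[6:]:
        pointer, label = token.split("-", 1)
        args.append(Argument(pointer, label))
    return Proposition(file, int(sentence), int(terminal), tagger, frameset, args)
```

For a hypothetical line such as `chtb_001.fid 0 5 gold 出现.01 ----- 3:1-ARG0 5:0-rel 6:1-ARGM-TMP`, the parser yields a proposition for frameset `出现.01` with three arguments labeled `ARG0`, `rel`, and `ARGM-TMP`.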

Total propositions for verbs: 81,009
Total propositions for nouns: 14,525
Total verbs framed: 11,171
Total framesets: 11,776
Verbs with multiple framesets: 474
Average framesets per verb: 1.05
Total nouns framed: 1,421
Total noun framesets: 1,528
Nouns with multiple framesets: 48
Average framesets per noun: 1.08
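The two averages follow directly from the counts above: total framesets divided by the number of framed predicates, rounded to two decimals. A minimal check in Python (the helper name is illustrative only):

```python
def average_framesets(total_framesets: int, predicates_framed: int) -> float:
    """Mean number of framesets per framed predicate, rounded to 2 places."""
    return round(total_framesets / predicates_framed, 2)


# Counts taken from the statistics listed for this release.
verb_avg = average_framesets(11_776, 11_171)  # 11,776 / 11,171 ≈ 1.054
noun_avg = average_framesets(1_528, 1_421)    # 1,528 / 1,421 ≈ 1.075
```

Equivalently, the 605 framesets beyond one per verb (11,776 − 11,171) are spread over the 474 verbs with multiple framesets, and the 107 extra noun framesets (1,528 − 1,421) over 48 nouns.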

Samples

For an example of the data in this corpus, please examine this sample image (JPEG) of a parse tree.
