Chinese Proposition Bank 3.0


Item Name: Chinese Proposition Bank 3.0
Authors: Nianwen Xue, Xiaopeng Bai, Jill Lu, Jennifer Zhang, Martha Palmer, Meiyu Chang, Hua Zhong
LDC Catalog No.: LDC2013T13
ISBN: 1-58563-648-7
Release Date: Jul 15, 2013
Data Type: text
Data Source(s): broadcast conversation, broadcast news, journal articles, newswire, weblogs
Application(s): information extraction, linguistic analysis, machine translation, parsing
Language(s): Mandarin Chinese
Language ID(s): cmn
Distribution: Web Download
Member Fee: $0 for 2013 members
Non-member Fee: US $300.00
Reduced-License Fee: N/A
Extra-Copy Fee: US $
Non-member License: yes
Online documentation: yes
Licensing Instructions: Subscription & Standard Members, and Non-Members
Citation: Nianwen Xue, et al. 2013. Chinese Proposition Bank 3.0. Linguistic Data Consortium, Philadelphia

Chinese Proposition Bank 3.0 is a continuation of the Chinese Proposition Bank project, which aims to create a corpus of text annotated with information about basic semantic propositions. Chinese Proposition Bank 3.0 adds predicate-argument annotation on 187,731 words from Chinese Treebank 7.0 (LDC2010T07). The data sources consist of newswire, magazine articles, various broadcast news and broadcast conversation programming, web newsgroups and weblogs.

LDC has also released Chinese Proposition Bank 1.0 (LDC2005T23) and Chinese Proposition Bank 2.0 (LDC2008T07).

Data

This release contains the predicate-argument annotation of 173,206 verb instances and 14,525 noun instances. The annotation of nouns is limited to nominalizations that have a corresponding verb. The general annotation guidelines and the lexical guidelines (called frame files) for each verbal and nominal predicate are also included in this release. Below are some statistics about the corpus.

  • Total propositions for verbs - 173,206
  • Total propositions for nouns - 14,525
  • Total verbs framed - 24,642
  • Total framesets - 26,467
  • Verbs with multiple framesets - 1,337
  • Average framesets per verb - 1.07
  • Total nouns framed - 1,421
  • Total noun framesets - 1,528
  • Nouns with multiple framesets - 48
  • Average framesets per noun - 1.08
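
The two averages, and the 187,731 total given in the overview, follow directly from the counts listed above. A minimal sketch of that arithmetic is shown here; the variable names are illustrative only and are not part of the release.

```python
# Counts copied from the statistics listed above.
# Variable names are hypothetical, chosen only for this illustration.
verb_propositions = 173_206
noun_propositions = 14_525
verbs_framed = 24_642
verb_framesets = 26_467
nouns_framed = 1_421
noun_framesets = 1_528

# Total annotated predicate instances (verbs plus nouns).
total_propositions = verb_propositions + noun_propositions

# Average number of framesets per framed verb and per framed noun,
# rounded to two decimal places as in the list.
avg_framesets_per_verb = round(verb_framesets / verbs_framed, 2)
avg_framesets_per_noun = round(noun_framesets / nouns_framed, 2)

print(total_propositions)
print(avg_framesets_per_verb)
print(avg_framesets_per_noun)
```

The total matches the 187,731 figure in the overview, and the rounded ratios match the listed averages of 1.07 and 1.08.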

Samples

Please view the following samples.

Updates

None at this time.

Content Copyright

Portions © 2006 Agence France Presse, © 2006 Anhui TV, © 2005 Cable News Network, LP, LLLP, © 2000-2001 China Broadcasting System, © 2000-2001, 2005-2006 China Central TV, © 2000-2001 China National Radio, © 2006 Chinanews.com, © 2000-2001 China Television System, © 2006 Guangming Daily, © 2006 National Broadcasting Company, Inc., © 2006 New Tang Dynasty TV, © 2006 Peoples Daily Online, © 2005-2006 Phoenix TV, © 1999-2001 Sinorama Magazine, © 1996-1998, 2006 Xinhua News Agency, © 2001, 2004, 2005, 2007, 2008, 2009, 2010, 2013 Trustees of the University of Pennsylvania