This directory contains the data of the Chinese Propbank 2.0. This data was
collected as an additional layer of annotation on the Chinese Treebank,
representing the predicate-argument structure of verbs and their
nominalizations. Below is a list of each file and a description of its
contents.

File            Description
--------------------------------------------------------------------------------------
verbs.txt       The annotated data for verbs, in the file format described
                below. This includes the annotations for files chtb_0001.fid
                to chtb_1151.fid, the first 500K words of the Chinese Treebank.

nouns.txt       The annotated data for nouns, in the file format described
                below. This includes the annotations for files chtb_0001.fid
                to chtb_1151.fid, the first 500K words of the Chinese Treebank.

frames/         Lexical guidelines called 'frame files'. The file format for
                each predicate is detailed in frames/verb.dtd.

html_frames/    The HTML version of the frame files.

NOTES.txt       Release notes for Chinese Proposition Bank 2.0 (cpb-2.0).
--------------------------------------------------------------------------------------

Annotation Format.

The verbs.txt file contains the predicate-argument structure annotation of
verbs in the Chinese Treebank, and the nouns.txt file contains the
predicate-argument structure annotation of nouns (limited to nominalized
predicates that have a corresponding verb).

Each P-A structure is represented as one line of space-separated columns.
The columns are as follows:

ctb-filename sentence terminal tagger frameset inflection proplabel proplabel ...

The content of each column is described in detail below.

ctb-filename    the name of the file in the Chinese Treebank, version 6.0
                (ctb6.0)

sentence        the number of the sentence in the file (starting with 0)

terminal        the number of the terminal in the sentence that is the
                location of the verb. Note that the terminal number counts
                empty constituents as terminals and starts with 0. This
                holds for all references to terminal numbers in this
                description.
                An example:

                (IP (NP-SBJ (DNP (NP (NN 货币)(NN 回笼))(DEG 的))(NP (NN 增加)))(PU ,)
                (VP (PP-BNF (P 为)(IP (NP-SBJ (-NONE- *PRO*))(VP (VV 平抑)(NP-OBJ (NP (DP (DT 全))
                (NP (NN 区)))(NP (NN 物价))))))(VP (VV 发挥)(AS 了)(NP-OBJ (NN 作用))))
                (PU 。))

                The terminal numbers:

                货币 0   回笼 1   的 2   增加 3   , 4   为 5   *PRO* 6   平抑 7
                全 8   区 9   物价 10   发挥 11   了 12   作用 13   。 14

tagger          the name of the annotator, or "gold" if the instance has been
                double annotated and adjudicated.

frameset        The frameset identifier from the frame file of the verb. For
                example, '发挥.01' refers to the frameset ID "f1" in the frame
                file for the verb '发挥' (frames/0930-fa-hui.xml). The names of
                the frame files are composed of a numerical id plus the pinyin
                of the verb. The numerical ids can be found in the enclosed
                verb list (verbs.txt).

inflection      The inflection field is a carry-over from the Penn English
                Proposition Bank and is set to '-----', meaning no annotation
                in the Chinese Proposition Bank.

proplabel       A string representing the annotation associated with a
                particular argument or adjunct of the proposition. Each
                proplabel is dash ('-') delimited and has the following
                columns:

             1) column for the address of a constituent

                The address of the constituent is in one of two forms.

                form 1: terminal number:height

                A single node in the syntactic tree of the sentence in
                question, identified by the first terminal the node spans
                together with the height from that terminal to the syntax
                node (a height of 0 represents a terminal).

                For example, in the sentence

                (IP (NP-TPC (DP (DT 这些))(CP (WHNP-1 (-NONE- *OP*))
                (CP (IP (NP-SBJ (-NONE- *T*-1)) (VP (ADVP (AD 已))(VP (VV 开业))))(DEC 的)))(NP (NN 外商)(NN 投资)(NN 企业)))
                (NP-ADV (NN 绝大部分))(NP-SBJ (NN 生产)(NN 经营)(NN 状况))(VP (ADVP (AD 较))
                (VP (VA 好)))(PU 。))

                the address "1:3" represents the top IP node and "2:2"
                represents the CP node.

                form 2: terminal number:height*terminal number:height*...

                A trace chain identifying coreference within sentence
                boundaries.
                For example, in the sentence

                (IP (NP-TPC (DP (DT 这些))(CP (WHNP-1 (-NONE- *OP*))
                (CP (IP (NP-SBJ (-NONE- *T*-1)) (VP (ADVP (AD 已))(VP (VV 开业))))(DEC 的)))(NP (NN 外商)(NN 投资)(NN 企业)))
                (NP-ADV (NN 绝大部分))(NP-SBJ (NN 生产)(NN 经营)(NN 状况))(VP (ADVP (AD 较))
                (VP (VA 好)))(PU 。))

                the address "2:0*1:0*6:1" represents the fact that the nodes
                '2:0' (-NONE- *T*-1), '1:0' (-NONE- *OP*) and
                '6:1' (NP (NN 外商)(NN 投资)(NN 企业)) are coreferential.

             2) column for the 'label'

                The argument label is one of {rel, ARGM} + { ARG0, ARG1, ARG2, ... }.
                The argument labels correspond to the argument labels in the
                frame files (see ./frames). ARGM is used for adjuncts of
                various sorts, and 'rel' refers to the surface string of the
                predicate.

             3) column for 'functional tag' (optional for numbered arguments;
                required for ARGM)

                Functional tags for "split" numbered arguments:

                PSR - possessor
                PSE - possessee
                CRD - coordinator
                PRD - predicate
                QTY - quantity

                Prepositional tags for numbered arguments:

                AT, AS, INTO, TOWARDS, TO, ONTO

                Functional tags for ARGM:

                ADV - adverbial, default tag
                BNF - beneficiary
                CND - conditional
                DIR - directional
                DIS - discourse connective
                DGR - degree
                EXT - extent
                FRQ - frequency
                LOC - location
                MNR - manner
                NEG - negation
                PRP - purpose and reason
                TMP - temporal
                TPC - topic

--------------------------------------------------------------------------------------

Some basic statistics of this release

Total propositions for verbs    - 81,009
Total propositions for nouns    - 14,525

Total verbs framed              - 11,171
Total framesets                 - 11,776
Verbs with multiple framesets   - 474
Average framesets per verb      - 1.05

Total nouns framed              - 1,421
Total noun framesets            - 1,528
Nouns with multiple framesets   - 48
Average framesets per noun      - 1.08

--------------------------------------------------------------------------------------

References:

* Nianwen Xue. In press. Labeling Chinese predicates with semantic roles.
  Computational Linguistics.

* Nianwen Xue. 2006. A Chinese lexicon of roles and senses.
  Journal of Language Resources and Evaluation, 40:395-403.

* Nianwen Xue and Martha Palmer. Adding semantic roles to the Chinese
  Treebank. Revision under review for Natural Language Engineering.

* Nianwen Xue. 2006. Semantic Role Labeling of nominalized predicates in
  Chinese. In Proceedings of HLT-NAACL 2006. New York City.

* Nianwen Xue and Martha Palmer. 2005. Automatic Semantic Role Labeling for
  Chinese Verbs. In Proceedings of the 19th International Joint Conference
  on Artificial Intelligence. Edinburgh, Scotland.

* Nianwen Xue. 2004. Handling Dislocated and Discontinuous Constituents in
  Chinese Semantic Role Labeling. In Proceedings of the 4th Workshop on
  Asian Language Resources, in conjunction with IJCNLP 2004, Hainan Island,
  China.

* Nianwen Xue and Martha Palmer. 2003. Annotating Propositions in the Penn
  Chinese Treebank. In Proceedings of the Second SIGHAN Workshop, Sapporo,
  Japan.

* Nianwen Xue and Seth Kulick. 2003. Automatic Predicate Argument Structure
  Analysis of the Penn Chinese Treebank. In Proceedings of Machine
  Translation Summit IX, New Orleans, Louisiana, USA.
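
--------------------------------------------------------------------------------------

Reading the annotation format: an illustrative sketch

The Python sketch below shows one possible way to read a line in the
annotation format described above and to number the terminals of a
bracketed Treebank sentence. It is not part of the release. The names
Proposition, PropLabel, parse_line, parse_proplabel and terminals are
hypothetical, and the sample annotation line was constructed by hand for
illustration from the example sentence in this file; it is not copied from
verbs.txt. The sketch handles only the two address forms described in this
file and assumes that every terminal appears as "(TAG word)" in the tree.

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class PropLabel(NamedTuple):
    chain: List[Tuple[int, int]]  # (terminal, height) pairs; several pairs form a trace chain
    label: str                    # 'rel', 'ARGM', 'ARG0', 'ARG1', ...
    tag: Optional[str]            # functional tag such as 'TMP' or 'PSR', if present


class Proposition(NamedTuple):
    filename: str
    sentence: int
    terminal: int
    tagger: str
    frameset: str
    inflection: str
    proplabels: List[PropLabel]


def parse_proplabel(text: str) -> PropLabel:
    """Split 'address-label[-tag]' into its dash-delimited columns."""
    address, label, *rest = text.split("-")
    # Form 2 joins several 'terminal:height' nodes with '*'; form 1 is a single node.
    chain = [(int(t), int(h)) for t, h in (node.split(":") for node in address.split("*"))]
    return PropLabel(chain, label, "-".join(rest) or None)


def parse_line(line: str) -> Proposition:
    """Parse one space-separated annotation line."""
    fields = line.split()
    filename, sentence, terminal, tagger, frameset, inflection = fields[:6]
    labels = [parse_proplabel(f) for f in fields[6:]]
    return Proposition(filename, int(sentence), int(terminal),
                       tagger, frameset, inflection, labels)


def terminals(tree: str) -> List[str]:
    """Return the terminals of a bracketed tree in order, empty categories included."""
    return [word for _tag, word in re.findall(r"\(([^\s()]+)\s+([^\s()]+)\)", tree)]


# The example sentence from the description of the 'terminal' column.
TREE = (
    "(IP (NP-SBJ (DNP (NP (NN 货币)(NN 回笼))(DEG 的))(NP (NN 增加)))(PU ,) "
    "(VP (PP-BNF (P 为)(IP (NP-SBJ (-NONE- *PRO*))(VP (VV 平抑)(NP-OBJ (NP (DP (DT 全)) "
    "(NP (NN 区)))(NP (NN 物价))))))(VP (VV 发挥)(AS 了)(NP-OBJ (NN 作用)))) (PU 。))"
)

# Hypothetical line: file name, sentence number and labels are made up for illustration.
SAMPLE = "chtb_0001.fid 0 11 gold 发挥.01 ----- 0:3-ARG0 5:1-ARGM-BNF 11:0-rel 13:1-ARG1"

prop = parse_line(SAMPLE)
print(terminals(TREE)[prop.terminal])  # 发挥
for proplabel in prop.proplabels:
    print(proplabel)
```

Keeping every address as a list of (terminal, height) pairs lets single
nodes and trace chains be handled by the same code path; a caller can
treat a list of length one as form 1.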