This directory contains the data of the Korean Propbank annotations. The data is an additional layer of annotation on the Korean Treebank that represents the predicate-argument structure of predicates. Below is a list of each file and a description of its contents.

File                Description
--------------------------------------------------------------------------------------
virginia-verbs.pb   The annotated data (file format described below). Contains the
                    annotations for the Korean English Treebank Annotations
                    (aka. virginia).
newswire-verbs.pb   The annotated data (file format described below). Contains the
                    annotations for the Korean Treebank version 2.0 (aka. newswire).
verb.dtd            The file format of each frames file.
framefiles/         Lexical guidelines. The file format for each predicate is
                    detailed in ./verb.dtd.
treebank/           Treebank files. They are revised from the original published
                    treebank files for these Propbank annotations.
--------------------------------------------------------------------------------------

Annotation Format

Both -verbs.pb files contain the predicate-argument structures of predicates. Each predicate-argument structure is represented on one line of space-separated columns. The columns are as follows:

    treebank-filename sentence terminal tagger frameset proplabel proplabel ...

The content of each column is described in detail below.

treebank-filename
    The name of the file in the Korean English Treebank Annotations (aka. virginia) or in the Korean Treebank version 2.0 (aka. newswire).

sentence
    The number of the sentence in the file (starting with 0).

terminal
    The number of the terminal in the sentence that is the location of the predicate. Terminal numbering counts lexical words, grammatical words, and empty constituents as terminals, but not affixes, and starts with 0. This holds for all references to terminal numbers in this description.
    An example:

    (S (NP-SBJ Mi-ya/NPR+neun/PAU)                 Mi-ya + topic marker
       (VP (S-OBJ (NP-SBJ *pro*)
                  (VP (NP-COMP cip/NNC+e/PAD)       home + adverbial marker
                      ka/VV+ki/ENM+reul/PCA))       go + nominalizer + accusative
           weon-ha/VV+eun-ta/EFN))                  want + sentence ending

    The terminal numbers: Mi-ya 0; neun 1; *pro* 2; cip 3; e 4; ka 5; ki 6; reul 7; weon-ha 8; eun-ta 9.

tagger
    The name of the annotator.

predicate
    Predicates are given in their root form and are romanized. The root is used as the frames file lemma because, in Korean, words that share a root have the same predicate-argument structure at the deep (underlying) level. For example, 'meok-ta', 'meok-hi-ta', and 'meok-i-ta' all share the root meaning EAT and appear as 'meok'; 'kong-kyeok-ha-ta', 'kong-kyeok-toe-ta', and 'kong-kyeok-pat-ta' all share the root meaning ATTACK and appear as 'kong-kyeok'. Refer to ./Korean-Resources.ppt for details.

frameset
    The frameset identifier from the frames file of the predicate. For example, 'ka.01' refers to the frames file for 'ka' (framefile/ka.kor.xml) and to the roleset element in that frames file whose 'id' attribute is 'ka.01'.

proplabel
    A string representing the annotation associated with a particular argument or adjunct of the proposition. Each proplabel is dash-delimited ('-') and has the following columns:

    1) Syntactic relation
       Identifies the node or nodes in the syntax tree that carry the argument label. It takes one of 4 forms.

       form 1: terminal number:height
         A single node in the syntax tree of the sentence in question, identified by the first terminal the node spans together with the height from that terminal to the syntax node (a height of 0 represents a terminal).
         For example, in the sentence

         (S (NP-SBJ (S (WHNP-1 *op*)
                       (S (NP-SBJ *T*-1)
                          (VP ca/VV+neun/EAN)))     sleep + adnominal ending
                    (NP a-i/NNC+ka/PCA))            child + subjective
            (ADJP manh/VJ+ta/EFN))                  many + sentence ending

         a syntactic relation of "1:1" represents the NP-SBJ immediately dominating the trace *T*-1, and a syntactic relation of "0:2" represents the S node immediately dominating (WHNP-1 *op*).

       form 2: terminal number:height*terminal number:height*...
         A trace chain identifying coreference within sentence boundaries. For example, in the same sentence as above, a syntactic relation of "1:1*4:1" represents the NP-SBJ node over the trace *T*-1 together with the NP node (NP a-i/NNC+ka/PCA).

       form 3: terminal number:height,terminal number:height,...
         A split argument, where no single node captures the argument and the components are not coreferential. This form is used to denote phrasal variants of verbs. For example, in the sentence

         (S (NP-SBJ na/NPN+neun/PAU)                I + topic marker
            (VP (NP-OBJ Mi-ya/NPR+reul/PCA)         Mi-ya + accusative
                (NP-COMP pa-po/NNC+ro/PAD)          idiot + adverbial marker
                (VV saeng-kak/NNC+ha/XSV+eun-ta/EFN)))   think + suffix + sentence ending

         the phrasal argument of the verb 'saeng-kak-ha' is identified with the syntactic relation "2:1,4:1".

       form 4: terminal number:height,terminal number:height*terminal number:height...
         A combination of forms 2 and 3. When this occurs, ',' takes precedence over '*': the string is split at each ',' first, and each resulting component may itself be a trace chain joined by '*'.
         For example, in the phrasal segment

         (NP-SBJ (S (WHNP-1 *op*)
                    (S (NP-SBJ nae/NPN+ka/PCA)      I + subjective
                       (VP (NP-OBJ *T*-1)
                           (NP-COMP pa-po/NNC+ro/PAD)            idiot + adverbial
                           (VV saeng-kak/NNC+ha/XSV+eun/EAN))))  think + suffix + adnominal
                 (NP Mi-ya/NPR))                    Mi-ya

         the proplabel 3:1*8:1,4:1-ARG0 is to be understood as a split argument (form 3), one of whose components is a trace chain (form 2), i.e. grouped as ((3:1*8:1),4:1).

    2) Label
       The argument label: one of rel, ARGA, ARGM, ARG0, ARG1, ARG2, ... The argument labels correspond to the argument labels in the frames files (see ./framefile). ARGA is used for causative agents, ARGM for adjuncts of various sorts, and 'rel' marks the surface string of the predicate.

    3) Feature (optional for numbered arguments; required for ARGM)
       An argument feature can be either a labelled feature or a preposition. The labelled features are:

         EXT - extent
         DIR - direction
         LOC - location
         TMP - temporal
         PRD - predication
         NEG - negation
         ADV - adverbial
         MNR - manner
         CAU - cause
         PRP - purpose (not cause)
         DIS - discourse
         INS - instrument
         CND - condition
--------------------------------------------------------------------------------------
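The Python sketch below shows one possible way to read a line of a .pb file according to the format described above. It is not part of this distribution: the names Proposition, PropLabel, parse_line, parse_proplabel and parse_relation are hypothetical. It assumes the leading columns are exactly treebank-filename, sentence, terminal, tagger and frameset, as listed above, with every remaining column a proplabel. If your copy of the data carries an additional column (for example a separate predicate column), adjust the slice in parse_line.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# A tree node is addressed as (terminal number, height above that terminal).
Node = Tuple[int, int]


@dataclass
class PropLabel:
    # Outer list: components of a split argument (separated by ',').
    # Inner list: nodes of one trace chain (separated by '*');
    # a plain form-1 relation is a single one-node chain.
    relation: List[List[Node]]
    label: str              # rel, ARGA, ARGM, ARG0, ARG1, ...
    feature: Optional[str]  # e.g. LOC; None when absent


@dataclass
class Proposition:
    filename: str
    sentence: int
    terminal: int
    tagger: str
    frameset: str
    labels: List[PropLabel]


def parse_node(text: str) -> Node:
    terminal, height = text.split(":")
    return int(terminal), int(height)


def parse_relation(text: str) -> List[List[Node]]:
    # ',' takes precedence: split into components first, then each
    # component into its trace chain, so "3:1*8:1,4:1" -> ((3:1*8:1),4:1).
    return [[parse_node(node) for node in part.split("*")]
            for part in text.split(",")]


def parse_proplabel(text: str) -> PropLabel:
    # relation-label[-feature]; only the first two dashes are significant.
    parts = text.split("-", 2)
    if len(parts) < 2:
        raise ValueError(f"malformed proplabel: {text!r}")
    feature = parts[2] if len(parts) == 3 else None
    return PropLabel(parse_relation(parts[0]), parts[1], feature)


def parse_line(line: str) -> Proposition:
    # Assumption: the five leading columns are exactly those listed in
    # this README; everything after them is a proplabel.
    cols = line.split()
    filename, sentence, terminal, tagger, frameset = cols[:5]
    return Proposition(filename, int(sentence), int(terminal), tagger,
                       frameset, [parse_proplabel(c) for c in cols[5:]])


if __name__ == "__main__":
    # The proplabel from the form-4 example above.
    print(parse_proplabel("3:1*8:1,4:1-ARG0"))
    # PropLabel(relation=[[(3, 1), (8, 1)], [(4, 1)]], label='ARG0', feature=None)
```

The sketch stops at (terminal, height) pairs. Mapping each pair to an actual node in the treebank files would additionally require the terminal-counting rules described above, in particular skipping affixes such as ha/XSV.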