This directory contains the data of the Korean Propbank annotations.  The data was collected
as an additional layer of annotation on the Korean Treebank and represents the
predicate-argument structure of its predicates.  Below is a list of the files with a
description of their contents.


File                      Description
--------------------------------------------------------------------------------------
virginia-verbs.pb         The annotated data, file format described below.
                          This includes the annotations for the Korean English
			  Treebank Annotations (aka. virginia).

newswire-verbs.pb	  The annotated data, file format described below.
                          This includes the annotations for the Korean Treebank
                          version 2.0 (aka. newswire).

verb.dtd		  The file format of each frames file
                          
framefiles/               Lexical Guidelines.  The file format for each predicate is
                          detailed in ./verb.dtd

treebank/                 Treebank files, revised from the originally published
                          treebank files for these Propbank annotations.
                          
--------------------------------------------------------------------------------------

                               Annotation Format.

Both *-verbs.pb files contain predicate-argument (P-A) structures.  Each P-A structure
is represented on one line of space-separated columns.  The columns are as follows:

  treebank-filename sentence terminal tagger frameset proplabel proplabel ...

The content of each column is described in detail below.

treebank-filename
	the name of the file in the Korean English Treebank Annotations (a.k.a. virginia)
	or the Korean Treebank version 2.0 (a.k.a. newswire)
    
sentence
	the number of the sentence in the file (starting with 0)
    
terminal
	the number of the terminal in the sentence that is the location of the
	predicate.  Terminal numbers start at 0 and count lexical words,
	grammatical words, and empty constituents; affixes are not counted.
	This holds for all references to terminal numbers in this description.

    An example:  
        (S (NP-SBJ Mi-ya/NPR+neun/PAU)			Mi-ya + topic marker
	   (VP (S-OBJ (NP-SBJ *pro*)
	              (VP (NP-COMP cip/NNC+e/PAD)	home + adverbial marker
                          ka/VV+ki/ENM+reul/PCA))	go + nominalizer + accusative
               weon-ha/VV+eun-ta/EFN))			want + sentence ending
        
    The terminal numbers:
        Mi-ya 0; neun 1; *pro* 2; cip 3; e 4; ka 5; ki 6; reul 7; weon-ha 8;
        eun-ta 9
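
The numbering above can be reproduced with a short script. The following is a
minimal sketch, not part of the distribution: the function name and the set of
affix tags (AFFIX_TAGS) are assumptions made for illustration, and the tree is
assumed to be given without the English glosses.

```python
import re

# Assumption: these tags mark affixes, which are not counted as terminals.
AFFIX_TAGS = {"XPF", "XSF", "XSV", "XSJ"}

def number_terminals(tree_text):
    """Return the terminals of a bracketed tree in order; list index = terminal number."""
    tokens = re.findall(r"\(|\)|[^\s()]+", tree_text)
    terminals = []
    prev = None
    for tok in tokens:
        # A token right after "(" is a node label; any other non-bracket token is a leaf.
        if tok not in ("(", ")") and prev != "(":
            for morpheme in tok.split("+"):
                form, _, tag = morpheme.partition("/")  # empty constituents have no tag
                if tag not in AFFIX_TAGS:
                    terminals.append(form)
        prev = tok
    return terminals

EXAMPLE_TREE = """
(S (NP-SBJ Mi-ya/NPR+neun/PAU)
   (VP (S-OBJ (NP-SBJ *pro*)
              (VP (NP-COMP cip/NNC+e/PAD)
                  ka/VV+ki/ENM+reul/PCA))
       weon-ha/VV+eun-ta/EFN))
"""

numbered = number_terminals(EXAMPLE_TREE)
for index, form in enumerate(numbered):
    print(index, form)
```

The list index of each entry is its terminal number, so 'weon-ha' comes out as
terminal 8, as in the listing above.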
        
tagger
    the name of the annotator.
    
predicate
    Predicates are given in their root form, in romanized spelling. The root
    is used as the frames-file lemma because predicates that share a root share
    the same predicate-argument structure in the underlying (deep) structure of
    Korean sentences. For example, 'meok-ta', 'meok-hi-ta', and 'meok-i-ta' all
    have the root meaning EAT and appear as 'meok'; 'kong-kyeok-ha-ta',
    'kong-kyeok-toe-ta', and 'kong-kyeok-pat-ta' all have the root meaning
    ATTACK and appear as 'kong-kyeok'. Refer to ./Korean-Resources.ppt for
    details.

frameset

    The frameset identifier from the frames file of the predicate.  For
    example, 'ka.01' refers to the frames file for 'ka' (framefiles/ka.kor.xml)
    and to the roleset element in that frames file whose 'id' attribute is
    'ka.01'.
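
A lookup along these lines might be sketched as follows. The function names are
hypothetical, the '<lemma>.kor.xml' naming pattern is inferred from the single
example above, and SAMPLE_XML is a made-up stand-in rather than a real frames
file; only the 'roleset' element and its 'id' attribute come from this
description (see ./verb.dtd for the actual structure).

```python
import os
import xml.etree.ElementTree as ET

def frames_file_for(frameset_id, frames_dir):
    """Map a frameset id such as 'ka.01' to a frames-file path (assumed naming pattern)."""
    lemma, _, _sense = frameset_id.rpartition(".")
    return os.path.join(frames_dir, lemma + ".kor.xml")

def find_roleset(xml_text, frameset_id):
    """Return the roleset element whose 'id' matches frameset_id, or None."""
    root = ET.fromstring(xml_text)
    for roleset in root.iter("roleset"):
        if roleset.get("id") == frameset_id:
            return roleset
    return None

# Hypothetical minimal document, for illustration only.
SAMPLE_XML = """
<frameset>
  <predicate lemma="ka">
    <roleset id="ka.01"/>
  </predicate>
</frameset>
"""

path = frames_file_for("ka.01", "framefiles")
roleset = find_roleset(SAMPLE_XML, "ka.01")
print(os.path.basename(path), roleset.get("id"))
```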


proplabel

    A string representing the annotation associated with a particular argument
    or adjunct of the proposition.  Each proplabel is dash ('-') delimited and
    has the following columns:

  1) column for the 'syntactic relation'
  
    The syntactic relation of the argument label.  It takes one of four forms.
    
    form 1: <terminal number>:<height>
      A single node in the syntax tree of the sentence in question, identified
      by the first terminal the node spans together with the height from that
      terminal to the syntax node (a height of 0 represents a terminal).

      For example,  in the sentence
      
        (S (NP-SBJ (S (WHNP-1 *op*)			 
                      (S (NP-SBJ *T*-1)			 
                         (VP ca/VV+neun/EAN)))		 sleep + adnominal ending
                   (NP a-i/NNC+ka/PCA))			 child + subjective
           (ADJP manh/VJ+ta/EFN))			 many + sentence ending

        A syntactic relation of "1:1" represents the NP-SBJ immediately dominating
        the terminal "*T*-1", and a syntactic relation of "0:2" represents
        the "S" node immediately dominating "(WHNP-1 *op*)".
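
Resolving a <terminal number>:<height> pair to a node can be sketched as below.
This is an illustration under stated assumptions, not a reference
implementation: the Node class, the function names, and the affix tag set are
hypothetical, and the tree is assumed to be given without glosses.

```python
import re

# Assumption: these tags mark affixes, which are not counted as terminals.
AFFIX_TAGS = {"XPF", "XSF", "XSV", "XSJ"}

class Node:
    def __init__(self, label, parent=None):
        self.label = label
        self.parent = parent

def read_terminals(tree_text):
    """Return the terminal nodes in order, each linked to its parent node."""
    tokens = re.findall(r"\(|\)|[^\s()]+", tree_text)
    terminals, current, prev = [], None, None
    for tok in tokens:
        if tok == ")":
            current = current.parent          # close the current constituent
        elif tok != "(" and prev == "(":
            current = Node(tok, current)      # token after "(" is a node label
        elif tok != "(":
            for morpheme in tok.split("+"):   # leaf: one terminal per morpheme
                form, _, tag = morpheme.partition("/")
                if tag not in AFFIX_TAGS:
                    terminals.append(Node(form, current))
        prev = tok
    return terminals

def resolve(tree_text, terminal, height):
    """Climb `height` levels from the given terminal and return that node's label."""
    node = read_terminals(tree_text)[terminal]
    for _ in range(height):
        node = node.parent
    return node.label

EXAMPLE = """
(S (NP-SBJ (S (WHNP-1 *op*)
              (S (NP-SBJ *T*-1)
                 (VP ca/VV+neun/EAN)))
           (NP a-i/NNC+ka/PCA))
   (ADJP manh/VJ+ta/EFN))
"""

print(resolve(EXAMPLE, 1, 1))  # NP-SBJ
print(resolve(EXAMPLE, 0, 2))  # S
```

A height of 0 returns the terminal itself, matching the definition of form 1.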
        
    form 2: terminal number:height*terminal number:height*...
      
      A trace chain identifying coreference within sentence boundaries.

      For example in the sentence

        (S (NP-SBJ (S (WHNP-1 *op*)			 
                      (S (NP-SBJ *T*-1)			 
                         (VP ca/VV+neun/EAN)))		 sleep + adnominal ending
                   (NP a-i/NNC+ka/PCA))			 child + subjective
           (ADJP manh/VJ+ta/EFN))			 many + sentence ending

        A syntactic relation of "1:1*4:1" represents the chain linking the NP-SBJ
        immediately dominating "*T*-1" and the NP immediately dominating
        "a-i/NNC+ka/PCA".
      
      
    form 3: terminal number:height,terminal number:height,...
    
      A split argument, where there is no single node that captures the argument
      and the components are not coreferential. This form is used
      to denote phrasal variants of verbs.  For example, in the sentence

      (S (NP-SBJ na/NPN+neun/PAU)			 I + topic marker
         (VP (NP-OBJ Mi-ya/NPR+reul/PCA)		 Mi-ya + accusative
	     (NP-COMP pa-po/NNC+ro/PAD)			 idiot + adverbial marker
             (VV saeng-kak/NNC+ha/XSV+eun-ta/EFN)))	 think + suffix + sentence ending

      The phrasal argument of the verb 'saeng-kak-ha' would be identified with the
      syntactic relation "2:1,4:1".

    form 4: terminal number:height,terminal number:height*terminal number:height...

      This form is a combination of forms 2 and 3.  When this occurs, the string is
      first split at the ',' operator, and each resulting component may itself be a
      '*'-joined trace chain; that is, '*' binds more tightly than ','.  For example, in
      the phrasal segment

       (NP-SBJ (S (WHNP-1 *op*)
	          (S (NP-SBJ nae/NPN+ka/PCA)			I + subjective
                     (VP (NP-OBJ *T*-1)
		         (NP-COMP pa-po/NNC+ro/PAD)		idiot + adverbial
		         (VV saeng-kak/NNC+ha/XSV+eun/EAN))))   think + suffix + adnominal
               (NP Mi-ya/NPR))					Mi-ya

       The proplabel 3:1*8:1,4:1-ARG0 is to be understood as a split argument (form 3), one
       of whose constituents is a trace-chain (form 2) - i.e. grouped like so: 
       ((3:1*8:1),4:1).
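
The four forms can be read with one small routine. The sketch below follows the
grouping shown in the example above, ((3:1*8:1),4:1): it splits on ',' first
and then on '*'. The function name and the nested-list representation are
assumptions made for illustration.

```python
def parse_relation(relation):
    """Parse a syntactic relation into a list of split parts.

    Each part is a trace chain, i.e. a list of (terminal, height) pairs.
    A form 1 relation yields a single part holding a single pair.
    """
    parts = []
    for part in relation.split(","):          # form 3: split argument
        chain = []
        for pointer in part.split("*"):       # form 2: trace chain
            terminal, height = pointer.split(":")
            chain.append((int(terminal), int(height)))
        parts.append(chain)
    return parts

print(parse_relation("1:1"))          # [[(1, 1)]]
print(parse_relation("1:1*4:1"))      # [[(1, 1), (4, 1)]]
print(parse_relation("2:1,4:1"))      # [[(2, 1)], [(4, 1)]]
print(parse_relation("3:1*8:1,4:1"))  # [[(3, 1), (8, 1)], [(4, 1)]]
```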


  2) column for the 'label'
  
    The argument label is one of {rel, ARGA, ARGM} + {ARG0, ARG1, ARG2,
    ...}.  The argument labels correspond to the argument labels in the frames
    files (see ./framefiles).  ARGA is used for causative agents, ARGM for
    adjuncts of various sorts, and 'rel' refers to the surface string of
    the predicate.

  3) column for feature (optional for numbered arguments; required for ARGM)

    An argument feature can be either a labelled feature or a preposition.  The
    labelled features are:

    EXT - extent
    DIR - direction
    LOC - location
    TMP - temporal
    PRD - predication
    NEG - negation
    ADV - adverbial
    MNR - manner
    CAU - cause
    PRP - purpose (not cause)
    DIS - discourse
    INS - instrument
    CND - condition

--------------------------------------------------------------------------------------