ADDENDUM TO THE SWITCHBOARD TREEBANK GUIDELINES

Ann Bies, Justin Mott, Colin Warner
August 2005

This file contains a list of part-of-speech tagging and parsing decisions that were made in the course of annotation. As such, it is primarily a reference for annotators rather than a statement of policy. The first part of this document summarizes some of the farther-reaching decisions; this is followed by a list of smaller decisions as recorded in annotator meeting notes.

*** Use of S-UNF.
S-UNF is used for a wide variety of utterances that consist only of filler but that have been MDE-annotated as incomplete (as indicated by final punctuation of "--"). Thus:
(S-UNF (INTJ um) --)
(S-UNF (PRN (S (NP-SBJ you) (VP know))) --)
(S-UNF (INTJ um) (INTJ um) --)
(S-UNF (INTJ like) --)

***Nested clausal restarts
Per Switchboard policy, restarts of clauses (any label that starts with S) are embedded:
(S (NP-SBJ I) (VP think (SBAR (EDITED (SBAR that (S-UNF (NP-SBJ they))) +) that ...))
This policy is applied recursively to multiple clausal restarts:
(S (NP-SBJ I) (VP think (SBAR (EDITED (SBAR-UNF (EDITED (SBAR-UNF that) +) that) +) that ...))

***Restart of clause versus restart of phrase
A restart of a complementizer or wh-phrase in SBAR, or of an auxiliary in SQ, is a restart of the full clause. The S* node is included under the EDITED node, which is embedded:
fsh_113405.A.parse.ftags.ag.xml:90
(SQ (EDITED (SQ-UNF di- )+) does n't (NP-SBJ this study) (VP stop (PP-TMP in (INTJ like) (NP (NP december) , or (NP something)))) ?)
(S (NP-SBJ I) (VP know (SBAR (EDITED (SBAR (WHNP-UNF wha-) +) (WHNP-1 what) (S (NP-SBJ you) (VP did (NP-1 *T*) (NP-TMP last summer))))))
But note that a restart of NP-SBJ is treated as a restart of the NP rather than of the entire clause:
(S (EDITED (NP-SBJ I) +) (NP-SBJ I) (VP do n't (VP care)))
A restart of a subject plus another S-level modifier is a clausal restart:
(S (EDITED (S-UNF (NP-SBJ I) (ADVP really)) +) (NP-SBJ I) (ADVP really) (VP do n't (VP care)))

*** Level of restarts:
When it is ambiguous which constituent is being restarted (i.e., multiple constituents share a left boundary), assume that it is the larger constituent that is being restarted:
(EDITED (ADVP-TMP (NP-UNF ten)) +) (ADVP-TMP (NP ten years) ago)
rather than
(ADVP-TMP (EDITED (NP-UNF ten) +) (NP ten years) ago)

***GW tag no longer used
The GW POS tag has not been used in this corpus. Instead, the components are assigned the same POS tag as the unbroken, actual word:
(ADVP (TYPO in/RB deed/RB))
(NP (TYPO in/NN ternet/NN))

***Incomplete Tokens
A token ending with "-" is always considered to be incomplete (and should take -UNF on the relevant node), even if the speaker's intended word is present in its entirety:
(S (EDITED (NP-SBJ-UNF I-) +) (NP-SBJ I) (VP did (NP it)))

***Recovery of POS/TreeBank information for incomplete tokens.
If a restart has the same initial letter(s) as the following token, assume it is a restart of that token and give it the same node/POS information. Example: "w- well you know you just"
(S-UNF (EDITED (INTJ (w-)) (INTJ well) (PRN (S (NP-SBJ you) (VP know))) (NP-SBJ you) (ADVP just))

***Use of X and XX tags on unrecoverable incomplete tokens
The X node label and POS tag XX (almost always used in conjunction) are used when an incomplete token is judged to be unrecoverable. In an effort to avoid reading information that is not there into unfinished tokens, we have used these tags liberally.
"w- di- he didn't"
(S (EDITED (X w-)) (EDITED (SQ di- )) (NP-SBJ He) (VP did n't))
This is also applied when it is relatively clear that the speaker has made a performance error that is reflected in the transcription. So in the following example, "ag-" is probably a restart of "ig-"; although it is likely that both are corrected to "exactly", it is not clear enough to warrant annotating them that way.
fsh_111585.B.ag.xml:13
(S (EDITED (S (NP-SBJ it) (VP-UNF 's))) (EDITED (EDITED (X ig-) +) (X ag-) +) (INTJ exactly) .)
Note that the -UNF tag is never put on X, but rather on the parent node (if there is one):
fsh_111106_JM.B.parse.src.xml:10 "i've seen s-"
(S (NP-SBJ I) (VP 've (VP-UNF seen (X s-))))
's-' has POS tag XX

*** Do not ever put EDITED around an entire utterance. Use -UNF on the appropriate node instead:
(S (INTJ so) (NP-SBJ she) (VP 's (NP-PRD-UNF a g-)))

*** Annotation of sentence-initial "so"
Historically, making distinctions between different uses of "so" has been extremely difficult. For simplicity's sake, we have chosen to annotate sentence-initial "so" as INTJ if it was marked as filler in the MDE annotation and as ADVP if it was not.

*** Annotation of various nominal premodifiers:
Nominal premodifiers of adjectives and adverbs are marked as NP:
ADJP:
(ADJP (NP a little) strange)
(ADJP (NP New York) based)
ADVP:
(ADVP (NP a long time) ago)
Nominal premodifiers of prepositions and complementizers are marked as NP-ADV:
(SBAR (NP-ADV three years) before (S I went there))
(PP (NP-ADV three months) before (NP that))

*** expanded use of -TPC
Non-indexed -TPC is now also used for topic marking:
(S (NP-TPC My father) (NP-SBJ He) (VP is (NP-PRD a great man)))
(S (NP-SBJ it) (VP 's (ADJP-PRD beautiful) (NP-TPC the country)))

*** INTJ and UH
All daughters of an INTJ node should have the UH POS tag:
(INTJ (INTJ (UH Oh)) (INTJ (UH my) (UH god)))

*****SMALLER DECISIONS PULLED FROM MEETING NOTES:*****

*** A sentence starting with "because" has SBAR-PRP as the top node.
(SBAR-PRP because (S-UNF (NP-SBJ you) (ADVP just) )) *** "maybe", "probably", etc, when they can't be put at VP level, can go inside of NP tagged ADVP. (NP (ADVP maybe) (ADVP just) (NP more acquaintances) (SBAR that they know from church)) (PP for (NP (ADVP probably) (QP six or nine) months)) (ADVP-TMP (NP (ADVP maybe) two years) ago) *** Traces inside of EDITED nodes: The manual has: (SBARQ (EDITED (SBARQ (WHNP what) (SQ-UNF 's))) (WHNP-1 what) (SQ 's (NP-SBJ toning) (NP-PRD-1 *T*))) We're fine with this treatment (no trace in EDITED node), but there are other more complicated positions. *** -TTL, like -NOM, goes on individual coordinated elements rather than the parent node: fsh_111585.A.ag.xml:35 (PP like (NP (NP-TTL survivor) and (NP-TTL bachelor))) *** Adverbial premodification of a PP is represented as follows. Also, note POS tags on "back stabbing": (VP are (PP-PRD (ADVP all) about (ADVP sort of) (S-NOM (NP-SBJ *) (VP back/NN stabbing/VBG (NP each other))))) *** In general, try to follow the disfluency annotation as much as possible, even if this sacrifices a more intelligible reading: fsh_111585.A.ag.xml:44 is such [EDIT_ST] e- [EDIT_END] + big audience [EDIT_ST] for that [EDIT_END] + sort of [FL_ST] you know [FL_END] for t.v. shows that are (S (NP-SBJ i) (VP think , (SBAR 0 (S (NP-SBJ (NP it) (SBAR-1 *EXP*)) (VP 's (ADJP-PRD interesting) , (SBAR-1 that (S (NP-SBJ there) (VP is (NP-PRD (NP (NP such (EDITED (X e-) +) big audience) (EDITED (PP for (NP-UNF that)) +) (ADVP sort of) (PRN (S (NP-SBJ you) (VP know))) (PP for (NP (NP (NML t. v.) shows) (SBAR (WHNP-2 that) (S (NP-SBJ-2 *T*) (VP are (ADVP basically) (VP teaching (S (NP-SBJ you) (VP to (VP do (NP the absolute wrong thing))))))))))))))))))) .) *** fsh_111585.B.ag.xml:4 "more like" (PP-CLR of (NP-1 *ICH* )) (ADVP more) (INTJ like) (NP-1 (NP-TTL joe millionaire) , and (INTJ um) (PRN (S (NP-SBJ you) (VP know))) (NP (NP that kind) (PP of (NP stuff)))))) *** Ex. 
of near-word salad: note use of FRAG
fsh_111585.B.ag.xml:18
(FRAG but (PRN (S (NP-SBJ you) (VP know))) (INTJ like) (SBAR-TMP (WHADVP-1 when) (S (S (NP-SBJ you) (VP 're (VP flipping (NP channels) (ADVP-TMP-1 *T*)))) , (EDITED and +) or (S (NP-SBJ-2 you) (VP 're (VP stuck (S (NP-SBJ-2 *) (VP waiting (PP-CLR for (NP a commercial)) (ADVP-TMP-1 *T*)))))))) , (ADJP next) (INTJ um) (NP (NP men) (SBAR (WHNP-3 who) (S (NP-SBJ-3 *T*) (VP wear (NP (NP their women 's) clothing) , (EDITED (X w-) +) (INTJ um) (SBAR-TMP while (EDITED (NP-UNF whate-) +) (PRN (NP-SBJ you) (VP know)) (NP whatever)))))) , (PRN (S (NP-SBJ you) (VP know))) (NP (NP (NP something) (ADJP ridiculous)) (RRC (ADJP next) (PP on (NP-TTL oprah)))) .)
fsh_111585.B.ag.xml:33 note creative NAC and lack of resumption
(S (SBAR-NOM-TPC (WHNP-1 (WP whoever)) (S (NP-SBJ-1 (-NONE- *T*)) (VP (BES 's) (PP-PRD (IN on) (NP (DT the) (NN show)))))) (EDITED (VP-UNF (VBZ is)) (DISFL-IP +)) (NP-SBJ (PRP it)) (VP (BES 's) (NP-PRD (NP (NP (PRP$ their)) (NAC-3 (-NONE- *ICH*))) (NN reality)) (NAC-3 (RB not) (NP (NP (DT the) (NN person)) (VP (VBG viewing) (NP (PRP it)))))) (. .))

***Function tags inside EDITED
If you are reasonably certain that the restart should get a function tag, put it in. As with POS and node label determinations, this decision is often easier if the material inside EDITED is repeated later in the sentence.
#68 how about around around uh oregon
(EDITED (PP-LOC-UNF around)) (PP-LOC around (INTJ uh) (NP Oregon))

***use of EDITED
There was some debate here about whether to treat this as S-UNF coordinated with S. We decided against that.
We don't have an incomplete phrase coordinated with a complete one: incomplete phrases are always inside an EDITED unless they are the entire utterance (as in "But you--", "I did a--", "I need s---", etc). fsh_110863.B.parse.src.xml:58 (S so (EDITED (S (NP-SBJ it) (VP was (NP-PRD-UNF a lo-)))) (PRN (S (NP-SBJ you) (VP know))) (EDITED (S (NP-SBJ it) (VP was n't (NP-PRD (NP something) (SBAR (WHNP-1 0) (S (NP-SBJ-1 *T*) (VP-UNF to (EDITED (ADVP just)) (PRN (S (NP-SBJ you) (VP know))) (ADVP just)))))))) (INTJ well) (EDITED (S (NP-SBJ seven hundred) (VP is n't (NP-PRD (NP something) (SBAR (WHNP-2 0) (S (NP-SBJ-2 *T*) (VP-UNF to))))))) (PRN (S (NP-SBJ you) (VP know))) but (NP-SBJ you) (VP get (PP over (NP it))) .) *** Percolation of UNF-ness. The second incomplete token does not get an EDITED because of its final position in the utterance. We assume the whole utterance is incomplete instead. Note that the top S does not need an -UNF because we assume that the UNF-ness percolates upward (from INTJ-UNF in this case) fsh_111106.A.parse.src.xml:28 "that- ye-" (S (EDITED (NP-UNF that-)) (INTJ-UNF ye-)) *** use of X outside of EDITED fsh_111106_JM.B.parse.src.xml:10 "i've seen s-" (S (NP-SBJ I) (VP 've (VP-UNF seen (X s-)))) 's-' has POS tag XX but, here, we can make a reasonable determination: fsh_111106_JM.A.parse.src.xml:75 (NP (NP topics ) (VP listed (NP *) (PP-CLR on (NP-UNF tho-)))) 'tho-' gets POS DT. *** "What an X" (Paragraph (NP what/WDT a great trip)) *** More of S-UNF (S-UNF (PRN (S (NP-SBJ you) (VP know))) (INTJ uh)) *** 'what' as an INTJ fsh_111106.A.parse.src.xml:52 (ADVP-TMP (NP about (INTJ what ) a month ) ago ) *** FRAG on utterances missing the verb fsh_111106_JM.A.parse.src.xml:74 (FRAG (NP-SBJ i ) not (ADJP-PRD sure) .) 
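The convention repeated above (-UNF is never placed on X itself, only on its parent node) lends itself to a mechanical check. The following is a minimal sketch in Python and is not part of the annotation tooling: the helper names parse and misplaced_unf are hypothetical, and the reader assumes well-formed, single-rooted bracketing in which every open parenthesis is followed by a label.

```python
import re


def parse(text):
    """Parse one bracketed tree into nested lists: [label, child, ...].

    Assumption: the bracketing is balanced and every "(" is followed by a label.
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # consume "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                      # consume ")"
        return [label] + children

    return node()


def misplaced_unf(tree, parent=None, found=None):
    """Collect (label, parent label) for every X node that carries -UNF."""
    found = [] if found is None else found
    parts = tree[0].split("-")
    if parts[0] == "X" and "UNF" in parts[1:]:
        found.append((tree[0], parent))
    for child in tree[1:]:
        if isinstance(child, list):
            misplaced_unf(child, tree[0], found)
    return found


# The tree from the text passes; a hypothetical mis-annotated variant is flagged.
print(misplaced_unf(parse("(S (NP-SBJ I) (VP 've (VP-UNF seen (X s-))))")))
print(misplaced_unf(parse("(S (NP-SBJ I) (VP 've (VP seen (X-UNF s-))))")))
```

The second call uses an invented tree purely to show what a violation would look like; it is not taken from the corpus.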
*** A "triple is" construction fsh_109841.B.ag.xml:19 (S (PRN (S (NP-SBJ you) (VP know))) (CONJP and so) (S-SBJ (NP-SBJ that) (VP was (NP-PRD the only thing))) (VP was , (SBAR-PRD 0 (S (NP-SBJ the approach) (VP was (INTJ like) (ADJP-PRD really extremely high tech))))) .) *** Potential overt subjects for imperatives are treated as vocatives: fsh_109841.B.ag.xml:33 (S-IMP-SEZ (INTJ well) (NP-VOC you) (VP move (SBAR-PRP so (S (NP-SBJ i) (VP can (VP drive)))))) *** S-UNF for a string of EDITEDs and INTJs; also, X around an unclear unfinished item: fsh_109841.B.ag.xml:57 (S-UNF (INTJ um) (INTJ well) (EDITED (NP-SBJ i) (VP think (SBAR 0 (S (SBAR-NOM-SBJ (WHNP-2 what) (S (NP-SBJ-1 they) (ADVP really) (VP need (S (NP-SBJ-1 *) (VP to (VP focus (NP their money) (PP-CLR on (NP-2 *T*)))))))) (VP-UNF is)))) +) (EDITED (WHADVP where) (S (NP-SBJ they) (VP fell (PP through (NP-UNF the)) (X y-) )) +) (INTJ um) --) *** Renaming after an aside: fsh_109841.B.ag.xml:60 (S (PRN (S (NP-SBJ i) (VP mean))) (NP-SBJ (NP my girlfriend 's) husband) (VP works (PP-CLR for (INTJ um) (PRN (SBARQ (WHNP-1 what) (SQ do (NP-SBJ they) (VP call (S (NP it) (NP-PRD-1 *T*)) , (SBAR-TMP (WHADVP when) (S-UNF (S (NP-SBJ you) (VP come (PP in (NP the country)))) , and (S (NP-SBJ-2 you) (VP have (S (NP-SBJ-2 *) (VP to (VP get (VP approved (NP-2 *))))))) , and))))) (NP that department)))) ?) *** an example of FRAG around a string of material that is difficult to interpret: fsh_109841.B.ag.xml:65 (S (PRN (NP-SBJ you) (VP know)) (CONJP and then) (NP-SBJ you) (VP got (NP (NP this mexican guy) (VP trying (S (NP-SBJ *) (VP to (VP get (NP a drivers licenses) (FRAG (PP in (NP this (EDITED e- +) election)) (ADVP now) (PP for (INTJ um) (NP illegal aliens))))))))) .) 
*** Impersonal "it's like" with PRT like, pace the manual:
fsh_109841.B.ag.xml:74
(S (NP-SBJ it) (VP 's (PRT like) (SBARQ-SEZ (EDITED (WHNP-UNF w-) +) (WHNP-1 what) (SQ (NP-SBJ-1 *T*) (VP happened (PP-CLR to (S-NOM (NP-SBJ *) (ADVP just) (VP (VP thinking (ADVP-MNR normally)) , and (VP saying , (S (NP-SBJ other countries) (VP do n't (VP let (S (NP-SBJ anybody) (VP come (ADVP-DIR in)))))))))))))) ?)

*** ADJP (not WHADJP) around "how nice":
fsh_111905.A.parse.ftags.ag.xml:11
(ADJP how nice .)

*** PP-DIR-PRD for "been to X"
fsh_111905.A.ag.xml:19
(VP been (PP-DIR-PRD to (NP ireland))))

*** An example of a speaker trailing off:
fsh_111905.A.ag.xml:20
(S (NP-SBJ they) (VP say , (S-SEZ (NP-SBJ (NP it)) (VP 's (ADJP-PRD really beautiful) , (PRN (S (NP-SBJ you) (VP know))) (NP-TPC (NP the country) , (PRN (S (NP-SBJ you) (VP know (SBAR (WHNP-1 what) (S (NP-SBJ i) (VP 'm (VP saying (NP-1 *T*)))))))) (PRN (S (NP-SBJ you) (VP know))) (EDITED (NP-UNF the w-) +) (PRN (S (NP-SBJ you) (VP know))) (NP (NP everything) (PP about (NP it))) , (PRN (S (NP-SBJ you) (VP know (SBAR (WHNP-2 what) (S (NP-SBJ i) (VP 'm (VP saying (NP-2 *T*)))))))) (NP the land))))) .)

*** NAC with "but not":
fsh_111905.A.parse.ftags.ag.xml:58
(S (CC and) (NP-SBJ (PRP i)) (VP (VBP know) (, ,) (SBAR (-NONE- *0*) (S (NP-SBJ (PRP you)) (VP (VBP have) (NP (JJ big) (NNS stores)) (PP-LOC (IN up) (PP (IN in) (NP (NNP connecticut)))) (, ,) (NAC (CC but) (RB not) (SBAR-ADV (IN like) (S (NP-SBJ (PRP we)) (VP (VBP have) (ADVP-LOC (RB here) (PP (IN in) (NP (NNP new) (NNP york)))) (, ,) (NP-TPC (NP (NP (NNP macy) (POS 's))) (RB and) (PRN (S (NP-SBJ (PRP you)) (VP (VBP know)))) (NP (NNP bloomingdales))))))))))) (. .))

*** "It took X to Y":
fsh_110347.A.parse.ftags.ag.xml:18
(S (NP-SBJ (PRP it)) (VP (VBD took) (NP-1 (NP (PRP$ my) (NN father) (POS 's)) (NN death)) (S-PRP (NP-SBJ-1 (-NONE- *)) (VP (TO to) (VP (VB wake) (NP (PRP her)) (PRT (RP up)) (, ,) (PRN (S (NP-SBJ (PRP i)) (VP (VBP guess)))))))) (. .))

*** Two analyses are possible here: that there is a coordinated subject with 'bad' verb agreement or that the 'me' should be edited out. Since 'me' is not in a delreg in the MDE annotation, the former is preferable.
fsh_110347.A.parse.ftags.ag.xml:35
(S (NP-SBJ (NP me) and (NP my mother)) (EDITED (SBAR-TMP-UNF (WHADVP when)) +) (EDITED (X an-) +) (SBAR-TMP before (S (NP-SBJ she) (VP died))) (VP was (ADJP-PRD okay)) .)

*** "because" can, in rare instances, be used as a discourse marker.
fsh_110347.A.parse.ftags.ag.xml:63
(SBARQ (PRN (S (NP-SBJ you) (VP know))) (INTJ because) (EDITED (SBAR-UNF (WHADVP how)) +) (WHNP-1 what) (SQ does (NP-SBJ family) (VP mean (NP-1 *T*) (PP to (NP you)))) ?)

*** "How about X" is done as follows:
fsh_110347.B.parse.ftags.ag.xml:18
(FRAG (EDITED (WHADVP-UNF h-) +) (WHADVP how) (PP about (NP you)) ?)
fsh_110347.B.parse.ftags.ag.xml:35
(FRAG (WHADVP how) (PP about (NP (NP kids) or (NP nieces or nephews))) ?)

*** another example of FRAG used to mark a string of questionable interpretability
fsh_110347.B.parse.ftags.ag.xml:102
(FRAG and a (EDITED (S (NP they) (ADVP just) (VP-UNF have)) +) (INTJ like) (EDITED (EDITED (S (NP-SBJ we) (VP-UNF have)) +) (S (NP-SBJ we) (ADVP just) (VP-UNF have)) +) (NP-SBJ we) (ADVP kind of) (NP (NP an issue) (VP going)) .)

*** fsh_112666.A:1 "there's some towns in california the minimum wage is uh ten or twelve dollars an hour, i think"
assume (WHADVP 0) before minimum wage. "i think" as PRN

*** Examples of in situ wh-elements; note that they are POS-tagged as wh-elements
fsh_112666.A.ag.xml:3
(PRN (S (NP-SBJ you) (VP know (NP what))))
fsh_112666.B.ag.xml:64
(S (NP-SBJ i) (VP know (ADVP where)))

*** these three EDITEDs are not all nested:
fsh_112666.A.ag.xml:4
(S and (EDITED (S (NP-SBJ (NP the pay) (SBAR (WHNP-1 0) (S (NP-SBJ you) (VP get (NP-1 *T*))))) (VP-UNF ca n't)) +) (PRN (S (NP-SBJ you) (VP know))) (EDITED (S (EDITED (S (NP-SBJ it) (VP-UNF does-)) +) (NP-SBJ it) (VP does n't (VP-UNF meet))) +) (NP-SBJ it) (VP does n't (VP help (S (NP-SBJ ends) (VP meet)))))

*** "it takes X to Y":
fsh_112666.A.ag.xml:8
(VP take (NP (NP half an hour) (SBAR (WHADVP-1 0) (S (NP-SBJ *) (VP to (VP get (NP it) (ADVP-TMP-1 *T*)))))))

*** Relative clauses that cannot reasonably be attached somewhere are put at VP level, with an -ADV tag
fsh_112666.A.ag.xml:14
(SQ do (NP-SBJ-1 you) (VP want (S (NP-SBJ-1 *) (VP to (VP work (ADVP hard) (EDITED (PP-UNF as (X a-)) +) (PP in (NP nursing)) (SBAR-ADV (WHNP-2 who) (S (NP-SBJ-2 *T*) (VP makes (NP (NP nine dollars) (NP-ADV an hour))))))))) ?)

*** All tokens in an INTJ node have the POS-tag UH
fsh_112666.B.ag.xml:11
(INTJ my/UH god/UH)

*** ex. of RRC:
fsh_112666.B.ag.xml:13
(S (NP-SBJ i) (VP made (NP nine dollars) (ADVP-LOC here) (EDITED (X w-) +) (S-MNR (NP-SBJ *) (VP doing (INTJ um) (EDITED nursing +) (NP (NP (NP something) + (RRC not (ADVP really) (NP-PRD nursing))) (PP (ADVP more) like (NP home health care)))))))

*** Use of -TPC
fsh_112666.B.ag.xml:27
(S (CC but) (INTJ (UH um)) (NP-TPC (NNP north) (NNP jersey)) (PP-LOC (IN at) (NP (NNP burger) (NNP king))) (PRN (S (EDITED (NP-SBJ-UNF (PRP i-)) (DISFL-IP +)) (NP-SBJ (PRP i)) (VP (VBP think) (SBAR (-NONE- *0*) (S (NP-SBJ (PRP it)) (VP (VBD was) (PP-LOC-PRD (-NONE- *?*)))))))) (NP-SBJ (PRP they)) (VP (VBP 're) (VP (VBG making) (NP (CD nine)))) (. .))

*** fsh_112666.B.ag.xml:28 "not that that is ..."
(FRAG not (SBAR that (S (NP-SBJ that) (VP 's (NP-PRD minimum wage)))) .)

*** "you too?"
fsh_112666.B.ag.xml:55
(Paragraph (FRAG (NP you) (ADVP too) ?))

***-LOC tag goes as high as possible
fsh_112666.B.ag.xml:66
(PP-LOC (PP in (NP Cape May)))
fsh_112666.B.ag.xml:54
(PP-LOC (ADVP here))

*** ex. of VB within a NML
fsh_112045.B:56
(NP (NML cease/VB fire/NN) week)

*** The -TTL tag is sufficient to mark the "nominal-ness" of titles, as in the following:
fsh_112825.A.parse.ftags.ag.xml:63
(S but (INTJ uh) (NP-SBJ they) (VP have (S-TTL (NP-SBJ *) (VP nip and tuck))) .)

*** "go (and) X"
fsh_113405.B.parse.ftags.ag.xml:22
(VP go (S (NP-SBJ-1 *) (VP find (NP himself) (NP a different job))) , (SBAR-ADV unless (S (NP-SBJ the company) (VP wants (NP him)))))

*** "what the heck" or "what the hell":
fsh_113405.B.parse.ftags.ag.xml:27
(S so (NP-SBJ *) (VP figure , (WHNP-SEZ (WHNP what) (NP the heck))) .)

*** We don't have a final EDITED in an unfinished utterance:
Original: fsh_113405.B.parse.ftags.ag.xml:53
(S (NP-SBJ i) (VP did n't (VP get (EDITED (NP-UNF the u- +)) (EDITED (X i) +) (INTJ uh))) --)
Correct: fsh_113405.B.parse.ftags.ag.xml:53
(S-UNF (EDITED (S (NP-SBJ i) (VP did n't (VP get (NP-UNF the) (X u-)))) +) (NP-SBJ i) + (INTJ uh) --)

*** "you know what" can be filler and inside PRN.
fsh_113405.B.parse.ftags.ag.xml:54
(INTJ (EDITED (INTJ-UNF n-) +) (PRN (S (NP-SBJ you) (VP know (NP what)))) no .)

*** A potential use of a singleton *RNR* (as described in the Switchboard manual) is eschewed in favor of a treatment with ellipsis:
fsh_117496.B.parse.ftags.ag.xml:4
(NP (NP people) (ADVP (ADVP other) (PP than (NP (NP those) (SBAR (WHNP-1 that) (S (NP-SBJ i) (VP (VP know (NP-1 *T*) (ADVP well)) , and (INTJ uh) (PRN (S (NP-SBJ i) (VP guess (SBAR 0 (S (EDITED (S (NP-SBJ i) (VP-UNF 'm))) (NP-SBJ-2 i) (VP 'm (VP going (S (NP-SBJ-2 *) (VP to (VP say (S *?*))))))))))) (VP trust (NP-1 *T*)))))))))))) .)
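The "no final EDITED in an unfinished utterance" decision above can be approximated by walking down the right edge of a tree. The sketch below is one possible reading of that rule, not the procedure the annotators used: it assumes that trailing punctuation, "+" markers, INTJ and PRN are to be skipped when looking for the last real constituent, and the names parse, ends_in_edited, FILLER_LABELS and SKIPPED_TOKENS are hypothetical.

```python
import re

FILLER_LABELS = {"INTJ", "PRN"}               # assumption: filler does not count as "final"
SKIPPED_TOKENS = {"--", "+", ".", ",", "?"}   # assumption: punctuation and IP markers


def parse(text):
    """Parse one well-formed bracketed tree into nested lists: [label, child, ...]."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # consume "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                      # consume ")"
        return [label] + children

    return node()


def ends_in_edited(tree):
    """True if the last non-filler constituent on the right edge is an EDITED node."""
    for child in reversed(tree[1:]):
        if isinstance(child, str):
            if child in SKIPPED_TOKENS:
                continue
            return False              # an ordinary word closes this constituent
        base = child[0].split("-")[0]
        if base in FILLER_LABELS:
            continue
        if base == "EDITED":
            return True
        return ends_in_edited(child)  # keep descending along the right edge
    return False


original = "(S (NP-SBJ i) (VP did n't (VP get (EDITED (NP-UNF the u- +)) (EDITED (X i) +) (INTJ uh))) --)"
correct = "(S-UNF (EDITED (S (NP-SBJ i) (VP did n't (VP get (NP-UNF the) (X u-)))) +) (NP-SBJ i) + (INTJ uh) --)"
print(ends_in_edited(parse(original)))   # True: the "Original" analysis is flagged
print(ends_in_edited(parse(correct)))    # False: the "Correct" analysis passes
```

Treating INTJ and PRN as transparent is a design choice made only so that the two trees quoted above come out as expected; a real checker would need the annotators' own definition of "final".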
*** use of -TPC where we would like to use *EXP* fsh_118878.A.parse.ftags.ag.xml.tree.pretty.fmt.out: (0 TOP-41 (1 S (2 EDITED (3 EDITED (4 S (5 CC but) (7 NP-SBJ (8 PRP it)) (10 VP-UNF (11 BES 's))) (13 DISFL-IP +)) (15 S-UNF (16 CC but)) (18 DISFL-IP +)) (20 CC but) (22 INTJ (23 UH see)) (25 EDITED (26 S (27 NP-SBJ (28 DT that)) (30 VP (31 MD would) (33 VP-UNF (34 VB be)))) (36 DISFL-IP +)) (38 PP (39 IN in) (41 NP (42 PRP$ its) (44 NN sense))) (46 NP-SBJ (47 WDT that)) (49 VP (50 MD would) (52 VP (53 VB be) (55 ADJP-PRD (56 JJ wrong)) (58 S-TPC (59 NP-SBJ (60 -NONE- *)) (62 VP (63 TO to) (65 VP (66 VB think) (71 . .))) *** weird quasi-passive: fsh_118878.B.parse.ftags.ag.xml.tree.pretty.fmt.out (45 NP-SBJ (46 EX there)) (48 EDITED (49 VP (50 VBP are)) (52 DISFL-IP +)) (54 VP (55 VBD was) (57 EDITED (58 S-UNF-ADV (59 VBP set) (61 PRT (62 RP up))) (64 DISFL-IP +)) (66 VP (67 VBN set) (69 NP (70 DT some) (72 NNS rules)) (74 ADVP-TMP (75 NP (76 DT a) (78 JJ long) (80 NN time)) (82 RB ago))))))) *** ADVP surrounding 'not' in order to use gapping: fsh_119199.A.parse.ftags.ag.xml:24 (S and (NP-SBJ-1 i) (VP do (VP (VP (ADVP=2 n't) (VP want (S=3 (NP-SBJ-1 *) (VP to (VP talk))))) , (VP (ADVP=2 *NOT*) (ADVP just) (S=3 (NP-SBJ *) (VP (VP look (ADVP around)) , (EDITED (EDITED (CONJP and) +) (CONJP and) +) and (VP flap (NP my wings))))))) .) *** More gapping examples: fsh_117537.A.parse.ftags.ag.xml:70 (S (NP-SBJ i) (VP do n't (VP know (SBAR (WHNP-2 how much) (S (S (NP-SBJ they) (VP pay (NP=1 you) (NP-2 *T*))) , but (S (NP=1 something)))))) .) 
fsh_112825.A.parse.ftags.ag.xml:23 (S (INTJ (UH uh)) (NP-SBJ (DT that)) (VP (BES 's) (SBAR-PRP-PRD (IN because) (S (NP-SBJ (PRP they)) (VP (VP (VBP 're) (ADVP (RB probably)) (EDITED (VP-UNF (VBG seein-)) (DISFL-IP +)) (EDITED (RB not) (DISFL-IP +)) (ADVP=2 (RB not)) (NP-PRD=1 (NNS parents))) (, ,) (CC or) (VP (ADVP=2 (RB not)) (VP=1 (VBG seeing) (EDITED (EDITED (NP-UNF (DT the)) (DISFL-IP +)) (NP-UNF (DT the)) (DISFL-IP +)) (NP (DT the) (NN point)))))))) (. .)) *** "if anything" fsh_109487.A.parse.ftags.ag.xml:39 (S (NP-SBJ they) (VP have (NP on-campus jobs) (SBAR-ADV if (FRAG (NP anything)))) .) *** Ex. of a strange coordination: fsh_109487.A.parse.ftags.ag.xml:74 (S (CONJP but then) (NP-SBJ that) (VP 's (NP-PRD (NP me) and (NP (NP means) (PP of (S-NOM (NP-SBJ *) (ADVP actually) (VP talking (PP-CLR to (NP her)))))))) .) *** strange SBAR is attached at VP level in the following: fsh_109487.B.parse.ftags.ag.xml:9 (S (INTJ (UH so)) (EDITED (S (NP-SBJ (PRP i)) (VP (VBD looked))) (DISFL-IP +)) (NP-SBJ (DT this)) (VP (VBD was) (NP-PRD (NP (DT a) (EDITED (NN s-) (DISFL-IP +)) (NN subject)) (PP (IN for) (NP (PRP me)))) (SBAR-ADV (WHADVP-2 (WDT that)) (S (NP-SBJ (PRP i)) (VP (VBD thought) (, ,) (PP-SEZ (INTJ (UH oh)) (INTJ (UH boy)) (IN of) (NP (NP (DT all) (NNS subjects)) (SBAR (WHNP-1 (-NONE- *0*)) (S (VP (TO to) (VP (VB get) (NP-1 (-NONE- *T*)))))))) (, ,) (INTJ (UH right)) (ADVP-2 (-NONE- *T*)))))) (. ?)) *** "not sure how to..." fsh_109487.B.parse.ftags.ag.xml:18 (FRAG not (ADJP sure (EDITED (SBAR-MNR (WHADVP how +) (S (NP-SBJ *) (VP-UNF to))) +) (SBAR-MNR (WHADVP-1 how) (S (NP-SBJ *) (VP to (VP proceed (PP with (NP it)) (ADVP-MNR-1 *T*)))))) , (SBAR-PRP because (S (INTJ uh) (EDITED (S (NP-SBJ i) (VP know + (SBAR-UNF (WHADVP how)))) +) (NP-SBJ i) (VP know (SBAR (WHADJP-2 how important) (S (NP-SBJ fitness and exercise) (VP is (ADJP-PRD-2 *T*))))))) .) *** another ex. 
of use of -TPC fsh_109487.B.parse.ftags.ag.xml:66 (S (NP-TPC telemarketers) , (NP-SBJ (NP those (INTJ uh) type) (PP of (NP people))) (VP do n't (VP want (S (NP-SBJ *) (VP to (VP hear))))) .) *** Note use of S-TMP fsh_109487.B.parse.ftags.ag.xml:88 (S and (NP-SBJ we) (VP have (ADVP-TMP now) (VP been (ADJP-PRD married) (S-TMP (INTJ uh) (NP-SBJ it) (VP will (VP be (NP-PRD four years) (PP-TMP in (NP may))))))) .) *** Use of RRC: fsh_110103.A.parse.ftags.ag.xml:11 (Paragraph (FRAG (PRN (S (NP-SBJ you) (VP know))) (INTJ um) (PP (PP from (NP (NP preschool age) (RRC (ADVP sometimes) (ADJP younger)))) (INTJ ah) (ADVP on (ADVP up))) ,) (SBAR-PRP (INTJ um) because (S (NP-SBJ (NP we) (NP all)) (VP use (NP the computers))) .)) *** top-level NP: fsh_110103.A.parse.ftags.ag.xml:30 (NP (EDITED (NP (DT that)) (DISFL-IP +) (NP-UNF (DT tha-))) (JJ great) (NN point) (. .)) *** weird exclamation; treated as separate utterance fsh_110183.A.parse.ftags.ag.xml:7 (S but (ADVP-TMP now) (NP-SBJ (NP all) (SBAR (WHNP-1 0) (S (NP-SBJ i) (VP need (NP-1 *T*))))) (VP is (NP-PRD (NP (NP one tube) (PP of (NP blood))) (EDITED and (NP-UNF a) +) (INTJ um) (EDITED and +)( (INTJ um) and (NP this night machine) and (INTJ bloopsch))) .) *** The following sentences illustrate the handling of bad transcriptions: fsh_110183.A.parse.ftags.ag.xml:8 (S (PRN (S (NP-SBJ i) (VP mean))) (NP-SBJ all the analysis) (VP is (ADJP-PRD-UNF (ADVP just) (X whewhit))) .) "g-i": fsh_113312.B.parse.ftags.ag.xml:47 (S (INTJ well) (INTJ well) (INTJ well) (ADVP first (PP of (NP all))) (EDITED (EDITED (NP-SBJ-UNF i-) +) (EDITED (NP-SBJ-UNF i-) (X wi-) +) (X wi-) +) (NP-SBJ i) (EDITED (X g-i) +) (VP guess , (SBAR (S (NP they) (VP do n't (VP have (NP (NP any respect) (PP for (NP (NP people) (PP from (NP the united states))))) (NP period)))))) .) "to" for "too": fsh_109487.B.parse.ftags.ag.xml:16 (S (EDITED (S-UNF (NP-SBJ i)) +) (NP-SBJ this) (VP is (NP-PRD my first call) (ADVP to)) .) 
"your"/"you're" confusion: fsh_112666.A.ag.xml:10 (SBAR-ADV (EDITED (SBAR-ADV if (S (NP-SBJ-UNF your-))) +) if (S (NP-SBJ you) (VP 're (NP-PRD an employer)))) fsh_110346.A.parse.ftags.ag.xml:75 (SBAR-TMP (WHADVP-1 when) (S (PRN (S (NP-SBJ you) (VP know))) (NP-SBJ some guy) (VP is (EDITED (VP-UNF dating) +) (VP dating (EDITED (NP-UNF you 're little) +) (NP you/PRP 're/POS little sister) (ADVP-TMP-1 *T*))))) fsh_109487.B.parse.ftags.ag.xml:21 (S (INTJ so) (NP-SBJ i) (VP imagine , (S-SEZ (NP-SBJ your) (INTJ what))) ?) "their" for "they're": fsh_117936.A.parse.ftags.ag.xml:38 (S and (NP-SBJ their) (ADJP-PRD ready) .) *** Bad tokenization in the following; ignore the second period: fsh_113866.B.parse.ftags.ag.xml:47 (NP b. s . .) *** "no Xing" and "a Xing" fsh_113312.B.parse.ftags.ag.xml:38 (NP no (S-NOM (NP-SBJ *) (VP leaving (NP the u. s. a.) (PP without (S-NOM (NP-SBJ *) (VP being (ADJP-PRD able (S (NP-SBJ *) (VP to (VP come (ADVP-DIR back)))))))))) .) *** Use of -SEZ tag: fsh_113585.B.parse.ftags.ag.xml:6 (S (NP-SBJ there) (VP 's (EDITED (NP-PRD-UNF s-) +) (NP (NP something) (VP-1 *ICH*)) (EDITED (VP-UNF ca-) +) (PP-LOC in (NP l. a.)) (VP-1 called (S (NP-SBJ *) (S-SEZ-PRD (NP-SBJ *) (VP pay (S-PRP (NP-SBJ *) (VP to (VP play)))))))) .) *** Use of -TTL tag fsh_113585.B.parse.ftags.ag.xml:57 (S (EDITED and +) and (PRN (S (NP-SBJ you) (VP know))) (EDITED (X y-)) (EDITED (NP i)) (EDITED (X w-) +) (ADVP basically) (SBAR-ADV if (S (NP-SBJ we) (VP watched (NP enough (INTJ uh) (PP-TTL behind (NP the music)) specials (S (NP-SBJ *) (VP to (VP realize (SBAR (WHADVP how) (S (S (NP artists) (VP get (VP screwed (PRT over)))) , and (S (ADVP then) (EDITED (S (NP it) (VP 's (PP like))) +) (ADVP so now) (NP you) (VP 're (INTJ like) , (INTJ okay)))))))))))) *** Use NP as the default category for ellipses without expressed antecedents: fsh_113866.A.parse.ftags.ag.xml:29 (S (NP-SBJ i) (VP think , (SBAR 0 (S (PP to (NP a certain extent)) (NP-SBJ it) (VP is (NP *?*))))) .) 
Likewise, WHNPs whose antecedents are PP (as here, where it is "from Boston") are still treated as NPs:
fsh_111993.A.parse.ftags.ag.xml:64
(SBAR (WHNP-1 which) (S (NP-SBJ i) (VP 'm not (NP-PRD-1 *T*))) .)

*** ex. of NAC
fsh_113866.B.parse.ftags.ag.xml:19
(S (NP-SBJ (PRP you)) (VP (VBP 're) (VP (VBG supplying) (NP (NNP uncle) (NNP sam)) (NAC (CC but) (ADVP (RB still)) (PP (IN in) (NP (DT all))) (, ,) (ADVP (DT no) (NN matter) (SBAR (WHNP-1 (WDT what) (NN money)) (S (NP-SBJ (PRP you)) (VP (VBP make) (NP-1 (-NONE- *T*))))))))) (. .))

*** Use of -SEZ tag to mark constituent as adverbial. (An ugly example at that.)
fsh_119319.A.parse.ftags.ag.xml:68
(S (EDITED (ADJP pretty) +) (EDITED (S (NP-SBJ it) (VP 's (ADJP-PRD pretty))) +) (INTJ eh) (EDITED (ADJP pretty) +) (NP-SBJ (NP a lot) (PP of (NP them))) (VP are (NP-PRD (NP (NP a lot) (PP of (NP blacks))) (VP running (NP their mouths) , (INTJ-SEZ ah la la la la la la la)) (PP with (NP (NP the chat rooms) , (EDITED and +) and (NP that))))) .)

*** "not for me to X"
fsh_118878.B.parse.ftags.ag.xml:39
(S and (NP-SBJ they) (VP 're not (PP for (NP me)) (S-PRP (NP-SBJ *) (VP to (VP judge)))) .)

*** ex. of "whatever" treated as INTJ
fsh_119319.A.parse.ftags.ag.xml:11
(SBAR (INTJ whatever) (WHNP-1 who) (S (NP-SBJ we) (VP are (NP-1 *T*))) .)

*** PP interloping at SBAR level
fsh_111956.B.parse.ftags.ag.xml:41
(S but (EDITED (NP-SBJ i) +) (NP-SBJ i) (ADVP-TMP always) (VP imagined (SBAR (PP-LOC-TPC in (NP the city)) , that (S (EDITED (S-UNF (NP-SBJ people) (ADVP just)) +) (EDITED (NP-TPC-UNF a) +) (NP-TPC (NP a lot) (PP of (NP the stuff))) , (NP-SBJ they) (VP 'll (ADVP just) (VP deliver (NP it) (PP-LOC *T*)))))) .)
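Several of the examples above rely on numeric co-indexation between a trace and its antecedent (for instance WHNP-1 and NP-PRD-1 *T*). A small consistency check is sketched below under stated assumptions: an index is taken to be a trailing "-N" on a node label, "=N" gapping indices are ignored, a node counts as a trace when its only leaf starts with "*" (excluding the null complementizer *0*), and the helper names parse, leaves and unbound_traces are hypothetical.

```python
import re

INDEX = re.compile(r"-(\d+)$")        # trailing numeric index, e.g. WHNP-1, NP-SBJ-2


def parse(text):
    """Parse one well-formed bracketed tree into nested lists: [label, child, ...]."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # consume "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                      # consume ")"
        return [label] + children

    return node()


def leaves(tree):
    """Return the terminal tokens under a node, left to right."""
    out = []
    for child in tree[1:]:
        out.extend(leaves(child) if isinstance(child, list) else [child])
    return out


def unbound_traces(tree):
    """Return the indices that appear on a trace but on no antecedent node."""
    traces, antecedents = set(), set()

    def walk(node):
        match = INDEX.search(node[0])
        if match:
            words = leaves(node)
            is_trace = (len(words) == 1 and words[0].startswith("*")
                        and words[0] != "*0*")
            (traces if is_trace else antecedents).add(match.group(1))
        for child in node[1:]:
            if isinstance(child, list):
                walk(child)

    walk(tree)
    return sorted(traces - antecedents)


tree = parse("(SBAR (WHNP-1 which) (S (NP-SBJ i) (VP 'm not (NP-PRD-1 *T*))) .)")
print(unbound_traces(tree))           # [] : the trace index has an antecedent
```

The check only looks within a single tree and says nothing about whether the antecedent is the linguistically correct one.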
*** "quite as big a X as"
fsh_111993.B.parse.ftags.ag.xml:8
(S (EDITED (NP-SBJ (PRP it)) (DISFL-IP +)) (NP-SBJ-1 (PRP it)) (VP (VBZ does) (RB n't) (VP (VB seem) (S (NP-SBJ-1 (-NONE- *)) (VP (TO to) (VP (VB be) (EDITED (NP-PRD-UNF (RB quite) (RB as)) (DISFL-IP +)) (NP-PRD (NP (ADJP (RB quite) (RB as) (JJ big)) (DT a) (NN holiday)) (SBAR (EDITED (SBAR (RB as) (S-UNF (NP-SBJ (PRP it)))) (DISFL-IP +)) (IN as) (S (NP-SBJ (PRP it)) (VP (VBD was) (NP-PRD (-NONE- *?*)) (INTJ (UH uh)) (ADVP-TMP (NP (NNS years)) (RB ago))))))))))) (. .)))

*** "what have you"
fsh_111993.B.parse.ftags.ag.xml:41
(SBAR-ETC (WHNP-2 what) (SINV have (NP-SBJ you) (NP-2 *T*))))))))))

*** Another PRT "like" construction
fsh_111993.B.parse.ftags.ag.xml:70
(S (NP-SBJ everybody) (VP looks (PP at (NP you)) (PRT like) , (S-SEZ-IMP (INTJ whoa) (NP-SBJ *) (VP wait (NP a minute)))) .)

*** treatment of "p.s."; note horrible tokenization
fsh_112048.A.parse.ftags.ag.xml:67
(ADVP p./FW s/FW ./. .)

*** "I'd rather X" with a comparative:
fsh_112048.A.parse.ftags.ag.xml:74
(S (NP-SBJ i) (VP 'd (ADVP-CLR rather) (SBAR 0 (S (NP-SBJ-2 he) (VP (VP would (VP make (NP me) (NP something))) , or (VP write (NP (NP a card)))))) , (SBAR than (S (NP-SBJ-2 *) (VP get (S (NP-SBJ-2 *) (VP to (VP (VP buy (NP it)) , or (EDITED (VP pick (NP flowers) (PP out (PP of (NP-UNF the)))) +) (PRN (S (NP-SBJ you) (VP know))) (NP-ETC (NP anything) (PP like (NP that)))))))))))

*** Use of -UNF without an EDITED node:
fsh_112048.A.parse.ftags.ag.xml:88
(SBARQ (INTJ well) (WHNP what) (SQ-UNF do (NP-SBJ you) (EDITED (PP-UNF during) +) (INTJ um) (PP-TMP during (NP the day))) ?)

*** SINV within an SBAR:
fsh_112048.B.parse.ftags.ag.xml:54
(S-IMP and (NP-SBJ *) (VP tell ... (SBAR (WHNP-2 what) (SINV do (NP-SBJ you) (VP like (PP about (NP it)) (NP-2 *T*)))))) .)

*** Two exx. of ADVPs that feel very PP-ish
fsh_112707.A.parse.ftags.ag.xml:20
(S (PRN (S (NP-SBJ you) (VP know))) (NP-SBJ it) (VP 's (ADJP-PRD unheard (ADVP of))) .)
fsh_112707.A.parse.ftags.ag.xml:53 (S (NP-SBJ the survey) (VP was (ADVP-PRD over with))) *** "and not" in a CONJP fsh_117716.B.parse.ftags.ag.xml:17 (S (INTJ ah) (ADVP actually) (NP-SBJ it) (VP 's (NP-PRD (NP permission) (EDITED and not +) (CONJP and not) (NP permission)) (PP for (NP each rating))) .) *** "Here he is..." fsh_117936.A.parse.ftags.ag.xml:39 (S and (ADVP here) (NP-SBJ-1 he) (VP is (ADVP-TMP still) (VP want (S (NP-SBJ-1 *) (VP to (PRN (S (NP-SBJ you) (VP know))) (VP goof (PRT off)))))) .) "There you go" fsh_111993.B.parse.ftags.ag.xml:51 (S (ADVP there) (NP-SBJ you) (VP go) .) fsh_110843.A.parse.ftags.ag.xml:53 (S (NP-SBJ here) (VP 's (NP-PRD the key)) .) here/RB *** Long renaming enclosed in PRN fsh_110843.A.parse.ftags.ag.xml:36 (S (NP-SBJ (PRP she)) (VP (VBZ lives) (PP-LOC (IN in) (NP (NNP houston))) (, ,) (EDITED (PP-UNF (IN in)) (DISFL-IP +)) (INTJ (UH uh)) (PP-LOC (IN in) (NP (NP (DT an) (NN area)) (PRN (EDITED (S (NP-SBJ (PRP i)) (VP (VBP think) (SBAR (-NONE- *0*) (S-UNF (NP-SBJ (PRP you)) (X (XX s-)))))) (DISFL-IP +)) (S (NP (PRP i)) (VP (VBP think) (SBAR (-NONE- *0*) (S (NP-SBJ-1 (PRP it)) (VP (BES 's) (VP (VBN called) (S (NP-SBJ-1 (-NONE- *)) (NP-PRD (NNP friendswood))))))))))))) (. .)) *** WH- traces into Coordinated structures Only one trace need be inserted when it is adverbial: fsh_119379.A.parse.ftags.ag.xml:26 (SBAR (WHADVP-2 where) (S (NP-SBJ-1 you) (VP (VP (VP throw (NP the ball)) , and (VP try (S (NP-SBJ-1 *) (VP to (VP break (NP the bottles)))))) , or (VP (VP throw (NP the dart)) , and (VP break (NP the balloon))) (WHADVP-2 *T*))) .) *** Appositive to ADVP (S (NP-SBJ they) (VP are (ADVP-LOC-PRD here (PP on (NP the West Coast))))) CHAPTER 12, sections 1-4, COPIED FROM THE BIOMEDICAL ADDENDUM TO THE TREEBANK II GUIDELINES, AVAILABLE AT http://bioie.ldc.upenn.edu. Other chapters in the Biomedical addendum may be of interest as well, but Chapter 12 copied here includes the primary change in policy that affects the speech domain. 
12 Addendum for other current non-biomedical treebank projects

Changes in Treebank Policy for the English-Chinese Treebank (ECTB) and English-Arabic Treebank (EATB) are, for the most part, the same as those for the biomedical treebank. Differences between the treebank annotation guidelines for these projects and those presented above for the biomedical project are described below.

12.1 Introduction

In addition to the annotation changes instantiated in the BioMedical Treebank, several other changes have been applied to the Treebank annotation for the ECTB and EATB projects. Some of the more sweeping changes have been adopted (such as the category NML and the more liberal use of "pseudo-passives"), while others have not (such as the placeholder *P*). In addition, there have been points of departure between the projects with regard to individual items (e.g., "compared with"). This chapter highlights the most consequential changes in policy; some of these differ from the BioMed project, while others reiterate the policy established there.

The changes outlined in the sections preceding this one are, in general, to be assumed to apply to the ECTB and EATB corpora. Note that many of the changes implemented for the BioMed project simply do not arise in the ECTB and EATB corpora, owing to the (fairly radical) differences between the corpora in both subject matter and literary style.

12.2 Use of NML

In most instances, the use of the NML tag follows the guidelines prescribed in the preceding sections.
For example,

(NP the (ADJP (NML Hong Kong) - based) company)
(NP-SBJ (NP The number) (PP of (NP (UCP (NML Chinese-foreign joint ownership) and (ADJP cooperative)) construction enterprises)))

As outlined in 1.1.3 above, in instances where it is difficult to determine the scope of prenominal elements, the default treatment is to group as much as can reasonably be determined and leave the remainder flat:

(NP foreign investments and donations)
(NP a foreign enterprise project contracting capital and quality license)

In some instances, NML is used simply to mark a constituent as a pre-head nominal modifier, regardless of its syntactic category.

(NP the (NML (S (NP-SBJ (NP a - man 's) - life) (VP - is (ADJP-PRD - hard)))) generation)

In other instances, it is used to set off a non-nominal constituent that is functioning as an NP head. Note that neither this usage nor the preceding one occurs in the BioMed corpus.

(NP these three (NML (S (NP-SBJ *) (VP do not (VP fears)))))

12.2.1 The use of NML and shared material inside NP

12.2.1.1 Nominal Subconstituents

NML is used to mark nominal subconstituents that do not follow our assumed right-branching default structure:

(NP the (NML Hong Kong) economy)
(NP (NML high level) economic talks)
(NP the (ADJP (NML New York) - based) company)

12.2.1.2 Coordinated Premodifiers

Coordinated premodifiers form a constituent node, typically ADJP, UCP or NML. Following standard policy for coordination, the individual coordinated elements only receive syntactic nodes if one or more of them is multi-token.
(NP this (NML (NML large scale) and (NML high level)) international convention)
(NP (UCP (JJ domestic) and (RB overseas)) markets)
(NP (UCP (ADJP scientific) and (ADJP technological) and (NML software development)) companies)
(NP the (NML (NML Red Cross) and (NML Red Crescent)) movement)
(NP his (ADJP energetic and powerful) performance)

12.2.1.3 Coordinated heads with shared premodifiers

In a coordinated NP, any modifiers that are left flat are assumed to be shared across all the heads:

(NP China macroscopic economic readjustment and control)
(NP (NP China's) international income and expenses)

Unshared modifiers in a coordinated structure must form a constituent node with the head they are modifying, as in standard NP coordination:

(NP (NP material civilization) and (NP spiritual culture))

When unshared and shared modifiers are combined, the above structure is preserved (using the NML node label) for the unshared components. That is, each of the coordinated elements is marked as a constituent and the elements together form a constituent. Any shared modifiers are left as sisters to the coordinated structure:

(NP socialist (NML (NML material civilization) and (NML spiritual culture)))
(NP (NML New Zealand) (NML (NML industries) and (NML business circles)))

12.3 Non-use of *P*

The placeholder *P* (as introduced in 1.2 above) is not used in the ECTB and EATB corpora. Rather, the old Penn Treebank guidelines are followed as much as possible. As a result, NML is used rather less in these corpora.

12.4 Changes in Old Tokenization Policy

Certain changes in the old POS policy regarding the tokenization of hyphenated items have resulted in minor changes to treebanking policy. For instance, many adjectives that were a single token under the previous guidelines have been separated into multiple tokens, thus requiring an ADJP node dominating them.
For example,

(NP (ADJP so - called) false cypresses)
(NP a (ADJP well - known) fact)
(NP (ADJP visa - free) travel)

Similarly, these changes have created many circumstances requiring the use of NML, e.g.:

(NP a (ADJP (NML Hong Kong) - based) company)
(NP (NML Taiwan - Palau) trade)
(NP (NML (NML Hong Kong) - (NML Palau)) trade)
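
The effect of the tokenization change in 12.4 on simple adjectival premodifiers can be sketched in Python. This is an illustrative sketch only and not part of any annotation tooling: the function names (split_hyphens, bracket_premodifier, bracket_np) are hypothetical, it assumes that every hyphen becomes its own token, and it assumes that the last word of the input is the head noun. It does not attempt the NML cases (such as "Hong Kong - based"), which require knowing which tokens form a nominal unit.

```python
import re


def split_hyphens(word):
    """Split a word into tokens, making each hyphen a separate token.

    Assumption: every hyphen is split off, which simplifies the real
    tokenization policy.
    """
    return [tok for tok in re.split(r"(-)", word) if tok]


def bracket_premodifier(word, label="ADJP"):
    """Wrap a premodifier in `label` only if splitting made it multi-token."""
    tokens = split_hyphens(word)
    if len(tokens) == 1:
        return tokens[0]
    return "(" + label + " " + " ".join(tokens) + ")"


def bracket_np(words):
    """Build a flat NP; the last word is assumed to be the head noun."""
    premodifiers = [bracket_premodifier(w) for w in words[:-1]]
    return "(NP " + " ".join(premodifiers + [words[-1]]) + ")"


print(bracket_np(["a", "well-known", "fact"]))
print(bracket_np(["visa-free", "travel"]))
```

Under these assumptions, a formerly single-token adjective such as "well-known" becomes three tokens and therefore needs a dominating ADJP node, while single-token premodifiers are left flat.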