Coordination Annotation for the Penn Treebank
|Item Name:||Coordination Annotation for the Penn Treebank|
|Author(s):||Sandra Kübler, Wolfgang Maier, Erhard Hinrichs|
|LDC Catalog No.:||LDC2015T08|
|Release Date:||May 15, 2015|
|Application(s):||parsing, question-answering, machine translation|
LDC User Agreement for Non-Members
|Online Documentation:||LDC2015T08 Documents|
|Licensing Instructions:||Subscription & Standard Members, and Non-Members|
|Citation:||Kübler, Sandra, Wolfgang Maier, and Erhard Hinrichs. Coordination Annotation for the Penn Treebank LDC2015T08. Philadelphia: Linguistic Data Consortium, 2015.|
Coordination Annotation for the Penn Treebank is a stand-off annotation for the Wall Street Journal portion of Treebank-3 (PTB3) (LDC99T42) developed by researchers at the University of Düsseldorf and Indiana University. It marks all tokens that have a coordinating function (potentially among other functions).
Coordination is a syntactic structure that links together two or more elements known as conjuncts or conjoins. The presence of coordination is often signaled by a coordinator (coordinating conjunction), such as "and", "or", or "but" in English.
Penn Coordination Annotation is available at no cost to all licensees of PTB3 and appears in their download queue associated with LDC99T42 as penn_coordination_anno_LDC2015T08.tgz.
This annotation is presented in a single UTF-8 plain-text, tab-separated (TSV) file with the following columns:
- section: Penn Treebank WSJ section number
- file: Number of file within section
- sentence: Number of sentence (starting with 0)
- token: Number of token (starting with 0)
- annotation: "P" if the token is coordinating punctuation, "O" otherwise
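The column layout above could be read with a few lines of Python. This is only a minimal sketch under stated assumptions: the columns appear in the order listed, fields are tab-separated, and any header or blank line can be skipped. The names `CoordToken` and `read_annotations` are hypothetical, and the two sample rows are made up for illustration rather than taken from the release.

```python
import csv
import io
from typing import Iterable, Iterator, NamedTuple


class CoordToken(NamedTuple):
    """One row of the stand-off annotation (hypothetical helper type)."""
    section: int      # Penn Treebank WSJ section number
    file: int         # number of the file within the section
    sentence: int     # sentence index, starting with 0
    token: int        # token index, starting with 0
    annotation: str   # "P" for coordinating punctuation, "O" otherwise


def read_annotations(lines: Iterable[str]) -> Iterator[CoordToken]:
    """Yield one CoordToken per well-formed five-column row."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Assumption: rows that are blank, or whose first field is not
        # numeric (for example a header line), are skipped.
        if len(row) != 5 or not row[0].strip().isdigit():
            continue
        section, file_no, sentence, token = (int(v) for v in row[:4])
        yield CoordToken(section, file_no, sentence, token, row[4].strip())


# Made-up rows for illustration only; they are not from the actual data.
sample = "02\t1\t0\t3\tO\n02\t1\t0\t4\tP\n"
rows = list(read_annotations(io.StringIO(sample)))
coordinating = [r for r in rows if r.annotation == "P"]
```

To read the real file, an open file handle (opened with `encoding="utf-8"` and `newline=""`) can be passed in place of the `io.StringIO` object. Because the annotation is stand-off, each `(section, file, sentence, token)` tuple would still need to be matched against the corresponding PTB3 token.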