Coordination Annotation for the Penn Treebank

Author(s): Sandra Kübler, Wolfgang Maier, Erhard Hinrichs
LDC Catalog No.: LDC2015T08
ISBN: 1-58563-714-9
ISLRN: 060-785-139-403-2
Release Date: May 15, 2015
Member Year(s): 2015
DCMI Type(s): Text
Data Source(s): newswire
Application(s): parsing, question-answering, machine translation
Language(s): English
Language ID(s): eng
Citation: Kübler, Sandra, Wolfgang Maier, and Erhard Hinrichs. Coordination Annotation for the Penn Treebank LDC2015T08. Philadelphia: Linguistic Data Consortium, 2015.
Coordination Annotation for the Penn Treebank is a stand-off annotation for the Wall Street Journal portion of Treebank-3 (PTB3) (LDC99T42) developed by researchers at the University of Düsseldorf and Indiana University. It marks all tokens that have a coordinating function (potentially among other functions).

Coordination is a syntactic structure that links together two or more elements known as conjuncts or conjoins. The presence of coordination is often signaled by a coordinator (coordinating conjunction), such as "and", "or", and "but" in English.

Coordination Annotation for the Penn Treebank is available at no cost to all licensees of PTB3. It appears in their download queue, associated with LDC99T42, as penn_coordination_anno_LDC2015T08.tgz.


This annotation is presented in a single UTF-8 plain-text, tab-separated (TSV) file with the following columns:

  • section: Penn Treebank WSJ section number
  • file: number of the file within the section
  • sentence: number of the sentence within the file (starting with 0)
  • token: number of the token within the sentence (starting with 0)
  • annotation: "P" if the token is a coordinating punctuation mark, "O" otherwise
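
As an illustration of the column layout above, the following Python sketch reads such a file and collects the positions of tokens labeled "P". It is a minimal example under stated assumptions, not part of the release: the names CoordAnnotation, read_annotations and coordinating_tokens are hypothetical, the sample rows are invented for demonstration, and the handling of a possible header row is a guess, since the description does not say whether the file has one. Section and file numbers are kept as strings so that any leading zeros survive.

```python
import csv
import io
from typing import Iterable, Iterator, NamedTuple, Set, Tuple


class CoordAnnotation(NamedTuple):
    """One row of the stand-off annotation (hypothetical container)."""
    section: str     # WSJ section number, kept as text
    file: str        # file number within the section, kept as text
    sentence: int    # 0-based sentence index
    token: int       # 0-based token index
    annotation: str  # "P" or "O"


def read_annotations(handle: Iterable[str]) -> Iterator[CoordAnnotation]:
    """Yield one record per well-formed five-column row."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Assumption: rows without numeric sentence/token fields
        # (blank lines, a possible header) are skipped.
        if len(row) != 5 or not row[2].isdigit() or not row[3].isdigit():
            continue
        section, file_id, sentence, token, label = row
        yield CoordAnnotation(section, file_id, int(sentence), int(token), label)


def coordinating_tokens(
    records: Iterable[CoordAnnotation],
) -> Set[Tuple[str, str, int, int]]:
    """Return (section, file, sentence, token) for every "P" row."""
    return {
        (r.section, r.file, r.sentence, r.token)
        for r in records
        if r.annotation == "P"
    }


# Invented sample rows; the real file would be opened with
# open(path, encoding="utf-8", newline="") instead.
sample = "00\t01\t0\t0\tO\n00\t01\t0\t1\tP\n00\t01\t0\t2\tO\n"
records = list(read_annotations(io.StringIO(sample)))
marked = coordinating_tokens(records)
print(marked)
```

Because the annotation is stand-off, the resulting positions still have to be matched against the sentences and tokens of the PTB3 WSJ files to recover the actual punctuation marks.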


