Coordination Annotation for the Penn Treebank

Item Name: Coordination Annotation for the Penn Treebank
Author(s): Sandra Kübler, Wolfgang Maier, Erhard Hinrichs
LDC Catalog No.: LDC2015T08
ISBN: 1-58563-714-9
ISLRN: 060-785-139-403-2
DOI: https://doi.org/10.35111/ekgv-et49
Release Date: May 15, 2015
Member Year(s): 2015
DCMI Type(s): Text
Data Source(s): newswire
Application(s): parsing, question-answering, machine translation
Language(s): English
Language ID(s): eng
License(s): LDC User Agreement for Non-Members
Online Documentation: LDC2015T08 Documents
Licensing Instructions: Subscription & Standard Members, and Non-Members
Citation: Kübler, Sandra, Wolfgang Maier, and Erhard Hinrichs. Coordination Annotation for the Penn Treebank LDC2015T08. Philadelphia: Linguistic Data Consortium, 2015.

Introduction

Coordination Annotation for the Penn Treebank is a stand-off annotation for the Wall Street Journal portion of Treebank-3 (PTB3) (LDC99T42) developed by researchers at the University of Düsseldorf and Indiana University. It marks all tokens that have a coordinating function (potentially among other functions).

Coordination is a syntactic structure that links together two or more elements known as conjuncts or conjoins. The presence of coordination is often signaled by the appearance of a coordinator (coordinating conjunction), such as "and", "or", or "but" in English.

Penn Coordination Annotation is available at no cost to all licensees of PTB3 and appears in their download queue associated with LDC99T42 as penn_coordination_anno_LDC2015T08.tgz.

Data

The annotation is distributed as a single UTF-8 plain-text tab-separated values (TSV) file with the following columns:

  • section: Penn Treebank WSJ section number
  • file: Number of file within section
  • sentence: Number of sentence (starting with 0)
  • token: Number of token (starting with 0)
  • annotation: "P" if the token is a coordinating punctuation mark, "O" otherwise
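
The column layout above can be read with a few lines of Python. The following is a minimal sketch, not part of the release: it assumes the five columns appear in the order listed, separated by tabs, and that section and file numbers are best kept as strings so any leading zeros survive. The names CoordToken, read_annotations and coordinating_positions are hypothetical, and the sample rows are made up for illustration rather than taken from the corpus.

```python
import csv
import io
from typing import Iterable, Iterator, List, NamedTuple, Tuple


class CoordToken(NamedTuple):
    """One row of the stand-off file (hypothetical record type)."""
    section: str     # WSJ section number, kept as text
    file: str        # file number within the section, kept as text
    sentence: int    # 0-based sentence index
    token: int       # 0-based token index
    annotation: str  # "P" or "O"


def read_annotations(lines: Iterable[str]) -> Iterator[CoordToken]:
    """Yield one CoordToken per well-formed five-column row."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 5:
            continue  # skip blank or malformed lines
        section, file_no, sentence, token, annotation = row
        if not (sentence.isdigit() and token.isdigit()):
            continue  # skip a header row, should the file have one
        yield CoordToken(section, file_no, int(sentence), int(token), annotation)


def coordinating_positions(
    records: Iterable[CoordToken],
) -> List[Tuple[str, str, int, int]]:
    """Return (section, file, sentence, token) for every row marked "P"."""
    return [
        (r.section, r.file, r.sentence, r.token)
        for r in records
        if r.annotation == "P"
    ]


# Made-up rows in the assumed layout, for illustration only.
sample = "00\t01\t0\t0\tO\n00\t01\t0\t1\tP\n00\t01\t0\t2\tO\n"
records = list(read_annotations(io.StringIO(sample)))
positions = coordinating_positions(records)
```

To read the released file instead of the sample string, pass an open file handle (opened with encoding="utf-8" and newline="") to read_annotations. Because the annotation is stand-off, the sentence and token indices still have to be aligned with the PTB3 trees separately.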

Samples

Please view this sample.

Updates

None at this time.