Penn Discourse Treebank Version 2.0 - German Translation

Item Name: Penn Discourse Treebank Version 2.0 - German Translation
Author(s): Henny Sluyter-Gaethje, Peter Bourgonje, Manfred Stede
LDC Catalog No.: LDC2021T05
ISBN: 1-58563-955-9
ISLRN: 142-519-062-218-1
DOI: https://doi.org/10.35111/x7qb-7h47
Release Date: February 15, 2021
Member Year(s): 2021
DCMI Type(s): Text
Data Source(s): newswire
Application(s): discourse parsing
Language(s): German
Language ID(s): deu
License(s): LDC User Agreement for Non-Members
Online Documentation: LDC2021T05 Documents
Licensing Instructions: Subscription & Standard Members, and Non-Members
Citation: Sluyter-Gaethje, Henny, Peter Bourgonje, and Manfred Stede. Penn Discourse Treebank Version 2.0 - German Translation LDC2021T05. Web Download. Philadelphia: Linguistic Data Consortium, 2021.

Introduction

Penn Discourse Treebank Version 2.0 - German Translation was developed by the Applied Computational Linguistics group at the University of Potsdam. It consists of approximately one million tokens derived from Penn Discourse Treebank Version 2.0 (LDC2008T05). The source text, financial newswire, was translated into German and annotated for shallow discourse relations.

The aim of the Penn Discourse Treebank (PDTB) project is to annotate the Wall Street Journal text in Treebank-2 with discourse relations. PDTB2-German is based on a subset of PDTB2.0 used in the 2016 CoNLL Shared Task on Multilingual Shallow Discourse Parsing.

Data

Data is in CoNLL format. The text was automatically translated into German with DeepL, and the English annotations were projected onto the German translation using word alignments produced with GIZA++. See the included documentation for more information on the relation annotations.
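To illustrate the general idea of alignment-based annotation projection, here is a minimal Python sketch. It is not the authors' procedure. It assumes the GIZA++ output has already been converted into 0-based "source-target" index pairs (for example `0-0 1-2`), and that an annotated span, such as a connective or an argument, is given as a set of English token indices. The function names and the toy sentence pair are hypothetical.

```python
def parse_alignment(line: str) -> list[tuple[int, int]]:
    """Parse 0-based 'src-tgt' pairs, e.g. '0-0 1-2', into (source, target) tuples."""
    pairs = []
    for item in line.split():
        src, tgt = item.split("-")
        pairs.append((int(src), int(tgt)))
    return pairs


def project_span(source_tokens: set[int], alignment: list[tuple[int, int]]) -> list[int]:
    """Map English token indices to the sorted German token indices aligned with them."""
    return sorted({tgt for src, tgt in alignment if src in source_tokens})


# Hypothetical toy example: "But prices fell" -> "Aber die Preise fielen"
alignment = parse_alignment("0-0 1-1 1-2 2-3")
connective = project_span({0}, alignment)   # "But" -> "Aber"
arg2 = project_span({1, 2}, alignment)      # "prices fell" -> "die Preise fielen"
print(connective, arg2)                     # prints: [0] [1, 2, 3]
```

This sketch silently drops unaligned tokens. A real projection step also has to deal with unaligned connectives, discontinuous target spans, and alignment errors.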

The source text and the CoNLL-format annotations are each provided in a separate tab-separated plain-text file, encoded in UTF-8.

Samples

Please view this source sample (TXT) and annotation sample (TXT).

Updates

None at this time.
