RST Continuity Corpus

Item Name: RST Continuity Corpus
Author(s): Debopam Das, Markus Egg
LDC Catalog No.: LDC2024T08
ISLRN: 183-361-437-399-8
DOI: https://doi.org/10.35111/jfbf-gn90
Release Date: October 15, 2024
Member Year(s): 2024
DCMI Type(s): Text
Data Source(s): newswire
Application(s): discourse analysis
Language(s): English
Language ID(s): eng
License(s): LDC User Agreement for Non-Members
Online Documentation: LDC2024T08 Documents
Licensing Instructions: Subscription & Standard Members, and Non-Members
Citation: Das, Debopam, and Markus Egg. RST Continuity Corpus LDC2024T08. Web Download. Philadelphia: Linguistic Data Consortium, 2024.

Introduction

RST Continuity Corpus was developed at Åbo Akademi University and Humboldt-Universität zu Berlin. It adds annotations for continuity dimensions to RST Discourse Treebank (LDC2002T07), a collection of English news texts from the Penn Treebank annotated for rhetorical relations under the Rhetorical Structure Theory (RST) framework. In RST Continuity Corpus, each relation is annotated for seven continuity dimensions: time, space, reference, action, perspective, modality, and speech act. Each relation is also annotated for polarity, order of segments, nuclearity, and context.

Data

The source data consists of 1,009 relations from 217 Wall Street Journal texts annotated in RST Discourse Treebank for five relation types: causal, contrastive, conditional, elaboration, and temporal.

Annotation was performed using the UAM CorpusTool, version 2.8.16 or later.

Files are presented as UTF-8 encoded XML and plain text. The corpus is divided into four sub-directories as described in the README file.
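
Because the XML schema is not described here, a first step when working with the files might be to survey which element names actually occur. The following is a minimal Python sketch under stated assumptions: the function names and the directory argument are hypothetical, no particular tag or attribute names are assumed, and the README in the release remains the authoritative description of the layout.

```python
from collections import Counter
from pathlib import Path
import xml.etree.ElementTree as ET


def count_tags(root: ET.Element) -> Counter:
    """Count how often each element tag occurs below (and including) root."""
    return Counter(elem.tag for elem in root.iter())


def survey_directory(corpus_dir: str) -> Counter:
    """Sum tag counts over every .xml file found under corpus_dir.

    corpus_dir is a hypothetical local path to the unpacked release;
    the actual sub-directory names are given in the README.
    """
    totals = Counter()
    for path in sorted(Path(corpus_dir).rglob("*.xml")):
        # ET.parse reads the file as bytes and honours its declared encoding.
        totals += count_tags(ET.parse(path).getroot())
    return totals
```

Calling survey_directory on the unpacked corpus and printing the result of most_common() would list the element names by frequency, which gives a starting point for deciding how to read the annotations.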

Updates

None at this time.
