RST Signalling Corpus
|Item Name:||RST Signalling Corpus|
|Author(s):||Debopam Das, Maite Taboada, Paul McFetridge|
|LDC Catalog No.:||LDC2015T10|
|Release Date:||June 15, 2015|
|License(s):||LDC User Agreement for Non-Members|
|Online Documentation:||LDC2015T10 Documents|
|Licensing Instructions:||Subscription & Standard Members, and Non-Members|
|Citation:||Das, Debopam, Maite Taboada, and Paul McFetridge. RST Signalling Corpus LDC2015T10. Web Download. Philadelphia: Linguistic Data Consortium, 2015.|
RST Signalling Corpus was developed at Simon Fraser University and contains annotations for signalling information added to RST Discourse Treebank (LDC2002T07). RST Discourse Treebank (RST-DT) is a collection of English news texts annotated for rhetorical relations under the RST (Rhetorical Structure Theory) framework. In RST Signalling Corpus, information about textual signals (such as although, because, thus) and about other signals (such as tense, lexical chains or punctuation) was added as an annotation layer to examine how rhetorical relations are signalled in discourse.
The source data consists of 385 Wall Street Journal news articles from the Penn Treebank annotated for rhetorical relations in RST Discourse Treebank. As in RST-DT, the data in this release is divided into a training set (347 articles) and a test set (38 articles).
The signalling annotation in this data set was performed with UAM CorpusTool version 2.8.12. Files are presented as UTF-8 encoded XML and plain text. The corpus is divided into three annotation sub-directories: training, test and full. Each sub-directory includes source, metadata, signalling annotation, and DTD files.
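As a rough illustration of the layout described above, the following Python sketch walks the three sub-directories and tallies the root element of every XML file it finds. It is a minimal sketch under assumptions: the corpus root path is a placeholder, the `.xml` file extension is assumed, and nothing is presumed about the annotation schema, which is why the function only reports root tags. The function name `summarize_xml` is hypothetical and not part of the release.

```python
from collections import Counter
from pathlib import Path
import xml.etree.ElementTree as ET

# Sub-directory names as listed in the corpus description.
SUBDIRS = ("training", "test", "full")


def summarize_xml(corpus_root):
    """Tally the root element tags of the XML files in each sub-directory."""
    summary = {}
    for name in SUBDIRS:
        subdir = Path(corpus_root) / name
        if not subdir.is_dir():
            continue  # skip sub-directories missing from this copy
        tags = Counter()
        # Assumption: annotation files use the ".xml" extension.
        for path in sorted(subdir.rglob("*.xml")):
            try:
                root = ET.parse(str(path)).getroot()
            except ET.ParseError:
                tags["<unparseable>"] += 1
                continue
            tags[root.tag] += 1
        summary[name] = dict(tags)
    return summary


if __name__ == "__main__":
    # Placeholder path, not the actual folder name of the release.
    for name, tags in summarize_xml("RST_Signalling_Corpus").items():
        print(name, tags)
```

Reporting only root tags keeps the sketch independent of the actual schema; the DTD files shipped in each sub-directory are the place to look for the real element and attribute names.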