The RST Continuity Corpus

Debopam Das & Markus Egg

The RST Continuity Corpus (RST-CC) is a corpus of discourse relations annotated for seven continuity dimensions. The RST-CC contains 1,009 relations from 217 texts in the RST Discourse Treebank (RST-DT), representing five major relation types: causal, contrastive, conditional, elaboration, and temporal. Each relation is annotated for the seven continuity dimensions: time, space, reference, action, perspective, modality, and speech act. Each relation is also annotated for four additional features: polarity, order of segments, nuclearity, and context.

For more information about the corpus, see Das & Egg (2023): https://aclanthology.org/2023.law-1.16/.

The annotation in the RST-CC was performed, and can be accessed, using version 2.8.16 (or a later version) of UAM CorpusTool (http://www.corpustool.com/).

===============================================================

A description of the directories, sub-directories and data follows.

The root directory, RST-CC, includes four sub-directories, (1) Analyses, (2) Corpus, (3) Results and (4) Schemes, and a UAM CorpusTool project file named 'Annotation.ctpr', which can be used to open, view, and edit the annotations of the discourse relations.

(1) Analyses: This directory includes a sub-directory, (1.1) All_Files, which in turn includes 217 sub-directories containing the annotation for the continuity dimensions (and additional features). Each of these sub-directories begins with the name .txt, in which represents the number of the source text in the RST-DT for which the continuity (and additional feature) annotation is provided. A .txt directory includes five files: (1.1.1) Metadata.xml, (1.1.2) Relation.xml, (1.1.3) Identified.xml, (1.1.4) Continuity.xml, and (1.1.5) Additional-parameters.xml.

(1.1.1) Metadata.xml: This includes the metadata of the annotation (language, encoding format, font type, and font size).
(1.1.2) Relation.xml: This includes information about the target discourse relations (types and sub-types), replicated from the RST-DT relation annotations.
(1.1.3) Identified.xml: This includes information about whether a target discourse relation is adjusted for the nuclearity status of its discourse segments (for more information, see the RST-CC annotation manual).
(1.1.4) Continuity.xml: This includes the annotation of the target discourse relations for the seven continuity dimensions.
(1.1.5) Additional-parameters.xml: This includes the annotation of the target discourse relations for the four additional features.

The (1.1) All_Files directory also includes two additional files, METADATA.dtd and document.dtd, which are used to validate the five XML files in each .txt directory.

(2) Corpus: This directory contains the source corpus for the continuity (and additional feature) annotation. It includes a sub-directory, (2.1) All_Files, which in turn includes 217 .txt files, each with the name .txt.

(3) Results: This directory is empty, but it can be used to store search results and statistics for the RST-CC produced by UAM CorpusTool.

(4) Schemes: This directory includes seven files: (4.1) ACRuleList.xml, (4.2) Relation.xml, (4.3) Identified.xml, (4.4) Continuity.xml, (4.5) Additional-parameters.xml, (4.6) Network.dtd and (4.7) rules.dtd.

(4.1) ACRuleList.xml: This XML file is automatically produced, and contains only some meta-information about the annotation.
(4.2) Relation.xml: This XML file contains the relation annotation scheme used in the RST-CC (replicated from the RST-DT).
(4.3) Identified.xml: This XML file contains the annotation scheme for updating the nuclearity status of a target discourse relation (for more information, see the RST-CC annotation manual).
(4.4) Continuity.xml: This XML file contains the annotation scheme for the continuity annotation.
(4.5) Additional-parameters.xml: This XML file contains the annotation scheme for the additional feature annotation.

The other two files, (4.6) Network.dtd and (4.7) rules.dtd, are used to validate the five XML files in the (4) Schemes directory.
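
The layout described above can be checked with a short script. The following is only a minimal sketch in Python, not part of the RST-CC distribution. It assumes that the directory and file names on disk are spelled exactly as in this description and that the root directory is available locally under a path such as 'RST-CC'. The function names find_incomplete_analyses and count_corpus_texts are hypothetical helpers introduced here for illustration.

```python
from pathlib import Path

# The five per-text annotation files, as named in the description above.
# Assumption: the names on disk match this spelling and capitalisation.
EXPECTED_ANALYSIS_FILES = (
    "Metadata.xml",
    "Relation.xml",
    "Identified.xml",
    "Continuity.xml",
    "Additional-parameters.xml",
)


def find_incomplete_analyses(root):
    """Return {directory name: [missing file names]} for Analyses/All_Files.

    Hypothetical helper: only sub-directories are inspected, so the two
    DTD files stored next to them are skipped.
    """
    all_files = Path(root) / "Analyses" / "All_Files"
    problems = {}
    for entry in sorted(all_files.iterdir()):
        if not entry.is_dir():
            continue  # e.g. METADATA.dtd and document.dtd
        missing = [
            name
            for name in EXPECTED_ANALYSIS_FILES
            if not (entry / name).is_file()
        ]
        if missing:
            problems[entry.name] = missing
    return problems


def count_corpus_texts(root):
    """Count the .txt source files in Corpus/All_Files (hypothetical helper)."""
    corpus_dir = Path(root) / "Corpus" / "All_Files"
    return sum(1 for path in corpus_dir.glob("*.txt") if path.is_file())
```

For example, find_incomplete_analyses("RST-CC") would be expected to return an empty dictionary for a complete copy of the corpus, and count_corpus_texts("RST-CC") would be expected to return 217, going by the counts given in this description.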