Chinese Sentence Pattern Structure Treebank
Item Name: | Chinese Sentence Pattern Structure Treebank |
Author(s): | Weiming Peng, Min Zhao, Jing He, Yuchen Song, Tianbao Song, Dongdong Guo, Jingbo Sun, Shuqin Zhu, Yinbin Zhang, Zuntian Wei, Jiajia Hu, Jihua Song, Zhifang Sui, Ning Wang |
LDC Catalog No.: | LDC2025T06 |
ISLRN: | 916-484-709-412-8 |
DOI: | https://doi.org/10.35111/hx6v-6p30 |
Release Date: | June 16, 2025 |
Member Year(s): | 2025 |
DCMI Type(s): | Text |
Data Source(s): | essays, fiction, non-fiction |
Application(s): | historical linguistics, information extraction, linguistic analysis, natural language processing, syntactic parsing |
Language(s): | Mandarin Chinese, Chinese |
Language ID(s): | cmn, zho |
License(s): | LDC User Agreement for Non-Members |
Online Documentation: | LDC2025T06 Documents |
Licensing Instructions: | Subscription & Standard Members, and Non-Members |
Citation: | Peng, Weiming, et al. Chinese Sentence Pattern Structure Treebank LDC2025T06. Web Download. Philadelphia: Linguistic Data Consortium, 2025. |
Introduction
Chinese Sentence Pattern Structure Treebank (the SPS Treebank) was developed at Beijing Normal University and Peking University. It contains 5,016 sentences and 119,627 tokens, annotated syntactically according to sentence constituent analysis, an approach that emphasizes sentence pattern structure. This approach is based on linguist Jinxi Li's The New Chinese Grammar. The source data consists of 27 chapters extracted from modern Mandarin and ancient Chinese works.
Data
The SPS Treebank has three annotation layers: lexical sense and structural mode for dynamic words; syntactic structure for clauses; and inter-clause relations within complex sentences and sentence clusters. These structures can be visualized with the Jbw-viewer tool.
Below are the text data sources and volumes contained in this release:
Book Name | Chapters | Characters | Sentences |
---|---|---|---|
Selected Works of Lu Xun (《鲁迅全集》) | 8 | 25,545 | 948 |
Selected Works of Mao Zedong (《毛泽东选集》) | 2 | 32,454 | 771 |
From the Soil: The Foundations of Chinese Society (《乡土中国》) | 4 | 16,018 | 532 |
A Dream in Red Mansions (《红楼梦》) | 5 | 33,087 | 1,781 |
The Analects of Confucius (《论语》) | 6 | 5,392 | 517 |
Mencius (《孟子》) | 2 | 6,771 | 467 |
Total: | 27 | 119,267 | 5,016 |
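The column totals in the table can be checked, and an average sentence length per source derived, with a few lines of Python. The figures are copied from the table; the per-source averages are simple ratios of characters to sentences computed here for illustration and are not reported by the release itself.

```python
# Per-source figures copied from the table: (chapters, characters, sentences)
SOURCES = {
    "鲁迅全集": (8, 25_545, 948),
    "毛泽东选集": (2, 32_454, 771),
    "乡土中国": (4, 16_018, 532),
    "红楼梦": (5, 33_087, 1_781),
    "论语": (6, 5_392, 517),
    "孟子": (2, 6_771, 467),
}


def column_totals(sources):
    """Sum chapters, characters and sentences across all sources."""
    chapters = sum(c for c, _, _ in sources.values())
    characters = sum(ch for _, ch, _ in sources.values())
    sentences = sum(s for _, _, s in sources.values())
    return chapters, characters, sentences


def avg_sentence_length(sources):
    """Average characters per sentence for each source, rounded to one decimal."""
    return {name: round(ch / s, 1) for name, (_, ch, s) in sources.items()}


print(column_totals(SOURCES))  # (27, 119267, 5016), matching the Total row
for name, avg in avg_sentence_length(SOURCES).items():
    print(name, avg)
```

On these figures, 《论语》 has the shortest sentences (about 10.4 characters each) and 《毛泽东选集》 the longest (about 42.1).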
The data is UTF-8 encoded. Each file stores the three-layer annotation in XML format. All files were automatically verified and manually checked.
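This description states only that each file holds the three annotation layers as UTF-8 XML; it does not give the element names. The sketch below therefore assumes a hypothetical layout, with `<sentence>` elements containing `<word>` elements (lexical layer), `<clause>` elements (syntactic layer) and `<relation>` elements (inter-clause layer), purely to show how such files might be loaded and summarized with Python's standard `xml.etree.ElementTree`. The real tag and attribute names are defined in the release documentation and may differ.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# HYPOTHETICAL sample: tag and attribute names are illustrative only and
# are not taken from the SPS Treebank schema.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<doc>
  <sentence id="1">
    <word sense="学习">学</word>
    <word>而</word>
    <word sense="按时">时</word>
    <word sense="温习">习</word>
    <word>之</word>
    <clause type="predicate"/>
    <relation type="coordinate"/>
  </sentence>
</doc>"""


def load_file(path):
    """Parse one treebank file from disk; ElementTree honours the UTF-8 declaration."""
    return ET.parse(path).getroot()


def layer_counts(root):
    """Count elements belonging to each (hypothetical) annotation layer."""
    counts = Counter()
    for sent in root.iter("sentence"):
        counts["sentences"] += 1
        counts["words"] += len(sent.findall("word"))          # lexical layer
        counts["clauses"] += len(sent.findall("clause"))      # syntactic layer
        counts["relations"] += len(sent.findall("relation"))  # inter-clause layer
    return counts


# Parse the in-memory sample as bytes so the encoding declaration is respected.
root = ET.fromstring(SAMPLE.encode("utf-8"))
print(dict(layer_counts(root)))
```

For real files, `load_file` would replace the in-memory sample, and the counters can be summed across all files in the release.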
Updates
None at this time.