Chinese CogBank
| Item Name: | Chinese CogBank |
| Author(s): | Bin Li, Siqi Yin, Jie Xu, Li Song, Minxuan Feng |
| LDC Catalog No.: | LDC2020T01 |
| ISBN: | 1-58563-917-6 |
| ISLRN: | 382-367-821-870-2 |
| DOI: | https://doi.org/10.35111/w8tv-1e21 |
| Release Date: | February 17, 2020 |
| Member Year(s): | 2020 |
| DCMI Type(s): | Text |
| Data Source(s): | web collection |
| Application(s): | semantic role labelling |
| Language(s): | Mandarin Chinese |
| Language ID(s): | cmn |
| License(s): | LDC User Agreement for Non-Members |
| Online Documentation: | LDC2020T01 Documents |
| Licensing Instructions: | Subscription & Standard Members, and Non-Members |
| Citation: | Li, Bin, et al. Chinese CogBank LDC2020T01. Web Download. Philadelphia: Linguistic Data Consortium, 2020. |
Introduction
Chinese CogBank is a database of cognitive properties of Chinese words intended for use in metaphor understanding and generation. It consists of 232,497 "word-property" pairs covering 83,104 words and 100,195 properties. Each "word-property" pair also has an associated frequency, which can serve as a functional measure of the importance of a property.
Data
The data was collected via the Chinese search engine Baidu.com. The original collection consisted of 1,258,430 types (5,637,500 tokens) of "word-adjective" pairs that were reduced in Chinese CogBank to 232,497 "word-property" pairs after a series of manual checks.
The corpus is presented as a single tab-separated values (TSV) file encoded in UTF-8.
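As a rough illustration of working with a file in this format, the Python sketch below reads tab-separated rows and ranks each word's properties by frequency. The column order (word, property, frequency) is an assumption, not something stated here, and should be checked against the corpus documentation. The function names `read_pairs` and `top_properties` are hypothetical, and the sample rows are invented for the example rather than taken from the corpus.

```python
import csv
import io
from collections import defaultdict


def read_pairs(handle):
    """Yield (word, property, frequency) tuples from a tab-separated stream.

    ASSUMPTION: three columns in the order word, property, frequency.
    """
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 3:
            continue  # skip blank or malformed lines
        word, prop, freq = row[0], row[1], row[2]
        try:
            yield word, prop, int(freq)
        except ValueError:
            continue  # skip a header row or a non-numeric frequency


def top_properties(pairs, n=3):
    """Map each word to its n most frequent properties, highest first."""
    by_word = defaultdict(list)
    for word, prop, freq in pairs:
        by_word[word].append((prop, freq))
    return {
        word: sorted(props, key=lambda item: -item[1])[:n]
        for word, props in by_word.items()
    }


# Invented toy rows, NOT taken from Chinese CogBank.
sample = "狐狸\t狡猾\t120\n狐狸\t聪明\t45\n雪\t白\t300\n"
ranked = top_properties(read_pairs(io.StringIO(sample)), n=1)
print(ranked)
```

To read the actual corpus file, the in-memory sample could be replaced with `open(path, encoding="utf-8", newline="")`, where `path` points at the downloaded file.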
Samples
Please view this sample.
Updates
None at this time.