Chinese CogBank
Item Name: | Chinese CogBank |
Author(s): | Bin Li, Siqi Yin, Jie Xu, Li Song, Minxuan Feng |
LDC Catalog No.: | LDC2020T01 |
ISBN: | 1-58563-917-6 |
ISLRN: | 382-367-821-870-2 |
DOI: | https://doi.org/10.35111/w8tv-1e21 |
Release Date: | February 17, 2020 |
Member Year(s): | 2020 |
DCMI Type(s): | Text |
Data Source(s): | web collection |
Application(s): | semantic role labelling |
Language(s): | Mandarin Chinese |
Language ID(s): | cmn |
License(s): | LDC User Agreement for Non-Members |
Online Documentation: | LDC2020T01 Documents |
Licensing Instructions: | Subscription & Standard Members, and Non-Members |
Citation: | Li, Bin, et al. Chinese CogBank LDC2020T01. Web Download. Philadelphia: Linguistic Data Consortium, 2020. |
Introduction
Chinese CogBank is a database of cognitive properties of Chinese words, intended for use in metaphor understanding and generation. It consists of 232,497 "word-property" pairs covering 83,104 words and 100,195 properties. Each "word-property" pair also has an associated frequency, which can serve as a functional measure of the importance of a property.
Data
The data was collected via the Chinese search engine Baidu.com. The original collection consisted of 1,258,430 types (5,637,500 tokens) of "word-adjective" pairs, which were reduced to the 232,497 "word-property" pairs in Chinese CogBank after a series of manual checks.
The corpus is presented as a single tab-separated values (TSV) file encoded in UTF-8.
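As a rough illustration of how a file in this format might be read, the Python sketch below parses tab-separated rows into a word-to-properties mapping and ranks each word's properties by frequency. The column order (word, property, frequency) is an assumption and is not confirmed by this page, the function names are hypothetical, and the sample rows are invented for the example rather than taken from the corpus. Check the documentation shipped with LDC2020T01 for the actual layout.

```python
import csv
import io
from collections import defaultdict


def load_cogbank(lines):
    """Parse tab-separated rows into {word: {property: frequency}}.

    ASSUMPTION: each row holds word, property, frequency in that order.
    """
    bank = defaultdict(dict)
    # QUOTE_NONE keeps any quote characters in the data as literal text.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 3:
            continue  # skip blank or short rows
        word, prop, freq = row[0], row[1], row[2]
        try:
            bank[word][prop] = int(freq)
        except ValueError:
            continue  # skip a header line or a non-numeric frequency
    return dict(bank)


def top_properties(bank, word, n=3):
    """Return up to n (property, frequency) pairs, most frequent first."""
    props = bank.get(word, {})
    return sorted(props.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Invented rows for illustration only; not real corpus data.
sample = "雪\t白\t120\n雪\t冷\t45\n狐狸\t狡猾\t80\n"
bank = load_cogbank(io.StringIO(sample))
print(top_properties(bank, "雪"))
```

To read the real file, pass an open file handle instead of the in-memory sample, for example `open(path, encoding="utf-8", newline="")`, where `path` is wherever the downloaded TSV file was saved.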
Samples
Please view this sample.
Updates
None at this time.