Chinese CogBank

Item Name: Chinese CogBank
Author(s): Bin Li, Siqi Yin, Jie Xu, Li Song, Minxuan Feng
LDC Catalog No.: LDC2020T01
ISBN: 1-58563-917-6
ISLRN: 382-367-821-870-2
DOI: https://doi.org/10.35111/w8tv-1e21
Release Date: February 17, 2020
Member Year(s): 2020
DCMI Type(s): Text
Data Source(s): web collection
Application(s): semantic role labelling
Language(s): Mandarin Chinese
Language ID(s): cmn
License(s): LDC User Agreement for Non-Members
Online Documentation: LDC2020T01 Documents
Licensing Instructions: Subscription & Standard Members, and Non-Members
Citation: Li, Bin, et al. Chinese CogBank LDC2020T01. Web Download. Philadelphia: Linguistic Data Consortium, 2020.

Introduction

Chinese CogBank is a database of cognitive properties of Chinese words intended for use in metaphor understanding and generation. It consists of 232,497 "word-property" pairs covering 83,104 words and 100,195 properties. Each "word-property" pair also has an associated frequency, which can serve as a functional measure of how important a property is to a word.
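As a rough illustration of how the frequency might be used as an importance measure, the sketch below ranks the properties of one word by their share of that word's total frequency. The function name `top_properties`, the (word, property, frequency) tuple layout, and the toy values are all assumptions made for this example; none of them are taken from the corpus or its documentation.

```python
from collections import defaultdict


def top_properties(triples, word, k=3):
    """Return up to k (property, relative_weight) pairs for `word`.

    `triples` is any iterable of (word, property, frequency) tuples.
    The weight is the property's share of the word's total frequency.
    """
    counts = defaultdict(int)
    for w, prop, freq in triples:
        if w == word:
            counts[prop] += freq
    total = sum(counts.values())
    # Sort by descending frequency, breaking ties by property string.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(prop, freq / total) for prop, freq in ranked[:k]]


# Invented toy data for illustration only (not taken from Chinese CogBank).
toy = [
    ("狐狸", "狡猾", 60),  # fox, cunning
    ("狐狸", "聪明", 30),  # fox, clever
    ("狐狸", "漂亮", 10),  # fox, pretty
    ("雪", "白", 50),      # snow, white
]

print(top_properties(toy, "狐狸", k=2))
```

Normalizing by the word's total is only one possible choice; raw frequencies or a log scale may suit other uses better.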

Data

The data was collected via the Chinese search engine Baidu.com. The original collection consisted of 1,258,430 types (5,637,500 tokens) of "word-adjective" pairs. After a series of manual checks, these were reduced to the 232,497 "word-property" pairs in Chinese CogBank.

The corpus is presented as a single tab-separated values (TSV) file encoded in UTF-8.
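A minimal sketch of reading such a file with Python's standard `csv` module follows. The column order (word, property, frequency), the possible presence of a header row, and the file name `cogbank.tsv` are assumptions; this page does not specify them, so check the corpus documentation before relying on this layout.

```python
import csv
import io


def read_pairs(stream):
    """Yield (word, property, frequency) tuples from a tab-separated stream.

    Assumes the first three columns are word, property, frequency.
    Rows that are too short or lack an integer frequency are skipped.
    """
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 3:
            continue  # blank or malformed line
        word, prop, freq = row[0], row[1], row[2]
        try:
            yield word, prop, int(freq)
        except ValueError:
            continue  # e.g. a header row with a non-numeric frequency


# In-memory demonstration with invented rows (not corpus data).
demo = io.StringIO("word\tproperty\tfreq\n雪\t白\t50\n\n狐狸\t狡猾\t60\n")
pairs = list(read_pairs(demo))
print(pairs)

# For a real file (hypothetical name), open it as UTF-8 text:
# with open("cogbank.tsv", encoding="utf-8", newline="") as f:
#     pairs = list(read_pairs(f))
```

`csv.QUOTE_NONE` is used so that any quotation marks in the data are kept as literal characters rather than interpreted as field quoting.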

Samples

Please view this sample.

Updates

None at this time.
