SemTransCNC
Item Name: | SemTransCNC |
Author(s): | Shichang Wang, Chu-Ren Huang, Yao Yao, Angel Chan |
LDC Catalog No.: | LDC2020T12 |
ISBN: | 1-58563-931-1 |
ISLRN: | 835-247-023-332-5 |
DOI: | https://doi.org/10.35111/vreb-7n07 |
Release Date: | June 22, 2020 |
Member Year(s): | 2020 |
DCMI Type(s): | Text |
Data Source(s): | web collection, newswire, essays, journal articles, non-fiction, fiction, microphone speech, journal entries, meeting speech, microphone conversation, correspondence, transcribed speech, dictionaries |
Application(s): | semantic role labelling |
Language(s): | Mandarin Chinese |
Language ID(s): | cmn |
License(s): | SemTransCNC Agreement |
Online Documentation: | LDC2020T12 Documents |
Licensing Instructions: | Subscription & Standard Members, and Non-Members |
Citation: | Wang, Shichang, et al. SemTransCNC LDC2020T12. Web Download. Philadelphia: Linguistic Data Consortium, 2020. |
Introduction
SemTransCNC was developed by The Hong Kong Polytechnic University. It is a semantic transparency dataset of Chinese nominal compounds, built through a series of crowd-based experiments.
Nominal compounds were selected from the Sinica Corpus and a modern Chinese lexicon. Crowd workers answered questionnaires that collected demographic information and included questions about the Chinese language. To assess the overall semantic transparency (OST) of the selected compounds, they answered the question: "How is the sum of the meanings of A and B similar to the meaning of AB?" To assess constituent semantic transparency (CST), they were asked to describe how similar the meaning of A alone is to its meaning in AB, and how similar the meaning of B alone is to its meaning in AB.
Data
SemTransCNC contains OST and CST data for 1,176 dimorphemic Chinese nominal compounds. Each compound is made up of free morphemes and has a mid-range frequency.
The text data is presented as a UTF-8 encoded, comma-separated text file.
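Because the data is a UTF-8 encoded CSV file, it can be read with a standard CSV parser. The following is a minimal sketch in Python, not an official loader: the function name `load_rows` and the file name in the usage comment are hypothetical, and the sketch assumes the file begins with a header row, which this page does not confirm. Column names are not documented here, so the code takes them from the file itself.

```python
import csv
from pathlib import Path


def load_rows(path):
    """Read a UTF-8 CSV file into a list of dicts keyed by its header row.

    Assumption: the first line of the file is a header row.
    """
    # newline="" lets the csv module handle line endings itself.
    # "utf-8-sig" reads plain UTF-8 and also drops a leading BOM if present.
    with Path(path).open(encoding="utf-8-sig", newline="") as handle:
        return list(csv.DictReader(handle))


# Usage (hypothetical file name; substitute the CSV shipped with the corpus):
#   rows = load_rows("semtranscnc.csv")
#   print(len(rows), "rows; columns:", list(rows[0].keys()))
```

All values are returned as strings, so any numeric ratings would need to be converted explicitly once the relevant columns are identified in the corpus documentation.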
Samples
Please view this text sample (CSV).
Updates
None at this time.