CALLHOME Mandarin Chinese Lexicon
Item Name: | CALLHOME Mandarin Chinese Lexicon |
Author(s): | Shudong Huang, Xuejun Bian, Grace Wu, Cynthia McLemore |
LDC Catalog No.: | LDC96L15 |
ISBN: | 1-58563-079-9 |
ISLRN: | 969-490-893-990-1 |
DOI: | https://doi.org/10.35111/ysmr-h820 |
Member Year(s): | 1996, 1997 |
DCMI Type(s): | Text |
Data Source(s): | telephone conversations |
Project(s): | EARS, Hub5-LVCSR, GALE |
Application(s): | speech recognition |
Language(s): | Mandarin Chinese |
Language ID(s): | cmn |
License(s): | CALLHOME Lexicon Agreement (Commercial); CALLHOME Lexicon Agreement (Non-Commercial); CALLHOME Lexicon Agreement (Non-Member) |
Online Documentation: | LDC96L15 Documents |
Licensing Instructions: | Subscription & Standard Members, and Non-Members |
Citation: | Huang, Shudong, et al. CALLHOME Mandarin Chinese Lexicon LDC96L15. Web Download. Philadelphia: Linguistic Data Consortium, 1996. |
The CALLHOME Mandarin Chinese collection includes a lexical component. The CALLHOME Mandarin Lexicon consists of 44,405 words, each with separate fields for phonological, morphological, and frequency information.
The lexicon covers 98% of the word tokens occurring in the 20 LDC Mandarin CALLHOME devtest transcripts (ten minutes of conversation each).
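Token coverage counts running words rather than distinct word types, so a frequent word counts once for every occurrence. The following is a minimal sketch of that calculation on toy data; the function name `token_coverage`, the sample lexicon, and the sample tokens are illustrative assumptions and are not taken from the LDC release or its tooling.

```python
def token_coverage(tokens, lexicon):
    """Return the fraction of running tokens that appear in the lexicon.

    `tokens` is a list of already-segmented words from a transcript;
    `lexicon` is a set of headwords. Both are hypothetical inputs here.
    """
    if not tokens:
        return 0.0
    covered = sum(1 for tok in tokens if tok in lexicon)
    return covered / len(tokens)


# Toy data for illustration only (not drawn from the corpus).
lexicon = {"我", "你", "好", "电话"}
tokens = ["你", "好", "我", "打", "电话"]

coverage = token_coverage(tokens, lexicon)
print(f"{coverage:.0%}")  # 4 of 5 tokens are in the lexicon -> 80%
```

Applied to real transcripts, the result depends on how the text is segmented into words, which this sketch assumes has already been done.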
Orthographic Chinese characters are GB-encoded and are simplified in the Mainland style. A representation of the headword in tone pinyin with strictly lexical tone, i.e. not reflecting phonetic/phonological processes, is also provided.
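Because the characters are GB-encoded, the bytes usually need to be decoded before use in a Unicode-based toolchain. The sketch below assumes that "GB" here means GB 2312 (Python's built-in `gb2312` codec); the listing does not state the exact variant, and the helper name `gb_to_unicode` and the sample bytes are illustrative only.

```python
def gb_to_unicode(raw: bytes) -> str:
    """Decode GB-encoded bytes to a Unicode string.

    Assumes GB 2312; if decoding fails on real data, a superset codec
    such as "gbk" or "gb18030" may be worth trying instead.
    """
    return raw.decode("gb2312")


# The two bytes pairs below are the GB 2312 encoding of "中文".
raw = b"\xd6\xd0\xce\xc4"
text = gb_to_unicode(raw)
print(text)

# Re-encoding as UTF-8 gives bytes suitable for most modern tools.
utf8_bytes = text.encode("utf-8")
```

When reading a whole file, the same assumption can be expressed as `open(path, encoding="gb2312")`, where `path` stands for wherever the lexicon file is stored locally.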
A sample page from the lexicon is available. The transcripts and documentation (LDC96T16) are available separately, as is a corpus of telephone speech (LDC96S34).