CALLHOME Mandarin Chinese Lexicon

Item Name: CALLHOME Mandarin Chinese Lexicon
Author(s): Shudong Huang, Xuejun Bian, Grace Wu, Cynthia McLemore
LDC Catalog No.: LDC96L15
ISBN: 1-58563-079-9
ISLRN: 969-490-893-990-1
Member Year(s): 1996, 1997
DCMI Type(s): Text
Data Source(s): telephone conversations
Project(s): EARS, Hub5-LVCSR, GALE
Application(s): speech recognition
Language(s): Mandarin Chinese
Language ID(s): cmn
License(s): CALLHOME Lexicon Agreement (Commercial)
CALLHOME Lexicon Agreement (Non-Commercial)
CALLHOME Lexicon Agreement (Non-Member)
Online Documentation: LDC96L15 Documents
Licensing Instructions: Subscription & Standard Members, and Non-Members
Citation: Huang, Shudong, et al. CALLHOME Mandarin Chinese Lexicon LDC96L15. Web Download. Philadelphia: Linguistic Data Consortium, 1996.

The CALLHOME Mandarin Chinese collection includes a lexical component. The CALLHOME Mandarin Lexicon contains 44,405 words. Each word has separate fields for phonological, morphological, and frequency information.
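
The following sketch shows one way to hold such entries in memory once they have been parsed. It is hypothetical. The class name `LexiconEntry`, the helper `index_by_headword`, the field names, and the field types (for example, frequency as an integer) are illustrative assumptions. They do not describe the actual LDC96L15 file layout, and the values are placeholders, not data from the lexicon.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexiconEntry:
    """One headword with its information fields (hypothetical field names)."""
    headword: str    # orthographic form in simplified characters
    pinyin: str      # phonological field: tone pinyin with lexical tone (assumed format)
    morphology: str  # morphological field (format assumed)
    frequency: int   # frequency field (assumed to be an integer count)


def index_by_headword(entries: list[LexiconEntry]) -> dict[str, LexiconEntry]:
    """Build a lookup table keyed on the orthographic headword."""
    return {entry.headword: entry for entry in entries}


# Placeholder values for illustration only; not taken from the lexicon.
entries = [
    LexiconEntry("我们", "wo3men5", "placeholder", 0),
    LexiconEntry("今天", "jin1tian1", "placeholder", 0),
]
lexicon = index_by_headword(entries)
print(lexicon["今天"].pinyin)  # jin1tian1
```

A plain dict keyed on the headword keeps only one entry per character string. If the same characters can appear with more than one reading, a `dict[str, list[LexiconEntry]]` would be the safer choice.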

The lexicon covers 98% of the word tokens in the 20 LDC Mandarin CALLHOME devtest transcripts, each of which transcribes ten minutes of conversation.
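
Token coverage counts every running occurrence of a word, not each distinct word. A word that appears ten times in the transcripts contributes ten tokens. The minimal sketch below computes the figure. It assumes the transcripts have already been segmented into words, since written Chinese does not mark word boundaries with spaces. The function name `token_coverage` and the sample data are hypothetical.

```python
from collections.abc import Iterable


def token_coverage(lexicon: set[str], tokens: Iterable[str]) -> float:
    """Fraction of running word tokens whose form appears in the lexicon."""
    total = 0
    covered = 0
    for tok in tokens:
        total += 1
        if tok in lexicon:
            covered += 1
    # Guard against an empty token stream.
    return covered / total if total else 0.0


# Hypothetical lexicon and pre-segmented transcript tokens.
lexicon = {"我们", "今天", "去"}
tokens = ["我们", "今天", "去", "北京"]
print(f"{token_coverage(lexicon, tokens):.0%}")  # 75%
```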

The orthographic forms are GB-encoded simplified characters in the Mainland style. Each headword is also given in tone pinyin with strictly lexical tone. The tone marking does not reflect phonetic or phonological processes such as tone sandhi.
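
The sketch below illustrates both points. It decodes GB-encoded bytes with Python's built-in `gb2312` codec. It also splits a tone pinyin string into syllables and tones. The pinyin format, with each syllable followed by a tone digit, is an assumption about the notation, and `split_tone_pinyin` and `decode_gb` are hypothetical helpers. Under strictly lexical tone, 你好 would be written ni3 hao3, even though third-tone sandhi makes the first syllable sound closer to tone 2 in speech.

```python
import re

# Assumption: each syllable is letters (including ü or u:) followed by one tone digit.
SYLLABLE = re.compile(r"([a-zü:]+)([0-5])")


def split_tone_pinyin(pinyin: str) -> list[tuple[str, int]]:
    """Split a tone-numbered pinyin string into (syllable, tone) pairs."""
    return [(syl, int(tone)) for syl, tone in SYLLABLE.findall(pinyin.lower())]


def decode_gb(raw: bytes) -> str:
    """Decode GB2312-encoded bytes into a Python str."""
    return raw.decode("gb2312")


# Lexical tone only: no sandhi is applied to the first syllable.
print(split_tone_pinyin("ni3hao3"))  # [('ni', 3), ('hao', 3)]
print(decode_gb("你好".encode("gb2312")))  # 你好
```

If a file contains characters outside GB2312, Python's `gbk` or `gb18030` codecs, which are supersets, may decode it instead.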

A sample page from the lexicon is available. The corresponding transcripts and documentation (LDC96T16) are distributed separately, as is the telephone speech corpus (LDC96S34).
