CALLHOME Mandarin Chinese Lexicon

Item Name: CALLHOME Mandarin Chinese Lexicon
Authors: Shudong Huang, Xuejun Bian, Grace Wu and Cynthia McLemore
LDC Catalog No.: LDC96L15
ISBN: 1-58563-079-9
Data Type: lexicon
Data Source(s): telephone conversations
Project(s): EARS, GALE, Hub5-LVCSR
Application(s): speech recognition
Language(s): Mandarin Chinese
Distribution: Web Download
Member fee: $0 for 1996, 1997 members
Non-member Fee: US $2250.00
Reduced-License Fee: US $1125.00
Extra-Copy Fee: N/A
Non-member License: yes
Member License: yes
Online documentation: yes
Licensing Instructions: Subscription & Standard Members, and Non-Members
Citation: Shudong Huang, Xuejun Bian, Grace Wu and Cynthia McLemore
CALLHOME Mandarin Chinese Lexicon
Linguistic Data Consortium, Philadelphia

The CALLHOME Mandarin Chinese collection includes a lexical component. The CALLHOME Mandarin Lexicon consists of 44,405 words and contains separate information fields with phonological, morphological and frequency information for each word.

The LDC Mandarin lexicon covers 98% of the word tokens occurring in the 20 LDC Mandarin CALLHOME devtest transcripts (ten minutes of conversation each).
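As a rough illustration of how a token coverage figure of this kind can be computed, the sketch below counts the share of transcript tokens whose word form appears in a lexicon. The function name `token_coverage` and the toy word lists are hypothetical, and the sketch assumes the transcripts have already been segmented into words; it is not the procedure LDC used.

```python
from collections import Counter


def token_coverage(lexicon_words, transcript_tokens):
    """Return the fraction of transcript tokens found in the lexicon.

    Every occurrence of a word counts (token coverage), not just
    each distinct word once (type coverage).
    """
    lexicon = set(lexicon_words)
    counts = Counter(transcript_tokens)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = sum(n for word, n in counts.items() if word in lexicon)
    return covered / total


# Toy data, invented for illustration only (not taken from the corpus).
lexicon_words = ["你好", "我", "是", "学生"]
transcript_tokens = ["你好", "我", "是", "老师", "我", "是", "学生", "你好"]

coverage = token_coverage(lexicon_words, transcript_tokens)
print(f"token coverage: {coverage:.1%}")
```

Because frequent words contribute many tokens, token coverage is usually higher than the share of distinct word types that the lexicon contains.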

Orthographic Chinese characters are GB-encoded and are simplified in the Mainland style. A representation of the headword in tone pinyin with strictly lexical tone, i.e. not reflecting phonetic/phonological processes, is also provided.
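GB-encoded text has to be decoded before it can be handled as Unicode. The following is a minimal sketch, assuming the data uses the GB2312 character set, which Python exposes as the `gb2312` codec. The helper names and file paths are hypothetical, and nothing here reflects the actual layout of the lexicon files.

```python
def decode_gb(raw: bytes) -> str:
    """Decode GB2312-encoded bytes into a Unicode string."""
    return raw.decode("gb2312")


def gb_file_to_utf8(src_path: str, dst_path: str) -> None:
    """Copy a GB2312-encoded text file to a UTF-8 file, line by line."""
    with open(src_path, "r", encoding="gb2312") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(line)


# The two simplified characters of "你好" occupy two bytes each in GB2312.
sample = decode_gb(b"\xc4\xe3\xba\xc3")
print(sample)
```

If decoding fails on some entries, the `gbk` or `gb18030` codecs, which are supersets of GB2312, may be worth trying. That is an assumption to verify against the data.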

Here is a sample page from the lexicon. The transcripts and documentation (LDC96T16) are available separately, as is a corpus of telephone speech (LDC96S34).
