The CALLHOME German corpus collection includes a lexical component. The CALLHOME German lexicon consists of 318,807 words. Of these, 315,503 words are adapted from the CELEX German lexicon produced by the Centre for Lexical Information, Max Planck Institute for Psycholinguistics in Nijmegen. The remaining 3,304 words come from the 80 training and 20 development test (devtest) transcripts (ten minutes each) from the LDC German CALLHOME telephone speech corpus.
The German lexicon contains tab-separated information fields with orthographic, morphological, phonological, stress, source and frequency information for each word.
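As a rough illustration of how such a tab-separated file might be read, the Python sketch below parses each line into six named fields. The field order (taken from the order in which the fields are listed above), the field names, and the sample rows are all assumptions made for this example; the rows are placeholders, not real lexicon entries. The lexicon documentation should be consulted for the actual layout and encoding.

```python
import csv
import io
from typing import Iterator, NamedTuple, TextIO


class LexiconEntry(NamedTuple):
    # Hypothetical field names; the order is assumed from the description above.
    orthography: str
    morphology: str
    phonology: str
    stress: str
    source: str
    frequency: str


def read_lexicon(handle: TextIO) -> Iterator[LexiconEntry]:
    """Yield one LexiconEntry per well-formed tab-separated line."""
    # QUOTE_NONE keeps quote characters in the data from being interpreted.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Skip blank lines and lines without exactly six fields.
        if len(row) != len(LexiconEntry._fields):
            continue
        yield LexiconEntry(*row)


# Placeholder rows for demonstration only (not taken from the lexicon).
SAMPLE = (
    "wort1\tmorph1\tphon1\tstress1\tsrc1\tfreq1\n"
    "wort2\tmorph2\tphon2\tstress2\tsrc2\tfreq2\n"
)

entries = list(read_lexicon(io.StringIO(SAMPLE)))
for entry in entries:
    print(entry.orthography, entry.source, entry.frequency)
```

To read an actual file, the same function could be given an open file handle in place of the `io.StringIO` object; the correct text encoding would need to be taken from the lexicon documentation.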
Here is a sample page from the lexicon. The transcripts and documentation (LDC97T15) are available separately, as is a corpus of telephone speech (LDC97S43).
Updates
There are no updates at this time.