Korean Telephone Conversations Lexicon


Introduction

Korean Telephone Conversations Lexicon was produced by the Linguistic Data Consortium (LDC). Its LDC catalog number is LDC2003L02 and its ISBN is 1-58563-265-1.

Korean Telephone Conversations Lexicon consists of 25,251 words and provides separate fields with phonological, morphological, and frequency information for each word.

The lexicon covers all tokens occurring in the 100 telephone conversations transcribed and published as Korean Telephone Conversations Transcripts, i.e. its token coverage is 100%. The corresponding speech is published as Korean Telephone Conversations Speech.

Data

Each lexicon entry contains five tab-separated fields:

  1. orthographic form in Hangul (head-word), encoded in the KSC-5601 (Wansung) system
  2. orthographic form in Yale romanization
  3. pronunciation
  4. frequency of the word in Korean Telephone Conversations Transcripts
  5. morphological analysis of the word
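
The five-field layout above can be read with a short script. The following is a minimal Python sketch, assuming each entry sits on its own line with exactly five tab-separated fields in the order listed and a plain integer in the frequency field. The names LexEntry, parse_line, read_lexicon and open_lexicon are hypothetical, and the path of the lexicon file is left to the caller. KSC-5601 (Wansung) text is decoded here with Python's standard euc_kr codec.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class LexEntry:
    """One lexicon record; field order follows the five-field list above."""
    hangul: str         # 1. orthographic form in Hangul (head-word)
    yale: str           # 2. orthographic form in Yale romanization
    pronunciation: str  # 3. pronunciation
    frequency: int      # 4. frequency in the transcripts
    morphology: str     # 5. morphological analysis


def parse_line(line: str) -> LexEntry:
    """Split one tab-separated line into its five fields (assumed layout)."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 5:
        raise ValueError(f"expected 5 tab-separated fields, got {len(fields)}")
    hangul, yale, pron, freq, morph = fields
    return LexEntry(hangul, yale, pron, int(freq), morph)


def read_lexicon(lines: Iterable[str]) -> Iterator[LexEntry]:
    """Yield an entry for every non-blank line of already-decoded text."""
    for line in lines:
        if line.strip():
            yield parse_line(line)


def open_lexicon(path: str) -> Iterator[LexEntry]:
    """Open a lexicon file (path supplied by the caller) and decode it as KSC-5601."""
    with open(path, encoding="euc_kr") as f:
        yield from read_lexicon(f)
```

Because read_lexicon accepts any iterable of decoded strings, it can be checked on in-memory lines before pointing open_lexicon at the real file. The exact contents of the pronunciation and morphology fields are described in the documentation and are not interpreted by this sketch.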

The doc directory contains the documentation files:

  file.tbl     a complete listing of the files
  ko_lex.txt   description of the Korean Telephone Conversations Lexicon

The bin directory contains tools for converting between romanized and Hangul (Korean) orthography.
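
The bin tools themselves are not reproduced here. As a rough illustration of one direction of such a conversion, the sketch below romanizes decoded Hangul text using the standard Unicode arithmetic for precomposed Hangul syllables together with common Yale values for each jamo. It is an assumption that its output matches the lexicon's second field: it omits Yale conventions such as writing u instead of wu after labial consonants and the period used to mark ambiguous syllable boundaries. The function name hangul_to_yale is hypothetical.

```python
# Yale values for the 19 initial consonants, in Unicode syllable-composition order.
INITIALS = ["k", "kk", "n", "t", "tt", "l", "m", "p", "pp", "s",
            "ss", "", "c", "cc", "ch", "kh", "th", "ph", "h"]
# Yale values for the 21 medial vowels.
VOWELS = ["a", "ay", "ya", "yay", "e", "ey", "ye", "yey", "o", "wa",
          "way", "oy", "yo", "wu", "we", "wey", "wi", "yu", "u", "uy", "i"]
# Yale values for the 28 finals; index 0 means "no final consonant".
FINALS = ["", "k", "kk", "ks", "n", "nc", "nh", "t", "l", "lk",
          "lm", "lp", "ls", "lth", "lph", "lh", "m", "p", "ps", "s",
          "ss", "ng", "c", "ch", "kh", "th", "ph", "h"]

SYLLABLE_BASE = 0xAC00  # first precomposed Hangul syllable
SYLLABLE_LAST = 0xD7A3  # last precomposed Hangul syllable


def hangul_to_yale(text: str) -> str:
    """Romanize precomposed Hangul syllables; other characters pass through unchanged."""
    out = []
    for ch in text:
        code = ord(ch)
        if SYLLABLE_BASE <= code <= SYLLABLE_LAST:
            # Syllable index = (initial * 21 + vowel) * 28 + final.
            initial, rest = divmod(code - SYLLABLE_BASE, 21 * 28)
            vowel, final = divmod(rest, 28)
            out.append(INITIALS[initial] + VOWELS[vowel] + FINALS[final])
        else:
            out.append(ch)
    return "".join(out)


print(hangul_to_yale("한국"))  # hankwuk
```

The reverse direction (Yale to Hangul) needs a segmentation step to decide where each syllable starts, which is why the simpler Hangul-to-Yale direction is shown.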

Updates

Additional information, updates, and bug fixes may be available in the LDC catalog entry for this corpus (LDC2003L02).

Content Copyright

Portions © 2003 Trustees of the University of Pennsylvania.


Contact: ldc@ldc.upenn.edu
© 2003 Linguistic Data Consortium, Trustees of the University of Pennsylvania. All Rights Reserved.