Korean Telephone Conversations Transcripts

Item Name: Korean Telephone Conversations Transcripts
Author(s): Eon-Suk Ko, Na-Rae Han, Stephanie Strassel, Nii Martey
LDC Catalog No.: LDC2003T08
ISBN: 1-58563-264-3
ISLRN: 248-953-409-804-2
DOI: https://doi.org/10.35111/92vj-wg93
Release Date: May 16, 2003
Member Year(s): 2003
DCMI Type(s): Text
Data Source(s): telephone conversations
Application(s): speech recognition
Language(s): Korean
Language ID(s): kor
License(s): LDC User Agreement for Non-Members
Online Documentation: LDC2003T08 Documents
Licensing Instructions: Subscription & Standard Members, and Non-Members
Citation: Ko, Eon-Suk, et al. Korean Telephone Conversations Transcripts LDC2003T08. Web Download. Philadelphia: Linguistic Data Consortium, 2003.

Introduction

Korean Telephone Conversations Transcripts was produced by the Linguistic Data Consortium (LDC) and contains transcripts of 100 telephone calls in Korean, totaling approximately 190K words (about 190,000 words).

The telephone conversations on which these transcripts are based were originally recorded as part of the CALLFRIEND project. The CALLFRIEND Korean telephone speech was collected by LDC primarily in support of the Language Identification (LID) project, sponsored by the U.S. Department of Defense. The calls were later transcribed for use in other projects.

This publication consists of 100 transcribed telephone conversations in Korean. The corresponding speech files for these transcripts are available in Korean Telephone Conversations Speech (LDC2003S03). The Korean orthographic forms from the 100 transcription files serve as the head-words in the associated Korean Telephone Conversations Lexicon.

The recorded conversations are between native speakers of Korean and last up to 30 minutes, of which between 15 and 18 minutes are transcribed. All speakers were aware that they were being recorded. They were given no guidelines concerning what they should talk about. Once recruited, a caller was free to choose whom to call. Most participants called family members or close friends. All calls originated in either the United States or Canada.

Data

There are 100 time-aligned text files, totaling approximately 190K words, of which about 25K are unique.
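Totals of this kind can be checked with a simple token count. The following is a minimal sketch, not the procedure LDC used: it assumes words are separated by whitespace and that timestamps and speaker labels have already been stripped from each utterance. The function name `word_stats` is hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def word_stats(utterances: Iterable[str]) -> Tuple[int, int]:
    """Return (total tokens, unique tokens) for a sequence of utterance strings.

    Assumption: tokens are whitespace-separated and each string holds only
    the transcribed text, with no timestamps or speaker labels.
    """
    counts: Counter = Counter()
    for text in utterances:
        counts.update(text.split())
    return sum(counts.values()), len(counts)
```

A different tokenization, for example one that splits off punctuation or annotation symbols, would give somewhat different totals.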

Where the spelling and the actual pronunciation of a spoken word differ, the transcription follows the orthographic form rather than the pronunciation. When the mismatch between the written form and the actual pronunciation goes beyond what the pronunciation dictionary can predict, the word is marked with a '+' symbol.

All files are in Korean orthography. The text is written in Hangul and encoded in the KSC5601 (Wansung) character set, for which EUC-KR and ISO-2022-KR are the common encodings.
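Because the files are not UTF-8, they need to be decoded explicitly before use in most modern tools. The sketch below assumes the transcripts are in the 8-bit EUC-KR form and uses Python's standard `euc_kr` codec. If a file turns out to be in the 7-bit ISO-2022-KR form, the standard `iso2022_kr` codec can be passed instead. The function names and file paths are hypothetical.

```python
from pathlib import Path


def decode_transcript(raw: bytes, encoding: str = "euc_kr") -> str:
    """Decode raw transcript bytes into a Python string.

    Assumption: the bytes are EUC-KR unless another codec name is given.
    Raises UnicodeDecodeError if the bytes do not match the codec.
    """
    return raw.decode(encoding)


def convert_to_utf8(src: Path, dst: Path, encoding: str = "euc_kr") -> None:
    """Read one transcript file in its legacy encoding and write a UTF-8 copy."""
    text = decode_transcript(src.read_bytes(), encoding)
    dst.write_bytes(text.encode("utf-8"))
```

Reading and writing bytes, rather than opening the files in text mode, keeps the original line endings unchanged in the converted copy.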

Updates

There are no updates available at this time.