CSLU: Multilanguage Telephone Speech Version 1.2

Item Name: CSLU: Multilanguage Telephone Speech Version 1.2
Author(s): Yeshwant Muthusamy, Ronald Cole, Beatrice Oshika
LDC Catalog No.: LDC2006S35
ISBN: 1-58563-390-9
ISLRN: 871-936-811-171-7
Release Date: June 15, 2006
Member Year(s): 2006
DCMI Type(s): Sound
Sample Type: pcm
Sample Rate: 8000
Data Source(s): telephone speech
Application(s): machine translation, language identification
Language(s): Vietnamese, Tamil, Spanish, Iranian Persian, Korean, Japanese, Hindi, French, English, German, Mandarin Chinese
Language ID(s): vie, tam, spa, pes, kor, jpn, hin, fra, eng, deu, cmn
License(s): CSLU Agreement
Online Documentation: LDC2006S35 Documents
Licensing Instructions: Subscription & Standard Members, and Non-Members
Citation: Muthusamy, Yeshwant, Ronald Cole, and Beatrice Oshika. CSLU: Multilanguage Telephone Speech Version 1.2 LDC2006S35. Web Download. Philadelphia: Linguistic Data Consortium, 2006.

Introduction

The Multilanguage Telephone Speech corpus consists of telephone speech in 11 languages: English, Farsi, French, German, Hindi, Japanese, Korean, Mandarin, Spanish, Tamil, and Vietnamese. The corpus contains fixed-vocabulary utterances (e.g., days of the week) as well as fluent continuous speech. The current release includes recorded utterances from about 2,052 speakers, for a total of about 38.5 hours of speech. Time-aligned phonetic transcriptions for 619 of the utterances are also included.

Data

Each subject called the CSLU data collection system by dialing a toll-free number. An analog telephone line was connected to a Gradient Technologies box, which recorded the data from incoming calls. The sampling rate was 8 kHz, and the files were stored in 16-bit linear format on a UNIX file system. Each utterance was recorded as a separate file.
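
As a rough illustration of what the stated format implies, the sketch below computes the duration of an utterance from its size in bytes and decodes 16-bit linear samples. Only the 8 kHz rate and 16-bit sample width come from the description above. The byte order, the presence or size of any file header, and the helper names duration_seconds and decode_pcm16 are assumptions made for illustration; consult the corpus documentation for the actual file layout.

```python
import struct

SAMPLE_RATE_HZ = 8000   # stated in the corpus description
BYTES_PER_SAMPLE = 2    # 16-bit linear samples


def duration_seconds(num_bytes, header_bytes=0):
    """Duration of one utterance given its size in bytes.

    header_bytes is an assumption: set it to the header length if the
    files carry one, or leave it at 0 for headerless audio.
    """
    payload = num_bytes - header_bytes
    if payload < 0 or payload % BYTES_PER_SAMPLE != 0:
        raise ValueError("payload is not a whole number of 16-bit samples")
    return payload / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)


def decode_pcm16(raw, byte_order="<"):
    """Decode raw 16-bit linear PCM bytes into a list of signed integers.

    byte_order is "<" (little-endian) or ">" (big-endian); which one the
    corpus uses is not stated here, so this is an assumption to verify.
    """
    if len(raw) % BYTES_PER_SAMPLE != 0:
        raise ValueError("byte string is not a whole number of samples")
    count = len(raw) // BYTES_PER_SAMPLE
    return list(struct.unpack(f"{byte_order}{count}h", raw))
```

At this rate and sample width, one second of audio occupies 16,000 bytes, so the roughly 38.5 hours of speech correspond to about 2.2 GB of uncompressed sample data.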

Samples

For an example of the data in this corpus, please listen to these audio samples in Tamil and English.
