CSLU: Multilanguage Telephone Speech Version 1.2

Item Name: CSLU: Multilanguage Telephone Speech Version 1.2
Author(s): Yeshwant Muthusamy, Ronald Cole, Beatrice Oshika
LDC Catalog No.: LDC2006S35
ISBN: 1-58563-390-9
ISLRN: 871-936-811-171-7
Release Date: June 15, 2006
Member Year(s): 2006
DCMI Type(s): Sound
Sample Type: pcm
Sample Rate: 8000
Data Source(s): telephone speech
Application(s): machine translation, language identification
Language(s): Vietnamese, Tamil, Spanish, Iranian Persian, Korean, Japanese, Hindi, French, English, German, Mandarin Chinese
Language ID(s): vie, tam, spa, pes, kor, jpn, hin, fra, eng, deu, cmn
License(s): CSLU Agreement
Online Documentation: LDC2006S35 Documents
Licensing Instructions: Subscription & Standard Members, and Non-Members
Citation: Muthusamy, Yeshwant, Ronald Cole, and Beatrice Oshika. CSLU: Multilanguage Telephone Speech Version 1.2 LDC2006S35. Web Download. Philadelphia: Linguistic Data Consortium, 2006.

Introduction

The Multilanguage Telephone Speech corpus consists of telephone speech from 11 languages: English, Farsi, French, German, Hindi, Japanese, Korean, Mandarin, Spanish, Tamil, and Vietnamese. The corpus contains fixed-vocabulary utterances (e.g., days of the week) as well as fluent continuous speech. The current release includes recorded utterances from about 2,052 speakers, for a total of about 38.5 hours of speech. Time-aligned phonetic transcriptions for 619 of the utterances are also included.

Data

Each subject called the CSLU data collection system by dialing a toll-free number. An analog telephone line was connected to a Gradient Technologies box, which recorded the data from incoming calls. The sampling rate was 8 kHz, and the files were stored in 16-bit linear format on a UNIX file system. Each utterance was recorded as a separate file.
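
The stated format (8 kHz, 16-bit linear samples) fixes the relationship between byte count and duration, which the following minimal Python sketch illustrates. It is not part of the corpus distribution. The function names are hypothetical, and the sketch assumes a mono, headerless sample payload with a byte order chosen by the caller; the description above does not state the channel count, byte order, or whether the distributed files carry a header, so check the corpus documentation before relying on it.

```python
import struct

SAMPLE_RATE_HZ = 8000   # sampling rate stated in the corpus description
BYTES_PER_SAMPLE = 2    # 16-bit linear PCM


def duration_seconds(num_bytes):
    """Duration of a mono 16-bit PCM payload of num_bytes bytes (assumes no header)."""
    return num_bytes / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)


def decode_pcm16(raw, big_endian=False):
    """Decode raw 16-bit signed samples into a list of ints.

    The byte order is an assumption supplied by the caller; a trailing
    odd byte, if any, is ignored.
    """
    count = len(raw) // BYTES_PER_SAMPLE
    fmt = (">" if big_endian else "<") + str(count) + "h"
    return list(struct.unpack(fmt, raw[: count * BYTES_PER_SAMPLE]))


def payload_bytes_for_hours(hours):
    """Uncompressed sample bytes needed for the given amount of mono speech."""
    return round(hours * 3600 * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)
```

Under these assumptions, one second of speech occupies 16,000 bytes, so `payload_bytes_for_hours(38.5)` gives roughly 2.2 GB of sample data for the 38.5 hours quoted in the introduction, excluding any file headers.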

Samples

For an example of the data in this corpus, please listen to these audio samples in Tamil and English.
