CSLU: Multilanguage Telephone Speech Version 1.2


Item Name: CSLU: Multilanguage Telephone Speech Version 1.2
Authors: Yeshwant Muthusamy, Ron Cole, and Beatrice Oshika
LDC Catalog No.: LDC2006S35
ISBN: 1-58563-390-9
Release Date: Jun 15, 2006
Data Type: speech
Sample Rate: 8000 Hz
Sampling Format: pcm
Data Source(s): telephone speech
Application(s): language identification, machine translation
Language(s): English, French, German, Hindi, Japanese, Korean, Mandarin Chinese, Spanish, Tamil, Vietnamese, Western Farsi
Language ID(s): cmn, deu, eng, fra, hin, jpn, kor, pes, spa, tam, vie
Distribution: 1 DVD
Member Fee: $0 for 2006 members
Non-member Fee: US $150.00
Reduced-License Fee: US $150.00
Extra-Copy Fee: US $150.00
Non-member License: yes
Member License: yes
Online documentation: yes
Licensing Instructions: Subscription & Standard Members, and Non-Members
Citation: Yeshwant Muthusamy, Ron Cole, and Beatrice Oshika. 2006. CSLU: Multilanguage Telephone Speech Version 1.2. Linguistic Data Consortium, Philadelphia.

Introduction

The Multilanguage Telephone Speech corpus consists of telephone speech in 11 languages: English, Farsi, French, German, Hindi, Japanese, Korean, Mandarin, Spanish, Tamil, and Vietnamese. The corpus contains fixed-vocabulary utterances (e.g., days of the week) as well as fluent continuous speech. The current release includes recorded utterances from about 2,052 speakers, for a total of about 38.5 hours of speech. Time-aligned phonetic transcriptions for 619 of the utterances are also included.

Data

Each subject called the CSLU data collection system by dialing a toll-free number. An analog telephone line was connected to a Gradient Technologies box, which recorded the data from incoming calls. The sampling rate was 8 kHz, and the files were stored in 16-bit linear format on a UNIX file system. Each utterance was recorded as a separate file.
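
The recording parameters above imply a fixed data rate, which is useful for sanity-checking file sizes. The following is a minimal sketch, not part of the corpus documentation: it assumes single-channel (mono) audio and ignores any file headers, neither of which is stated in this description, and the function names are hypothetical.

```python
# Figures taken from the corpus description.
SAMPLE_RATE_HZ = 8000   # 8 kHz sampling rate
BYTES_PER_SAMPLE = 2    # 16-bit linear samples
CHANNELS = 1            # ASSUMPTION: mono telephone audio (not stated in the source)


def pcm_bytes(seconds: float) -> int:
    """Bytes of raw audio needed to hold `seconds` of speech."""
    return round(seconds * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS)


def pcm_seconds(num_bytes: int) -> float:
    """Duration, in seconds, of a headerless audio payload of `num_bytes`."""
    return num_bytes / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS)


bytes_per_second = pcm_bytes(1)          # 16000 bytes per second of speech
corpus_bytes = pcm_bytes(38.5 * 3600)    # 2217600000 bytes for 38.5 hours

print(bytes_per_second, corpus_bytes)
```

Under these assumptions, the roughly 38.5 hours of speech come to about 2.2 GB of audio, which is consistent with the corpus being distributed on a single DVD. Actual file sizes may differ slightly if the files carry headers.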

Samples

For an example of the data in this corpus, please listen to these audio samples in Tamil and English.

Content Copyright

Portions © 1992, 2000, 2002 Center for Spoken Language Understanding, Oregon Health & Science University, © 2006 Trustees of the University of Pennsylvania