CSLU: Foreign Accented English Release 1.2


Item Name: CSLU: Foreign Accented English Release 1.2
Authors: T. Lander
LDC Catalog No.: LDC2007S08
ISBN: 1-58563-392-5
Release Date: May 17, 2007
Data Type: speech
Sample Rate: 8000 Hz
Sampling Format: ulaw
Data Source(s): telephone speech
Application(s): speech recognition
Language(s): English
Language ID(s): eng
Distribution: 1 DVD
Member Fee: $0 for 2007 members
Non-member Fee: US $150.00
Reduced-License Fee: US $150.00
Extra-Copy Fee: US $150.00
Non-member License: yes
Member License: yes
Online documentation: yes
Licensing Instructions: Subscription & Standard Members, and Non-Members
Citation: T. Lander. 2007. CSLU: Foreign Accented English Release 1.2. Linguistic Data Consortium, Philadelphia.

Introduction

This file contains documentation on CSLU: Foreign Accented English Release 1.2, Linguistic Data Consortium (LDC) catalog number LDC2007S08 and ISBN 1-58563-392-5.

CSLU: Foreign Accented English Release 1.2 consists of continuous speech in English by native speakers of 22 different languages: Arabic, Cantonese, Czech, Farsi, French, German, Hindi, Hungarian, Indonesian, Italian, Japanese, Korean, Mandarin Chinese, Malay, Polish, Portuguese (Brazilian and Iberian), Russian, Swedish, Spanish, Swahili, Tamil and Vietnamese. The corpus contains 4925 telephone-quality utterances, information about the speakers' linguistic backgrounds and perceptual judgments about the accents in the utterances. The speakers were asked to speak about themselves in English for 20 seconds. Three native speakers of American English independently listened to each utterance and judged the speakers' accents on a 4-point scale: negligible/no accent, mild accent, strong accent and very strong accent. This corpus is intended to support the study of the underlying characteristics of foreign accent and to enable research, development and evaluation of algorithms for the identification and understanding of accented speech. Some of the files in this corpus are also contained in CSLU: 22 Languages Corpus, LDC2005S26.
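The metadata above lists the audio as 8000 Hz ulaw telephone speech. As a rough illustration of what that encoding implies, the following Python sketch expands 8-bit mu-law codes to 16-bit linear PCM using the standard G.711 decoding arithmetic, and computes a duration from a sample count. It rests on assumptions not stated in this documentation: that the audio payload is single-channel and has already been read out of whatever file container the corpus uses. The function names are illustrative only and are not part of the corpus or of any CSLU or LDC tool.

```python
# Minimal sketch: decode raw 8-bit mu-law (G.711) bytes to 16-bit linear PCM.
# Assumption: `data` holds only the mono audio payload, with any file header
# already removed. All names here are hypothetical.

ULAW_BIAS = 0x84  # bias of 132 added by the encoder before compression


def ulaw_byte_to_pcm16(code: int) -> int:
    """Expand a single mu-law code (0-255) to a signed linear sample."""
    code = ~code & 0xFF            # mu-law codes are stored bit-inverted
    sign = code & 0x80             # top bit: 1 means negative
    exponent = (code >> 4) & 0x07  # 3-bit segment number
    mantissa = code & 0x0F         # 4-bit step within the segment
    magnitude = (((mantissa << 3) + ULAW_BIAS) << exponent) - ULAW_BIAS
    return -magnitude if sign else magnitude


def decode_ulaw(data: bytes) -> list[int]:
    """Decode a byte string of mu-law codes into a list of linear samples."""
    return [ulaw_byte_to_pcm16(b) for b in data]


def duration_seconds(num_samples: int, sample_rate: int = 8000) -> float:
    """Duration of a mono recording given its sample count."""
    return num_samples / sample_rate


if __name__ == "__main__":
    raw = bytes([0xFF, 0x7F, 0x80, 0x00])  # four arbitrary mu-law codes
    print(decode_ulaw(raw))
    print(duration_seconds(160_000))
```

Under these assumptions, each sample occupies one byte, so a 20-second response at 8000 Hz corresponds to about 160,000 bytes of audio payload.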

Content Copyright

Portions © 2000-2002 Center for Spoken Language Understanding, Oregon Health & Science University, © 2007 Trustees of the University of Pennsylvania