CSLU: S4X Release 1.2

Item Name:	CSLU: S4X Release 1.2
Author(s):	Ronald Allan Cole, M Noel, T. Lander, T Durham
LDC Catalog No.:	LDC2009S03
ISBN:	1-58563-523-5
ISLRN:	644-574-573-711-4
DOI:	https://doi.org/10.35111/6a5x-dv17
Release Date:	September 15, 2009
Member Year(s):	2009
DCMI Type(s):	Sound
Sample Type:	8 bit ulaw
Sample Rate:	8000
Data Source(s):	telephone speech
Application(s):	speech recognition
Language(s):	English
Language ID(s):	eng
License(s):	CSLU Agreement
Online Documentation:	LDC2009S03 Documents
Licensing Instructions:	Subscription & Standard Members, and Non-Members
Citation:	Cole, Ronald Allan, et al. CSLU: S4X Release 1.2 LDC2009S03. Web Download. Philadelphia: Linguistic Data Consortium, 2009.

Introduction

CSLU: S4X Release 1.2, Linguistic Data Consortium (LDC) catalog number LDC2009S03 and ISBN 1-58563-523-5, was created by the Center for Spoken Language Understanding (CSLU), Oregon Health and Science University. The corpus consists of 36 speakers (22 male, 14 female) uttering 11 specified words.

The speakers repeated the following words six times on each of four channels: startrek, supernova, tektronix, generation, nebula, processing, singularity, 71523, abracadabra, sungeeta and computer. The four channels used were office phone, home phone, carbon microphone telephone and speaker phone. Each speech file has a corresponding time-aligned phoneme-level transcription (produced by automatic forced alignment) and an automatically generated word-level transcription.
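As a rough illustration of the collection design described above, the following Python sketch computes the nominal number of utterances implied by 36 speakers, 11 words, six repetitions and four channels. The variable names are illustrative only, and the figure is a design-level upper bound under the assumption that every speaker completed every repetition; the number of files actually included in the release may differ.

```python
# Nominal size of the recording design, assuming every speaker
# completed every repetition on every channel (an assumption,
# not a count of files in the release).
WORDS = [
    "startrek", "supernova", "tektronix", "generation", "nebula",
    "processing", "singularity", "71523", "abracadabra", "sungeeta",
    "computer",
]
CHANNELS = [
    "office phone", "home phone",
    "carbon microphone telephone", "speaker phone",
]
SPEAKERS = 36
REPETITIONS = 6

# Utterances one speaker would produce across all words and channels.
per_speaker = len(WORDS) * REPETITIONS * len(CHANNELS)

# Nominal total across all speakers.
nominal_total = SPEAKERS * per_speaker

print(per_speaker, nominal_total)
```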

Humans reviewed each utterance in two passes and classified it as good, bad, noisy or different. The results of this verification process are included in the /docs directory.

Data

The data were recorded with the CSLU T1 digital data collection system. Each utterance is stored as a separate file. The files were sampled at 8 kHz with 8-bit resolution and stored as ulaw files. All of the data use the RIFF standard file format. This file format is 16-bit linearly encoded.
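The description above mentions both 8-bit ulaw sampling and 16-bit linear encoding. For readers unfamiliar with how the two relate, here is a minimal Python sketch of the standard G.711 mu-law expansion from one 8-bit code to a 16-bit linear sample. The function name `ulaw_to_linear` is hypothetical, and this is not confirmed to be the conversion used to prepare the corpus files; it only illustrates the general technique.

```python
def ulaw_to_linear(code: int) -> int:
    """Expand one 8-bit mu-law code (0-255) to a signed 16-bit sample.

    Follows the common G.711 mu-law expansion; illustrative only.
    """
    if not 0 <= code <= 0xFF:
        raise ValueError("mu-law code must be in the range 0..255")

    u = ~code & 0xFF            # mu-law bytes are stored bit-inverted
    sign = u & 0x80             # top bit carries the sign
    exponent = (u >> 4) & 0x07  # 3-bit segment number
    mantissa = u & 0x0F         # 4-bit step within the segment

    # Rebuild the magnitude, then remove the encoder bias (0x84 = 132).
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def decode_ulaw(data: bytes) -> list[int]:
    """Decode a buffer of mu-law bytes into a list of linear samples."""
    return [ulaw_to_linear(b) for b in data]


# 0xFF encodes silence; 0x00 and 0x80 are the two extremes.
print(decode_ulaw(bytes([0xFF, 0x00, 0x80])))
```

At the 8000 Hz sample rate listed for this corpus, one second of audio corresponds to 8000 such codes.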

Samples

For an example of the data in this corpus, please listen to this recording of a subject speaking the word 'computer': SD-1030-computer-t3-67.
