CSLU: Alphadigit Version 1.3


Item Name: CSLU: Alphadigit Version 1.3
Authors: R. A. Cole, M. Noel, T. Lander, and T. Durham
LDC Catalog No.: LDC2008S06
ISBN: 1-58563-478-6
Release Date: Jul 16, 2008
Data Type: speech
Sample Rate: 8000 Hz
Sampling Format: ulaw
Data Source(s): telephone conversations
Application(s): speech recognition
Language(s): English
Language ID(s): eng
Distribution: 2 DVD
Member Fee: $0 for 2008 members
Non-member Fee: US $150.00
Reduced-License Fee: US $150.00
Extra-Copy Fee: US $150.00
Non-member License: yes
Member License: yes
Online documentation: yes
Licensing Instructions: Subscription & Standard Members, and Non-Members
Citation: R. A. Cole, et al. 2008. CSLU: Alphadigit Version 1.3. Linguistic Data Consortium, Philadelphia.

Introduction

This file contains documentation for CSLU: Alphadigit Version 1.3, Linguistic Data Consortium (LDC) catalog number LDC2008S06 and ISBN 1-58563-478-6.

Alphadigit Version 1.3 is a collection of 78,044 utterances from 3,025 speakers saying six-digit strings of letters and digits over the telephone for a total of approximately 82 hours of speech. Each speech file has corresponding orthographic and phonemic transcriptions. This corpus was created by the Center for Spoken Language Understanding (CSLU), Oregon Health & Science University, Beaverton, Oregon.
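As a quick sanity check on these figures, the Python sketch below derives the average utterance length and the average number of utterances per speaker from the totals quoted above. The variable names are illustrative, and the 82-hour figure is approximate, so the results are rough estimates only.

```python
# Totals quoted in the corpus description (the 82-hour figure is approximate).
UTTERANCES = 78_044
SPEAKERS = 3_025
TOTAL_HOURS = 82

# Average duration of one utterance, in seconds.
avg_seconds_per_utterance = TOTAL_HOURS * 3600 / UTTERANCES

# Average number of utterances contributed by one speaker.
avg_utterances_per_speaker = UTTERANCES / SPEAKERS

print(f"{avg_seconds_per_utterance:.1f} s per utterance")
print(f"{avg_utterances_per_speaker:.1f} utterances per speaker")
```

This works out to roughly 3.8 seconds per utterance and about 26 utterances per speaker.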

Data

Speakers were recruited through Usenet postings. Respondents registered for the collection by completing an online form. Once registered, they received a list of 18-29 six-digit strings (e.g., "a 2 b 4 5 g") and participation instructions. Speakers called the CSLU data collection system on a toll-free number and were prompted for each string; 1,102 different strings were used over the course of the data collection. The lists were designed to balance phonetic context across all letter and digit pairs.
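The structure of a prompt string can be illustrated with a small Python sketch. It assumes the space-separated form shown in the example above, with each token being a single letter or a single digit; the function name parse_prompt is hypothetical, and the actual transcription files may use a different layout.

```python
import string

# Assumption: each token is one letter (a-z) or one digit (0-9).
ALLOWED = set(string.ascii_lowercase) | set(string.digits)


def parse_prompt(prompt: str, expected_length: int = 6) -> list[str]:
    """Split a prompt such as 'a 2 b 4 5 g' into validated tokens."""
    tokens = prompt.lower().split()
    if len(tokens) != expected_length:
        raise ValueError(f"expected {expected_length} tokens, got {len(tokens)}")
    bad = [t for t in tokens if t not in ALLOWED]
    if bad:
        raise ValueError(f"unexpected tokens: {bad}")
    return tokens


print(parse_prompt("a 2 b 4 5 g"))
```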

The data were recorded directly from a digital phone line using the CSLU T1 digital data collection system, with no digital-to-analog or analog-to-digital conversion at the recording end. The sampling rate was 8 kHz, and the files were stored in 8-bit mu-law format on a UNIX file system. The files have since been converted to the RIFF standard file format with 16-bit linear encoding.
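For readers unfamiliar with the original storage format, the Python sketch below shows the standard G.711 mu-law expansion from one 8-bit code to a 16-bit linear sample. It is illustrative only: the function names are hypothetical, and this is not necessarily the tool CSLU or LDC used for the conversion. The distributed files are already 16-bit linear, so no decoding is needed to use them.

```python
BIAS = 0x84  # bias of 132 added during G.711 mu-law compression


def ulaw_to_linear(code: int) -> int:
    """Expand one 8-bit mu-law code (0-255) to a signed 16-bit linear sample."""
    u = ~code & 0xFF              # mu-law codes are stored bit-inverted
    sign = u & 0x80               # top bit carries the sign
    exponent = (u >> 4) & 0x07    # 3-bit segment number
    mantissa = u & 0x0F           # 4-bit step within the segment
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if sign else magnitude


def decode_ulaw_bytes(data: bytes) -> list[int]:
    """Decode a buffer of mu-law bytes into a list of linear samples."""
    return [ulaw_to_linear(b) for b in data]


print(decode_ulaw_bytes(bytes([0xFF, 0x80, 0x00])))
```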

Transcription

All of the files included in this corpus have corresponding non-time-aligned word-level transcriptions and time-aligned phoneme-level transcriptions (produced by automatic forced alignment) that comply with the conventions in the CSLU Labeling Guide. Non-time-aligned orthographic transcriptions provide quick access to the content of an utterance; they may contain markers for word boundaries to support access and retrieval at the lexical level. Phonetic/phonemic transcriptions represent the phonetic content of an utterance at a given level of detail, which is made explicit by the use of diacritics. Phonetic phenomena transcribed include excessive nasalization, glottalization, frication on a stop, centralization, lateralization, rounding, and palatalization.

Samples

For an example of the speech contained in this corpus, please listen to this audio sample (MS wave).

Content Copyright

Portions © 2000-2002 Center for Spoken Language Understanding, Oregon Health & Science University, © 2008 Trustees of the University of Pennsylvania