This file contains documentation for CSLU: Alphadigit Version 1.3 , Linguistic Data Consortium (LDC) catalog number LDC2008S06 and isbn 1-58563-478-6.
Alphadigit Version 1.3 is a collection of 78,044 utterances from 3,025 speakers saying six-digit strings of letters and digits over the telephone for a total of approximately 82 hours of speech. Each speech file has corresponding orthographic and phonemic transcriptions. This corpus was created by the Center for Spoken Language Understanding (CSLU), Oregon Health & Science University, Beaverton, Oregon.
Speakers were recruited using USEnet postings. Respondents registered for the collection by completing an online form. Once registered, they received a list of 18-29 six-digit strings (e.g., "a 2 b 4 5 g") and participation instructions. Speakers called the CSLU data collection system by dialing a toll-free number and were prompted for each string; 1102 different strings were used throughout the course of the data collection. The lists were set up to balance for phonetic context between all letter and digit pairs.
The data were recorded directly from a digital phone line without digital-to-analog or analog-to-digital conversion at the recording end using the CSLU T1 digital data collection system. The sampling rate was 8khz and the files were stored in 8-bit mu-law format on a UNIX file system. The files have been converted to RIFF standard file format, 16-bit linearly encoded.
All of the files included in this corpus have corresponding non-time-aligned word-level transcriptions and time aligned phoneme-level transcriptions (automatic forced alignment) that comply with the conventions in the CSLU Labeling Guide. Non time-aligned orthographic transcriptions provide quick access to the content of an utterance; they may contain markers for word boundaries to support access and retrieval at the lexical level. Phonetic/phonemic transcriptions represent the phonetic content of an utterance at a given level of detail that is made explicit by the use of diacritics. Phonetic phenomena transcribed include excessive nasalization, glottalization, frication on a stop, centralization, lateralization, rounding and palatalization.
For an example of the speech contained in this corpus, please listen to this audio sample (MS wave).
Portions © 2000-2002 Center for Spoken Language Understanding, Oregon Health & Science University, © 2008 Trustees of the University of Pennsylvania