Alphadigit Corpus
Release 1.3
Center for Spoken Language Understanding
UPDATED: 23 August 2002

Overview
--------

This release includes recorded utterances from 3025 different callers
and a transcription of each utterance. There are a total of 78044
speech files. All of the files included in this corpus have
corresponding non-time-aligned word-level transcriptions and
time-aligned phoneme-level transcriptions (automatic forced
alignment) that comply with the conventions in the CSLU Labeling
Guide.

Recording Conditions
--------------------

Each subject called the CSLU data collection system by dialing a
toll-free number. The data were recorded directly off of a digital
phone line without digital-to-analog or analog-to-digital conversion
at the recording end. The digital data were collected with the CSLU
T1 digital data collection system described in "Digital Data
Collection at CSLU" (please see our web site). The sampling rate was
8 kHz, and the files were stored in 8-bit mu-law format on a UNIX
file system. These files have been converted to the RIFF standard
file format, which is 16-bit linearly encoded.

Subject Population
------------------

Subjects whose utterances are included in this corpus are
respondents to Usenet postings. Respondents were required to fill
out a form on the World Wide Web and register for the data
collection. In response to their registration, a list of letters and
digits was emailed to them along with instructions on how to
participate.

File Naming Conventions
-----------------------

Each utterance is stored in an individual file whose name indicates
the corpus, the speaker, and the prompt. For example:

    AD-1.p22.wav

The first field ("AD") is the prefix indicating the corpus to which
this data belongs, the second field ("1") is a unique ID number for
the speaker, and the third field ("p22") indicates the prompt to
which the speaker was responding.
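The naming convention above can be split mechanically. The following is a minimal sketch in Python; the regular expression and field names are our own illustration, not part of the corpus documentation:

```python
import re

# Pattern for names like "AD-1.p22.wav": corpus prefix, speaker ID,
# prompt ID. The group names are illustrative, not official.
FILENAME_RE = re.compile(
    r"^(?P<corpus>[A-Z]+)-(?P<speaker>\d+)\.(?P<prompt>p\d+)\.wav$"
)

def parse_utterance_name(name):
    """Split a corpus filename into its corpus/speaker/prompt fields."""
    m = FILENAME_RE.match(name)
    if m is None:
        raise ValueError("not a corpus filename: %r" % name)
    return m.groupdict()

print(parse_utterance_name("AD-1.p22.wav"))
# {'corpus': 'AD', 'speaker': '1', 'prompt': 'p22'}
```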
Protocol
--------

Each participant was given a list of six-character strings of digits
and letters to read over the phone ("a 2 b 4 5 g", for example). The
participants called the system and were prompted for each string.
1102 different strings were used throughout the course of the data
collection; see docs/lists.txt for the complete lists.

The lists were set up to balance for phonetic context between all
letter and digit pairs. Many of the letters and digits share
"phonetic context" on the left or right side. For example, "p" and
"3" both end in an "ee" sound, so they share the right context.
There were fourteen groups sharing right context and nineteen groups
sharing left context.

Shared right context:

     0  a, j, k
     1  b, c, d, e, g, p, t, v, z, 3
     2  f
     3  h
     4  i, y
     5  l
     6  m
     7  n, 1, 7, 9
     8  0, o
     9  q, u, w, 2
    10  r, 4
    11  x, 6, s
    12  5
    13  8

Shared left context:

     0  a, h, 8
     1  b
     2  c, 6, 7
     3  d, w
     4  e
     5  n, f, l, m, x, s
     6  g, j
     7  i, r
     8  k, q
     9  o
    10  p
    11  2, t
    12  u
    13  v
    14  y, 1
    15  0, z
    16  3
    17  4, 5
    18  9

After the context groups had been established, a list of strings was
chosen that provided even coverage of all the phone-context pairs
and a reasonably balanced number of each token. This long list of
strings was split into several smaller lists of 18-29 strings, and
these small lists were sent to participants as they registered.
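The context groups above lend themselves to a simple lookup. The following sketch (in Python, with group numbers copied from the tables in this document; the function names are our own) shows how one might test whether two tokens share a context:

```python
# The fourteen right-context and nineteen left-context groups from
# the tables above, in order; tokens within a group share a context.
RIGHT_CONTEXT_GROUPS = [
    "a j k", "b c d e g p t v z 3", "f", "h", "i y", "l", "m",
    "n 1 7 9", "0 o", "q u w 2", "r 4", "x 6 s", "5", "8",
]
LEFT_CONTEXT_GROUPS = [
    "a h 8", "b", "c 6 7", "d w", "e", "n f l m x s", "g j",
    "i r", "k q", "o", "p", "2 t", "u", "v", "y 1", "0 z",
    "3", "4 5", "9",
]

def group_of(token, groups):
    """Return the index of the group containing token, or None."""
    for i, members in enumerate(groups):
        if token in members.split():
            return i
    return None

def share_right_context(a, b):
    """True if two tokens fall in the same right-context group."""
    return group_of(a, RIGHT_CONTEXT_GROUPS) == group_of(b, RIGHT_CONTEXT_GROUPS)

# "p" and "3" both end in an "ee" sound, so they share right context:
print(share_right_context("p", "3"))   # True
print(share_right_context("p", "f"))   # False
```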