                         Alphadigit Corpus
                            Release 1.3

              Center for Spoken Language Understanding


UPDATED: 23 August 2002


This document describes the file naming conventions used
for this distribution and gives a brief description of the
various file formats used.

File Naming Convention
----------------------
Each speech file name in the Alphadigit Corpus encodes
the call number, utterance type, and file type. Here is a
typical filename:

AD-2.p17.wav


    AD      The "AD" prefix indicates the corpus name, i.e.
            Alphadigit.  

    2       The number between the hyphen and the first dot
            is the call number.  Call numbers are described
            below.

    p17     The string between the first and second dots is
            the utterance type.  Utterance types are of the
            form "p##".

            The number after the "p" is the utterance
            number and reflects the order in which the
            utterances were recorded during the call. That
            is, p1 came first, p2 second, etc.  Some
            callers were asked to say 29 utterances and
            others were asked to say 19.

    wav     The final three-letter extension indicates the
            file type.
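
As an illustration, the convention above can be decoded with
a short Python sketch. The names SpeechFileName and
parse_filename are hypothetical helpers, not part of this
distribution, and the pattern assumes every speech file
follows the "AD-<call>.p<utterance>.<ext>" layout shown
above.

```python
import re
from typing import NamedTuple


class SpeechFileName(NamedTuple):
    """Fields encoded in a speech file name (hypothetical helper type)."""
    corpus: str            # corpus prefix, "AD" for Alphadigit
    call_number: int       # number between the hyphen and the first dot
    utterance_number: int  # number after the "p" in the utterance type
    file_type: str         # three-letter extension, e.g. "wav"


# Assumed layout: AD-<call number>.p<utterance number>.<3-char extension>
_NAME_PATTERN = re.compile(r"(AD)-(\d+)\.p(\d+)\.([A-Za-z0-9]{3})")


def parse_filename(name: str) -> SpeechFileName:
    """Split a name such as 'AD-2.p17.wav' into its encoded fields."""
    match = _NAME_PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"not an Alphadigit speech file name: {name!r}")
    corpus, call, utterance, extension = match.groups()
    return SpeechFileName(corpus, int(call), int(utterance), extension)
```

For the example name above, parse_filename("AD-2.p17.wav")
yields call number 2, utterance number 17, and file type
"wav".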

The "wav" files contain speech data in the standard RIFF
wav file format. The samples are 16-bit, linearly encoded.
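
The stated encoding can be checked with the wave module
from the Python standard library. This is a minimal sketch:
describe_wav is a hypothetical helper name, and no sample
rate or channel count is assumed because this document does
not state them.

```python
import wave


def describe_wav(path: str) -> dict:
    """Return basic header fields of a RIFF wav file (hypothetical helper)."""
    with wave.open(path, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            # getsampwidth() is in bytes; 2 bytes means 16-bit samples
            "bits_per_sample": wav.getsampwidth() * 8,
            "sample_rate_hz": wav.getframerate(),
            "num_frames": wav.getnframes(),
        }
```

For a file that matches the description above, the
"bits_per_sample" entry should be 16.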

Each speech file has a corresponding non-time-aligned
word-level transcription, located in the /trans directory,
and a time-aligned phoneme transcription (produced by
automatic forced alignment), located in the /labels
directory. These transcriptions are stored in plain text
files.