Alphadigit Corpus
Release 1.3
Center for Spoken Language Understanding

UPDATED: 23 August 2002

This document describes the file naming conventions used for this
distribution and gives a brief description of the file formats used.

File Naming Convention
----------------------

Each speech file name in the Alphadigit Corpus encodes the call number,
the utterance type, and the file type. Here is a typical filename:

    AD-2.p17.wav

AD     The "AD" prefix indicates the corpus name, i.e. Alphadigit.

2      The number between the hyphen and the first dot is the call
       number. Call numbers are described below.

p17    The string between the first and second dots is the utterance
       type. Utterance types have the form "p##". The number after the
       "p" is the utterance number and reflects the order in which the
       utterances were recorded during the call. That is, p1 came
       first, p2 second, etc. Some callers were asked to say 29
       utterances and some were asked to say 19 utterances.

wav    The final three-letter extension indicates the file type. The
       "wav" files contain speech data and use the RIFF standard wav
       file format. The speech data is 16-bit linearly encoded.

Each speech file has a corresponding non-time-aligned word-level
transcription, located in the /trans directory, and a time-aligned
phoneme transcription (produced by automatic forced alignment), located
in the /labels directory. These transcriptions are stored as plain text
files.
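
The naming convention described above can be sketched in Python as
follows. This is only an illustration and is not part of the
distribution: the names parse_speech_filename and SpeechFileName are
hypothetical, and the pattern assumes that every speech file name has
exactly the form "AD-<call number>.p<utterance number>.<extension>".

```python
import re
from typing import NamedTuple

# Assumed pattern for names such as "AD-2.p17.wav":
# corpus prefix, call number, utterance number, file extension.
_NAME_RE = re.compile(r"^AD-(\d+)\.p(\d+)\.([A-Za-z0-9]+)$")


class SpeechFileName(NamedTuple):
    """Fields encoded in a speech file name (hypothetical helper type)."""
    call_number: int
    utterance_number: int
    file_type: str


def parse_speech_filename(name: str) -> SpeechFileName:
    """Split a name like "AD-2.p17.wav" into its three encoded fields."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not an Alphadigit speech file name: {name!r}")
    call, utterance, extension = match.groups()
    return SpeechFileName(int(call), int(utterance), extension)


if __name__ == "__main__":
    print(parse_speech_filename("AD-2.p17.wav"))
```

For the example name AD-2.p17.wav, the function returns call number 2,
utterance number 17, and file type "wav". Names that do not match the
assumed pattern raise a ValueError rather than being guessed at.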