File: wav_spec.doc, updated 07/22/94

(Note: the waveform files in this distribution have been compressed using embedded Shorten compression. The header information below does not reflect the compression. Please look at the headers of one of the actual waveform files on Disc 17-2.1 or 17-3.1 for a sample header.)

MADCOW ATIS3 Speech Waveform (.wav) File Type Specifications

MADCOW ATIS speech waveform files are to be formatted using the NIST SPHERE header structure.

The NIST SPHERE header is an object-oriented, 1024-byte blocked, ASCII structure which is prepended to the waveform data. The header is composed of a fixed-format portion followed by an object-oriented variable portion. The fixed portion is as follows:

NIST_1A<new-line>
   1024<new-line>

The first line specifies the header type and the second line specifies the header length. Each of these lines is 8 bytes long (including the new-line), so the fixed portion occupies the first 16 bytes of the file. This structure identifies the header and lets programs that do not need the subsequent header information skip over it.
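
As an illustration of skipping the header programmatically, the following Python sketch reads the two fixed lines and seeks past the header. The function names read_header_size and skip_header are hypothetical, and the sketch assumes a seekable binary stream whose header begins at offset 0.

```python
import io


def read_header_size(stream):
    """Read the fixed portion (two 8-byte lines) and return the header length."""
    fixed = stream.read(16)
    if len(fixed) != 16 or fixed[:8] != b"NIST_1A\n":
        raise ValueError("not a NIST_1A SPHERE header")
    if fixed[15:16] != b"\n":
        raise ValueError("malformed header-length line")
    # The length is right-justified in 7 characters, e.g. b"   1024".
    return int(fixed[8:15].decode("ascii"))


def skip_header(stream):
    """Position the stream at the first byte of waveform data."""
    size = read_header_size(stream)
    stream.seek(size)  # assumption: the header starts at offset 0
    return size
```

After skip_header returns, the next read from the stream yields waveform data rather than header text.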

The remaining object-oriented variable portion is composed of object-type-value "triple" lines which have the following format:

<LINE> ::= <TRIPLE><new-line> |
           <COMMENT><new-line> |
           <TRIPLE><COMMENT><new-line>

  <TRIPLE> ::= <OBJECT><space><TYPE><space><VALUE><OPT-SPACES>

    <OBJECT> ::= <PRIMARY-SUBOBJECT> | 
                 <PRIMARY-SUBOBJECT><SECONDARY-SUBOBJECT>

    <PRIMARY-SUBOBJECT> ::= <ALPHA> | <ALPHA><ALPHA-NUM-STRING>
    <SECONDARY-SUBOBJECT> ::= _<ALPHA-NUM-STRING> | 
                              _<ALPHA-NUM-STRING><SECONDARY-SUBOBJECT>

    <TYPE> ::= -<INTEGER-FLAG> | -<REAL-FLAG> | -<STRING-FLAG>

      <INTEGER-FLAG> ::= i
      <REAL-FLAG> ::= r
      <STRING-FLAG> ::= s<DIGIT-STRING>
      
    <VALUE> ::= <INTEGER> | <REAL> | <STRING>  (depending on object type)

      <INTEGER> ::= <SIGN><DIGIT-STRING>
      <REAL> ::= <SIGN><DIGIT-STRING>.<DIGIT-STRING> 

    <OPT-SPACES> ::= <SPACES> | NULL

  <COMMENT> ::= ;<STRING>  (excluding embedded new-lines)

<ALPHA-NUM-STRING> ::= <ALPHA-NUM> | <ALPHA-NUM><ALPHA-NUM-STRING>
<ALPHA-NUM> ::= <DIGIT> | <ALPHA>
<ALPHA> ::= a | ... | z | A | ... | Z
<DIGIT-STRING> ::= <DIGIT> | <DIGIT><DIGIT-STRING>
<DIGIT> ::= 0 | ... | 9
<SIGN> ::= + | - | NULL
<SPACES> ::= <space> | <SPACES><space>
<STRING> ::=  <CHARACTER> | <CHARACTER><STRING>
<CHARACTER> ::= char(0) | char(1) | ... | char(255)

Note: The grammar does not impose any limit on the number or order of objects.
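
To make the <TRIPLE> production concrete, here is a minimal Python sketch that formats one triple from a field name and a value. The helper name format_triple is hypothetical. The sketch assumes that the digit string after "s" is the length of the string value (as in the hypothetical header later in this document, e.g. "-s3 z01") and that every character occupies one byte.

```python
def format_triple(name, value):
    """Return '<OBJECT> <TYPE> <VALUE>' for an int, float, or str value."""
    if isinstance(value, bool):
        raise TypeError("booleans have no SPHERE type flag")
    if isinstance(value, int):
        return f"{name} -i {value}"
    if isinstance(value, float):
        # <REAL> has no exponent form, so use fixed-point notation.
        return f"{name} -r {value:f}"
    if isinstance(value, str):
        # Assumption: the number after "-s" is the value's length in characters.
        return f"{name} -s{len(value)} {value}"
    raise TypeError(f"unsupported value type for {name}")
```

For example, format_triple("speaker_id", "z01") yields the line body "speaker_id -s3 z01"; the caller appends the new-line.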

The single object "end_head" marks the end of the active header and the remaining unused header space is undefined.
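
The grammar and the "end_head" rule can be sketched as a small Python parser. This is one possible reading of the grammar, not a reference implementation: parse_fields is a hypothetical name, the input is assumed to be the header bytes that follow the 16-byte fixed portion, and string values are taken by their declared length so that they may contain spaces or semicolons.

```python
def parse_fields(body):
    """Parse the variable header portion (bytes) into a dict of name -> value."""
    fields = {}
    pos = 0
    while pos < len(body):
        eol = body.find(b"\n", pos)
        if eol < 0:
            eol = len(body)
        line = body[pos:eol]
        stripped = line.strip()
        if stripped == b"end_head":
            return fields  # the rest of the header space is undefined
        if not stripped or stripped.startswith(b";"):
            pos = eol + 1  # blank or comment-only line
            continue
        name, type_flag, rest = line.split(b" ", 2)
        if type_flag.startswith(b"-s"):
            # Take exactly the declared number of bytes as the string value.
            length = int(type_flag[2:].decode("ascii"))
            start = pos + len(name) + 1 + len(type_flag) + 1
            value = body[start:start + length].decode("latin-1")
            eol = body.find(b"\n", start + length)
            if eol < 0:
                eol = len(body)
        else:
            # Numeric value: drop any trailing comment and optional spaces.
            token = rest.split(b";", 1)[0].strip().decode("ascii")
            if type_flag == b"-i":
                value = int(token)
            elif type_flag == b"-r":
                value = float(token)
            else:
                raise ValueError(f"unknown type flag {type_flag!r}")
        fields[name.decode("ascii")] = value
        pos = eol + 1
    raise ValueError("end_head not found")
```

Because the grammar imposes no order on objects, the result is keyed by object name rather than by position.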

The MADCOW headers must include the following fields:

Field                    Type     Description - Probable defaults marked in ()
-----------------------  -------  ---------------------------------------------
speaker_id               string   3-char. speaker ID from filename
speaking_mode            string   speaking mode ("spontaneous" or "read")
recording_date           string   beginning of recording date stamp of the
                                  form DD-MMM-YYYY.  Should contain the string
                                  "unknown" if this info is not available.
recording_time           string   beginning of recording time stamp of the
                                  form HH:MM:SS.HH.  Should contain the string
                                  "unknown" if this info is not available.
microphone               string   microphone description ("Sennheiser HMD-410"
                                  or "Crown PCC-160")
utterance_id             string   utterance ID from filename of the form
                                  XXXUUSMP as described in the filenames 
                                  specification document.
database_id              string   database (corpus) identifier ("atis3")
database_version         string   database (corpus) revision ("1.0")
channel_count            integer  number of channels in waveform ("1")
speaker_session_number   string   1-char. scenario-session ID from filename
sample_count             integer  number of samples in waveform
sample_max               integer  maximum sample value in waveform
sample_min               integer  minimum sample value in waveform
sample_rate              integer  waveform sampling rate ("16000")
sample_n_bytes           integer  number of bytes per sample ("2")
sample_byte_format       string   byte order (MSB/LSB -> "10" or 
                                  LSB/MSB -> "01")
sample_sig_bits          integer  number of significant bits in each sample
                                  ("16")
session_utterance_number integer  base-10 version of "speaker_sentence_number"
speaker_sentence_number  string   2-character query within scenario-session
                                  (base 36) from filename
end_head                 none     end of header identifier

Note: Although these fields are mandatory, you may include additional local fields if you wish. If you do add fields, please email us the field names and types so that we can add them to our library.
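
A header can be checked against the mandatory field list with a short Python sketch such as the following. The names REQUIRED_FIELDS and check_required are hypothetical, and the input is assumed to be a dict that maps field names to already-parsed values (int for integer fields, str for string fields). Additional local fields are ignored, and "end_head" is omitted because it is a terminator rather than a valued field.

```python
# Mandatory fields and their types, transcribed from the table above.
REQUIRED_FIELDS = {
    "speaker_id": str, "speaking_mode": str, "recording_date": str,
    "recording_time": str, "microphone": str, "utterance_id": str,
    "database_id": str, "database_version": str, "channel_count": int,
    "speaker_session_number": str, "sample_count": int, "sample_max": int,
    "sample_min": int, "sample_rate": int, "sample_n_bytes": int,
    "sample_byte_format": str, "sample_sig_bits": int,
    "session_utterance_number": int, "speaker_sentence_number": str,
}


def check_required(fields):
    """Return a list of problems; an empty list means all fields are present."""
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in fields:
            problems.append(f"missing field: {name}")
        elif not isinstance(fields[name], expected):
            problems.append(f"wrong type for field: {name}")
    return problems
```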

Hypothetical ATIS3 header:

NIST_1A
   1024
speaker_id -s3 z01                     (speaker ID "z01")
speaking_mode -s11 spontaneous         (spontaneous speaking mode)
recording_date -s11 17-APR-1993        (date of beginning of recording)
recording_time -s11 10:23:39.79        (time of beginning of recording)
microphone -s18 Sennheiser HMD-410     (microphone description)
utterance_id -s8 z010b1ss              (utterance ID from filename)
database_id -s5 atis3                  (corpus ID "atis3")
database_version -s3 1.0               (corpus revision "1.0")
channel_count -i 1                     (single-channel waveform)
speaker_session_number -s1 1           (scenario-session 1 for speaker)
sample_count -i 215040                 (number of samples in waveform)
sample_max -i 1115                     (maximum sample value)
sample_min -i -1479                    (minimum sample value)
sample_rate -i 16000                   (16 kHz sampling rate)
sample_n_bytes -i 2                    (2 bytes per sample)
sample_byte_format -s2 10              (MSB/LSB byte order)
sample_sig_bits -i 16                  (16 significant bits per sample)
session_utterance_number -i 11         (scenario-session query 11, base 10)
speaker_sentence_number -s2 0b         (scenario-session query 0b, base 36)
end_head
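
Several fields in the header above are redundant with one another, which allows a simple consistency check. The Python sketch below is illustrative only: ids_consistent is a hypothetical name, and the mapping of the XXXUUSMP positions (XXX to speaker_id, UU to speaker_sentence_number, S to speaker_session_number) is an assumption inferred from the field descriptions and the sample values, not taken from the filenames specification document.

```python
def ids_consistent(fields):
    """Check utterance_id and the base-36 query number against related fields."""
    utt = fields["utterance_id"]
    sentence = fields["speaker_sentence_number"]
    return (
        len(utt) == 8
        and utt[:3] == fields["speaker_id"]              # XXX (assumed)
        and utt[3:5] == sentence                         # UU (assumed)
        and utt[5] == fields["speaker_session_number"]   # S (assumed)
        # int(..., 36) reads digits 0-9 and letters a-z as base 36.
        and int(sentence, 36) == fields["session_utterance_number"]
    )
```

With the sample values, "0b" in base 36 is 0 * 36 + 11 = 11, which matches session_utterance_number.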

The speech waveform data follows the header and should contain single-channel, 16-bit, linearly quantized samples at a 16 kHz sample rate. There should be no "gain normalization" or post-processing of the waveform data.
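
The following Python sketch shows how the sample fields might be used to decode the waveform data. The name decode_samples is hypothetical. The sketch assumes uncompressed data (files compressed with Shorten would have to be decompressed first), signed 16-bit samples, and a dict of already-parsed header fields.

```python
import struct


def decode_samples(raw, fields):
    """Decode single-channel 16-bit samples from raw bytes into a tuple of ints."""
    if fields["channel_count"] != 1 or fields["sample_n_bytes"] != 2:
        raise ValueError("expected single-channel, 2-byte samples")
    # "10" is MSB first (big-endian); "01" is LSB first (little-endian).
    order = {"10": ">", "01": "<"}[fields["sample_byte_format"]]
    count = fields["sample_count"]
    if len(raw) < 2 * count:
        raise ValueError("waveform data shorter than sample_count")
    # "h" is a signed 16-bit integer (assumption: samples are signed).
    return struct.unpack(f"{order}{count}h", raw[:2 * count])


def extremes_match(samples, fields):
    """Compare the decoded extremes with sample_max and sample_min."""
    return (max(samples) == fields["sample_max"]
            and min(samples) == fields["sample_min"])
```

Comparing the decoded extremes against sample_max and sample_min is a quick way to detect a wrong byte order, since swapped bytes usually produce values far outside the recorded range.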