(Note: the waveform files in this distribution have been compressed using embedded Shorten compression. The header information below does not reflect the compression. Please look at the headers of one of the actual waveform files on Disc 17-2.1 or 17-3.1 for a sample header.)
MADCOW ATIS speech waveform files are to be formatted using the NIST SPHERE header structure.
The NIST SPHERE header is an object-oriented, 1024-byte blocked, ASCII structure which is prepended to the waveform data. The header is composed of a fixed-format portion followed by an object-oriented variable portion. The fixed portion is as follows:
    NIST_1A<new-line>
       1024<new-line>

The first line specifies the header type and the second line specifies the header length. Each of these lines is 8 bytes long (including the new-line; the length is padded with leading spaces to fill its line). Together they identify the header and allow programs that do not need the subsequent header information to skip over it.
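As an illustration of the fixed portion described above, here is a minimal Python sketch that checks the two 8-byte lines and skips the rest of the header. The function names `read_header_size` and `skip_header` are hypothetical and not part of any NIST library, and the sketch assumes the stream is positioned at the first byte of the file.

```python
import io

FIXED_LEN = 16  # two 8-byte lines: header type and header length


def read_header_size(stream):
    """Read the fixed portion and return the declared header length in bytes."""
    if stream.read(8) != b"NIST_1A\n":
        raise ValueError("not a NIST_1A header")
    size_line = stream.read(8)
    if len(size_line) != 8 or not size_line.endswith(b"\n"):
        raise ValueError("malformed header length line")
    return int(size_line.decode("ascii").strip())


def skip_header(stream):
    """Consume the whole header, leaving the stream at the first data byte."""
    size = read_header_size(stream)
    remaining = size - FIXED_LEN
    if len(stream.read(remaining)) != remaining:
        raise ValueError("truncated header")
    return size


if __name__ == "__main__":
    # Synthetic 1024-byte header followed by two bytes of made-up data.
    demo = b"NIST_1A\n   1024\n" + b"end_head\n".ljust(1008, b" ") + b"\x01\x02"
    handle = io.BytesIO(demo)
    print(skip_header(handle), handle.read())
```

Reading rather than seeking keeps the sketch usable on streams that cannot seek, such as pipes.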
The remaining object-oriented variable portion is composed of object-type-value "triple" lines which have the following format:
    <LINE> ::= <TRIPLE><new-line> |
               <COMMENT><new-line> |
               <TRIPLE><COMMENT><new-line> |
    <TRIPLE> ::= <OBJECT><space><TYPE><space><VALUE><OPT-SPACES>
    <OBJECT> ::= <PRIMARY-SUBOBJECT> |
                 <PRIMARY-SUBOBJECT><SECONDARY-SUBOBJECT>
    <PRIMARY-SUBOBJECT> ::= <ALPHA> | <ALPHA><ALPHA-NUM-STRING>
    <SECONDARY-SUBOBJECT> ::= _<ALPHA-NUM-STRING> |
                              _<ALPHA-NUM-STRING><SECONDARY-SUBOBJECT>
    <TYPE> ::= -<INTEGER-FLAG> | -<REAL-FLAG> | -<STRING-FLAG>
    <INTEGER-FLAG> ::= i
    <REAL-FLAG> ::= r
    <STRING-FLAG> ::= s<DIGIT-STRING>
    <VALUE> ::= <INTEGER> | <REAL> | <STRING>  (depending on object type)
    <INTEGER> ::= <SIGN><DIGIT-STRING>
    <REAL> ::= <SIGN><DIGIT-STRING>.<DIGIT-STRING>
    <OPT-SPACES> ::= <SPACES> | NULL
    <COMMENT> ::= ;<STRING>  (excluding embedded new-lines)
    <ALPHA-NUM-STRING> ::= <ALPHA-NUM> | <ALPHA-NUM><ALPHA-NUM-STRING>
    <ALPHA-NUM> ::= <DIGIT> | <ALPHA>
    <ALPHA> ::= a | ... | z | A | ... | Z
    <DIGIT-STRING> ::= <DIGIT> | <DIGIT><DIGIT-STRING>
    <DIGIT> ::= 0 | ... | 9
    <SIGN> ::= + | - | NULL
    <SPACES> ::= <space> | <SPACES><space>
    <STRING> ::= <CHARACTER> | <CHARACTER><STRING>
    <CHARACTER> ::= char(0) | char(1) | ... | char(255)

Note: The grammar does not impose any limit on the number or order of objects.
The single object "end_head" marks the end of the active header and the remaining unused header space is undefined.
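The triple format and the "end_head" terminator can be illustrated with a short Python sketch. This is one possible reading of the grammar, not a reference parser: the names `parse_line` and `parse_fields` are hypothetical, the input is assumed to be the variable portion only (the two fixed lines already removed), and string values are assumed not to contain embedded new-lines even though the grammar permits any character.

```python
import re

# object name, type flag (-i, -r or -s<length>), then the rest of the line
_TRIPLE = re.compile(
    r"^([A-Za-z][A-Za-z0-9]*(?:_[A-Za-z0-9]+)*) -(i|r|s(\d+)) (.*)$"
)


def parse_line(line):
    """Return (name, value) for a triple line, or None for a blank or comment line."""
    if not line.strip() or line.startswith(";"):
        return None
    match = _TRIPLE.match(line)
    if match is None:
        raise ValueError("not a valid triple line: %r" % line)
    name, flag, width, rest = match.groups()
    if flag == "i":
        # Drop any trailing comment and optional spaces before converting.
        return name, int(rest.split(";", 1)[0].strip())
    if flag == "r":
        return name, float(rest.split(";", 1)[0].strip())
    # String: the declared length says exactly how many characters to keep,
    # so values may contain spaces (or even a ';').
    return name, rest[: int(width)]


def parse_fields(variable_portion):
    """Collect fields up to (not including) the end_head marker."""
    fields = {}
    for line in variable_portion.split("\n"):
        if line.strip() == "end_head":
            break  # everything after this point is undefined
        parsed = parse_line(line)
        if parsed is not None:
            fields[parsed[0]] = parsed[1]
    return fields
```

Using the declared string length instead of splitting on whitespace is what lets a value such as a microphone description keep its internal space.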
The MADCOW headers must include the following fields:
Field                     Type         Description - Probable defaults marked in ()
------------------------  -----------  ---------------------------------------------
speaker_id                string       3-char. speaker ID from filename
speaking_mode             string       speaking mode ("spontaneous" or "read")
recording_date            string       beginning of recording date stamp of the
                                       form DD-MMM-YYYY. Should contain the string
                                       "unknown" if this info is not available.
recording_time            -s11 string  beginning of recording time stamp of the
                                       form HH:MM:SS.HH. Should contain the string
                                       "unknown" if this info is not available.
microphone                string       microphone description ("Sennheiser HMD-410"
                                       or "Crown PCC-160")
utterance_id              string       utterance ID from filename of the form
                                       XXXUUSMP as described in the filenames
                                       specification document.
database_id               string       database (corpus) identifier ("atis3")
database_version          string       database (corpus) revision ("1.0")
channel_count             integer      number of channels in waveform ("1")
speaker_session_number    string       1-char. scenario-session ID from filename
sample_count              integer      number of samples in waveform
sample_max                integer      maximum sample value in waveform
sample_min                integer      minimum sample value in waveform
sample_rate               integer      waveform sampling rate ("16000")
sample_n_bytes            integer      number of bytes per sample ("2")
sample_byte_format        string       byte order (MSB/LSB -> "10" or
                                       LSB/MSB -> "01")
sample_sig_bits           integer      number of significant bits in each
                                       sample ("16")
session_utterance_number  integer      base-10 version of "speaker_sentence_number"
speaker_sentence_number   string       2-character query within scenario-session
                                       (base 36) from filename
end_head                  none         end of header identifier

Note: Although these fields are mandatory, you may include additional local fields if you wish. If you do add fields, please email us the field names and types so that we can add them to our library.
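The field list above lends itself to two simple checks: that every mandatory field is present, and that "session_utterance_number" really is the base-10 form of the base-36 "speaker_sentence_number". The Python sketch below assumes the header has already been parsed into a dictionary mapping field names to values; the names `REQUIRED_FIELDS`, `missing_fields` and `sentence_number_matches` are hypothetical. The "end_head" marker is left out of the list because it carries no value.

```python
REQUIRED_FIELDS = (
    "speaker_id", "speaking_mode", "recording_date", "recording_time",
    "microphone", "utterance_id", "database_id", "database_version",
    "channel_count", "speaker_session_number", "sample_count",
    "sample_max", "sample_min", "sample_rate", "sample_n_bytes",
    "sample_byte_format", "sample_sig_bits", "session_utterance_number",
    "speaker_sentence_number",
)


def missing_fields(fields):
    """Return the mandatory field names absent from a parsed header."""
    return [name for name in REQUIRED_FIELDS if name not in fields]


def sentence_number_matches(fields):
    """Check that the base-36 sentence number equals the base-10 utterance number."""
    base36_value = int(fields["speaker_sentence_number"], 36)
    return base36_value == int(fields["session_utterance_number"])
```

For instance, a sentence number of "0b" is 11 in base 10, so it is consistent with a "session_utterance_number" of 11.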
Hypothetical ATIS3 header:
    NIST_1A
       1024
    speaker_id -s3 z01                    (speaker ID "z01")
    speaking_mode -s11 spontaneous        (spontaneous speaking mode)
    recording_date -s11 17-APR-1993       (date of beginning of recording)
    recording_time -s11 10:23:39.79       (time of beginning of recording)
    microphone -s18 Sennheiser HMD-410    (microphone description)
    utterance_id -s8 z010b1ss             (utterance ID from filename)
    database_id -s5 atis3                 (corpus ID "atis3")
    database_version -s3 1.0              (utterance release "1.0")
    channel_count -i 1                    (single-channel waveform)
    speaker_session_number -s1 1          (scenario-session 1 for speaker)
    sample_count -i 215040                (number of samples in waveform)
    sample_max -i 1115                    (maximum sample value)
    sample_min -i -1479                   (minimum sample value)
    sample_rate -i 16000                  (16 kHz sampling rate)
    sample_n_bytes -i 2                   (2 bytes per sample)
    sample_byte_format -s2 10             (MSB/LSB byte order)
    sample_sig_bits -i 16                 (16 significant bits per sample)
    session_utterance_number -i 11        (scenario-session query 11, base 10)
    speaker_sentence_number -s2 0b        (scenario-session query 0b, base 36)
    end_head

The speech waveform data follows the header and should contain single-channel, 16-bit, linearly quantized samples at a 16 kHz sample rate. There should be no "gain normalization" or post-processing of the waveform data.
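To show how "sample_byte_format" governs the reading of the waveform data, here is a minimal Python sketch that turns raw bytes into sample values. It assumes uncompressed data (files compressed with Shorten would have to be decompressed first), 2 bytes per sample, and signed two's-complement samples; the signed interpretation is inferred from the negative "sample_min" value rather than stated outright. The function name `decode_samples` is hypothetical.

```python
import struct


def decode_samples(raw, byte_format="10"):
    """Decode raw 2-byte samples; "10" is MSB first, "01" is LSB first."""
    if byte_format == "10":
        order = ">"  # big-endian: most significant byte first
    elif byte_format == "01":
        order = "<"  # little-endian: least significant byte first
    else:
        raise ValueError("unsupported sample_byte_format: %r" % byte_format)
    count = len(raw) // 2  # a trailing odd byte, if any, is ignored
    return list(struct.unpack("%s%dh" % (order, count), raw[: count * 2]))
```

With the MSB/LSB order, the four bytes 04 5B FA 39 (hexadecimal) decode to the samples 1115 and -1479, the maximum and minimum values used in the hypothetical header.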