File: wav-specs.doc, updated 11/03/92

        MADCOW Speech Waveform (.wav) File Type Specifications

ATIS MADCOW speech waveform files have been formatted using the NIST
SPHERE header structure.  They are stored on CD-ROM in compressed form,
using a version of Tony Robinson's "shorten" algorithm for waveform
data compression.  Source code (in "C") for the SPHERE Library and
Utilities is available via anonymous ftp from NIST (see below for
instructions on downloading the software).  Users without access to
Internet ftp file transfers may contact the Linguistic Data Consortium
to obtain the source code by mail (see instructions at the end of this
file).

The NIST SPHERE header is an object-oriented, 1024-byte blocked, ASCII
structure which is prepended to the waveform data.  The header is
composed of a fixed-format portion followed by an object-oriented
variable portion.  The fixed portion is as follows:

NIST_1A
   1024

The first line specifies the header type and the second line specifies
the header length.  Each of these lines is 8 bytes long (including
new-line) and is structured to identify the header as well as to allow
those who do not wish to read the subsequent header information to
programmatically skip over it.

The remaining object-oriented variable portion is composed of
object-type-value "triple" lines which have the following format:

<LINE> ::= <TRIPLE><new-line> | <COMMENT><new-line> |
           <TRIPLE><COMMENT><new-line>

    <TRIPLE> ::= <OBJECT><space><TYPE><space><VALUE><OPT-SPACES>

        <OBJECT> ::= <PRIMARY-SECTION> |
                     <PRIMARY-SECTION><SECONDARY-SECTION>
        <PRIMARY-SECTION> ::= <CHARACTER> | <CHARACTER><CHARACTERS>
        <SECONDARY-SECTION> ::= _<CHARACTERS> |
                                _<CHARACTERS><SECONDARY-SECTION>

        <TYPE> ::= -<INTEGER-FLAG> | -<REAL-FLAG> | -<STRING-FLAG>
            <INTEGER-FLAG> ::= i
            <REAL-FLAG> ::= r
            <STRING-FLAG> ::= s<DIGIT-STRING>

        <VALUE> ::= <INTEGER> | <REAL> | <STRING>
                    (depending on object type)
            <INTEGER> ::= <SIGN><DIGIT-STRING>
            <REAL> ::= <SIGN><DIGIT-STRING>.<DIGIT-STRING>

        <OPT-SPACES> ::= <SPACES> | NULL

    <COMMENT> ::= ;<STRING>  (excluding embedded new-lines)

<CHARACTERS> ::= <CHARACTER> | <CHARACTER><CHARACTERS>
<CHARACTER> ::= <ALPHA> | <DIGIT>
<ALPHA> ::= a | ... | z | A | ... | Z
<DIGIT-STRING> ::= <DIGIT> | <DIGIT><DIGIT-STRING>
<DIGIT> ::= 0 | ... | 9
<SIGN> ::= + | - | NULL
<SPACES> ::= <space> | <space><SPACES>
<STRING> ::= <CHAR> | <CHAR><STRING>
<CHAR> ::= char(0) | char(1) | ... | char(255)

Note: The grammar does not impose any limit on the number of objects.

The single object "end_head" marks the end of the active header and the
remaining unused header space is undefined.
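
The header layout described above can be read with very little code.
The following is a minimal sketch in Python, not part of the SPHERE
package; the function name parse_sphere_header is hypothetical.  It
assumes the stream is positioned at the first byte of the file, that
numeric values may be followed by a ";" comment, and that string values
do not begin with white space.

```python
import io

def parse_sphere_header(stream):
    """Return (header_size, fields) read from a binary stream at byte 0."""
    # Fixed portion: two 8-byte lines, header type and header length.
    if stream.read(8) != b"NIST_1A\n":
        raise ValueError("missing NIST_1A header identifier")
    header_size = int(stream.read(8).decode("ascii"))

    # Variable portion: latin-1 never fails, so the undefined space
    # after end_head cannot cause a decoding error.
    body = stream.read(header_size - 16).decode("latin-1")
    fields = {}
    for line in body.split("\n"):
        parts = line.split(None, 2)
        if not parts or parts[0].startswith(";"):
            continue                      # blank or comment-only line
        if parts[0] == "end_head":
            break                         # remaining space is undefined
        if len(parts) < 3:
            raise ValueError("malformed header line: %r" % line)
        name, type_flag, rest = parts
        if type_flag == "-i":
            fields[name] = int(rest.split(";", 1)[0])
        elif type_flag == "-r":
            fields[name] = float(rest.split(";", 1)[0])
        elif type_flag.startswith("-s"):
            # -sN: the value is exactly N characters long.
            fields[name] = rest[:int(type_flag[2:])]
        else:
            raise ValueError("unknown type flag: %r" % type_flag)
    return header_size, fields

# Small in-memory header, padded to one 1024-byte block.
demo = (b"NIST_1A\n   1024\n"
        b"utterance_id -s8 r80062ss\n"
        b"sample_count -i 74010\n"
        b"sample_min -i -3570\n"
        b"microphone -s18 Sennheiser HMD-414\n"
        b"end_head\n").ljust(1024, b" ")
size, fields = parse_sphere_header(io.BytesIO(demo))
print(size, fields)
```

After the call, the stream is positioned at the first byte following
the header, so a program that only needs the waveform data can discard
the returned fields and continue reading from there.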

The MADCOW headers include the following fields:

Field                     Type     Description - Probable defaults marked in ()
------------------------  -------  --------------------------------------------
speaker_id                string   3-char. speaker ID from filename
speaking_mode             string   speaking mode ("spontaneous" or "read")
recording_date            string   beginning of recording date stamp of the
                                   form DD-MMM-YYYY.  Should contain the
                                   string "unknown" if this info is not
                                   available.
recording_time            string   beginning of recording time stamp of the
                                   form HH:MM:SS.HH.  Should contain the
                                   string "unknown" if this info is not
                                   available.
microphone                string   microphone description ("Sennheiser
                                   HMD-410" or "Crown PCC-160")
utterance_id              string   utterance ID from filename of the form
                                   XXXUUSMP as described in the filenames
                                   section above.
database_id               string   database (corpus) identifier ("atis2")
database_version          string   database (corpus) revision ("1.0")
channel_count             integer  number of channels in waveform ("1")
speaker_session_number    string   1-char. session ID from filename
sample_count              integer  number of samples in waveform
sample_max                integer  maximum sample value in waveform
sample_min                integer  minimum sample value in waveform
sample_rate               integer  waveform sampling rate ("16000")
sample_n_bytes            integer  number of bytes per sample ("2")
sample_byte_format        string   byte order (MSB/LSB -> "10" or
                                   LSB/MSB -> "01")
sample_sig_bits           integer  number of significant bits in each
                                   sample ("16")
session_utterance_number  integer  number of utterance within session
                                   (base 10) starting at "1"
speaker_sentence_number   string   number of utterance within session
                                   (base 36)
end_head                  none     end of header identifier

In addition to the fields listed above, there are two header entries
pertaining to the use of the "shorten" compression algorithm:

sample_coding             string   "pcm,embedded-shorten-v1.09"
sample_checksum           integer  value provided by compression routine

--------------------------------------------------------------------

Example ATIS header from SRI data:

NIST_1A
   1024
database_id -s5 atis2
database_version -s3 1.0
utterance_id -s8 r80062ss
channel_count -i 1
sample_count -i 74010
sample_rate -i 16000
sample_min -i -3570
sample_max -i 3856
sample_n_bytes -i 2
sample_byte_format -s2 10
sample_sig_bits -i 16
speaker_id -s3 r80
speaking_mode -s11 spontaneous
recording_date -s11 18-Nov-1991
recording_time -s11 14:01:26.00
microphone -s18 Sennheiser HMD-414
speaker_session_number -s1 2
session_utterance_number -i 6
speaker_sentence_number -s2 06
sample_coding -s26 pcm,embedded-shorten-v1.09
sample_checksum -i 11939
end_head

--------------------------------------------------------------------

Instructions for obtaining and using SPHERE

NIST has developed the SPHERE Library and Utilities package to provide
an easy-to-use programming interface and essential command-line
operations for manipulating speech files.  The ATIS-2 waveform data
were prepared for publication using SPHERE version 2.0 "beta".

The current release of SPHERE is available for free via anonymous FTP
from NIST, as follows:

    Connect to host:    jaguar.ncsl.nist.gov
    Go to directory:    pub
    Set transfer mode:  binary
    Get file:           sphere_2.0_Beta2.tar.Z

(Note that the file shown represents the version that is current as of
publication of ATIS-2; as subsequent releases are made available, the
file name will change accordingly.  In general, only one version of
SPHERE is present on the ftp server, and that will be the most recent
release.)

For those who do not have access to the Internet FTP service, the
SPHERE package may be obtained for free from:

    Linguistic Data Consortium
    441 Williams Hall
    University of Pennsylvania
    Philadelphia, PA 19104

You may also send a request by e-mail to "ldc@unagi.cis.upenn.edu" or
call the LDC at (215) 898-0464.

After obtaining and installing the SPHERE package, you should refer to
the on-line manual pages included with the release for instructions on
usage.  The relevant utility program for decompressing waveform data is
"w_decode".
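
Once a waveform has been decompressed, the header fields listed earlier
in this file are enough to interpret the sample data.  The following is
a rough Python sketch, not part of the SPHERE package.  It assumes the
data have already been decoded to uncompressed PCM with the header
removed, that samples are signed 2-byte integers (the negative
sample_min in the example header suggests signed values), and that
"fields" is a dictionary of header values.  The names read_samples and
matches_header are hypothetical.

```python
import struct

def read_samples(pcm_bytes, fields):
    """Unpack signed 2-byte samples using the header's byte-order field."""
    n_bytes = fields["sample_n_bytes"]
    if n_bytes != 2:
        raise ValueError("this sketch only handles 2-byte samples")
    # "10" means MSB first (big-endian); "01" means LSB first.
    order = {"10": ">", "01": "<"}[fields["sample_byte_format"]]
    count = fields["sample_count"] * fields["channel_count"]
    return struct.unpack("%s%dh" % (order, count),
                         pcm_bytes[:count * n_bytes])

def matches_header(samples, fields):
    """Check sample count and extreme values against the header."""
    expected = fields["sample_count"] * fields["channel_count"]
    return (len(samples) == expected
            and min(samples) == fields["sample_min"]
            and max(samples) == fields["sample_max"])

# Synthetic four-sample, MSB-first data (not real ATIS data).
fields = {"channel_count": 1, "sample_count": 4, "sample_rate": 16000,
          "sample_n_bytes": 2, "sample_byte_format": "10",
          "sample_min": -3570, "sample_max": 3856}
pcm = struct.pack(">4h", 0, -3570, 3856, 12)
samples = read_samples(pcm, fields)
duration_s = fields["sample_count"] / fields["sample_rate"]
print(samples, matches_header(samples, fields), duration_s)
```

The duration is simply sample_count divided by sample_rate.  For the
example header given earlier, 74010 samples at 16000 samples per second
correspond to about 4.63 seconds of speech.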