File: filename-specs.doc, updated 04/15/92
(modified 10/07/93 for cdrom publication)

          MADCOW File and Directory Format Specifications

Directory and Filename Structures
---------------------------------

All MADCOW data are organized into directory and filename structures as
follows:

<CORPUS>/<SPEAKING-MODE>/<PARTITION>/<SITE>/<SPEAKER>/<SESSION>/<DATA-FILES>

where,

     CORPUS        ::= atis2
     SPEAKING-MODE ::= spon (waveforms) | text (logs, transcripts, etc)
     PARTITION     ::= test | train
     SITE          ::= [feb92 | nov92] |
                       [att | bbn | cmu | mit | nist | sri]
     SPEAKER       ::= 001 | ... | zzz (3-character base-36 speaker ID)
     SESSION       ::= 1 | ... | z
     DATA-FILES    ::= <XXX><UU><S><M><P>.<TYPE>

     where,

          XXX ::= 001 | ... | zzz (3-character base-36 speaker ID)
          UU  ::= 01 | ... | zz   (2-char. base-36 speaker-sentence ID)
          S   ::= 1 | ... | z     (1-char. base-36 session ID)
          M   ::= s               ("s" - spontaneous)
          P   ::= s | c | x       ("s" - Sennheiser, "c" - Crown, or
                                   "x" - pertains to all microphones
                                   recorded)

     and,

          TYPE ::= log | (session log file - special speaker-sentence
                          ID of "000" is used in all log files)
                   com | (session comment file - special speaker-sentence
                          ID of "000" is used in all comment files)
                   wav | (SPHERE-headered speech waveform file)
                   sro | ("speech recognizer output" transcription)
                   cat | (query categorization)
                   win | (wizard input to NLParse)
                   sql | (SQL query from NLParse to create min (.ref)
                          answer)
                   sq2 | (SQL query from NLParse to create max (.rf2)
                          answer)
                   ref | (min reference answer from (.sql) SQL query)
                   rf2   (max reference answer from (.sq2) SQL query)

Note: Although other ATIS file types do exist, only three of the file
types listed above (.log, .wav, .sro) were required as input from sites
contributing initial (unannotated) data; the remaining file types (.cat,
.win, .sql, .sq2, .ref, and .rf2) were added by the annotation process.

example:  e000e1ss.wav

          (speaker e00, utterance 0e, session 1, spontaneous speaking
          mode, Sennheiser mic., waveform file)

Given that speaker e00 was recorded at BBN and placed in the training
partition, the directory path to this file is:

          atis2/spon/train/bbn/e00/1/

          (this happens to be on disc 12-2.1)

The corresponding text files are found in:

          atis2/text/train/bbn/e00/1/

          (all text data are on disc 12-1.1)

This corpus is identified by the database ID (corpus ID) "atis2". This ID
appears in the directory structure and in the waveform file headers.

There are separate documentation files explaining the format and contents
of some of the file types. In particular, refer to the files cat_spec.doc,
log_spec.doc, sro_spec.doc, and wav_spec.doc for information on the .cat,
.log, .sro and .wav files, respectively.
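
The filename and directory conventions above can be sketched in code. The
following Python fragment is an illustration only and is not part of the
MADCOW distribution. The names parse_name, directory_for and AtisName are
hypothetical. It assumes lower-case filenames, and it assumes that only
.wav files live under "spon" while every other file type lives under
"text". The partition and site are not encoded in the filename, so the
caller must supply them.

```python
import re
from typing import NamedTuple

# File types listed in the TYPE definition of this document.
FILE_TYPES = ("log", "com", "wav", "sro", "cat", "win", "sql", "sq2", "ref", "rf2")

# XXX UU S M P . TYPE, using base-36 digits (0-9, a-z) for the ID fields.
_NAME_RE = re.compile(
    r"^(?P<speaker>[0-9a-z]{3})"    # XXX: 3-character speaker ID
    r"(?P<utterance>[0-9a-z]{2})"   # UU: 2-character speaker-sentence ID
    r"(?P<session>[0-9a-z])"        # S: 1-character session ID
    r"(?P<mode>s)"                  # M: "s" (spontaneous) is the only value
    r"(?P<mic>[scx])"               # P: Sennheiser, Crown, or all microphones
    r"\.(?P<ftype>" + "|".join(FILE_TYPES) + r")$"
)


class AtisName(NamedTuple):
    """Hypothetical container for the fields of one data filename."""
    speaker: str
    utterance: str
    session: str
    mode: str
    mic: str
    ftype: str


def parse_name(filename: str) -> AtisName:
    """Split a data filename into its fields, or raise ValueError."""
    match = _NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"not a valid data filename: {filename!r}")
    return AtisName(**match.groupdict())


def directory_for(name: AtisName, partition: str, site: str,
                  corpus: str = "atis2") -> str:
    """Build the directory path for a parsed filename.

    Assumption: waveforms go under "spon", everything else under "text".
    """
    if partition not in ("test", "train"):
        raise ValueError(f"unknown partition: {partition!r}")
    speaking_mode = "spon" if name.ftype == "wav" else "text"
    return f"{corpus}/{speaking_mode}/{partition}/{site}/{name.speaker}/{name.session}/"


name = parse_name("e000e1ss.wav")
print(name.speaker, name.utterance, name.session)  # e00 0e 1
print(int(name.utterance, 36))                     # 14 (base-36 "0e")
print(directory_for(name, "train", "bbn"))         # atis2/spon/train/bbn/e00/1/
```

The sketch reproduces the worked example in this document: e000e1ss.wav,
for a speaker recorded at BBN in the training partition, maps to
atis2/spon/train/bbn/e00/1/.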