File: fil_spec.doc, updated 07/20/94

* Note that the directory-naming conventions differ slightly in the CD-ROM distribution; the file-naming conventions, however, are unchanged.

MADCOW File and Directory Format Specifications for ATIS3

Directory and Filename Structures

All MADCOW data should be organized into the prescribed directory and filename structures as follows:

/<CORPUS>/doc/<DOCFILES>

where,

     DOCFILES ::= readme.doc |  (optional general information file)
                  spkrinfo.log  (mandatory speaker information file, formatted
                                 according to "atis-spkr-info.log")
- OR -

/<CORPUS>/<SPEAKING-MODE>/<SPEAKER>/<SESSION>/<DATA-FILES>

where,

CORPUS ::= atis3
SPEAKING-MODE ::= spon | vspn | read
SPEAKER ::= 001 | ... | zzz (3-character base-36 speaker ID)
SESSION ::= 1 | ... | z (1-character base-36 scenario-session ID)
DATA-FILES ::= <XXX><UU><S><M><P>.<TYPE>

where,

XXX ::= 001 | ... | zzz (3-character base-36 speaker ID)
UU ::= 01 | ... | zz (2-char. base-36 within-scenario-session query ID)
S ::= 1 | ... | z (1-char. base-36 scenario-session ID)
M ::= s | r | c | v (speaking mode:
"s" - spontaneous,
"r" - read version of spontaneous,
"c" - read common, or
"v" - voice-only spontaneous)
P ::= s | c | x (microphone:
"s" - Sennheiser,
"c" - Crown,
"x" - pertains to all microphones recorded)
and,
TYPE ::= log | (session log file - the special within-scenario-session query ID "00" is used in all log files)
wav | (SPHERE-headered speech waveform file)
sro | ("speech recognizer output" transcription)
lsn | (lexical SNOR transcription derived from .sro)
cat | (query categorization)
win | (wizard input to NLParse)
sql | (SQL query from NLParse to create min (.ref) answer)
sq2 | (SQL query from NLParse to create max (.rf2) answer)
ref | (min reference answer from (.sql) SQL query)
rf2 | (max reference answer from (.sq2) SQL query)
squ | (subject questionnaire)
com | (session comment file - the special within-scenario-session query ID "00" is used in all comment files)
Note: Although other ATIS file types do exist, only three of the file types listed above (.log, .wav, .sro) are required as input from sites contributing initial (unannotated) data. Also note that some of the file types above (.cat, .win, .sql, .sq2, .ref, and .rf2) are added by the annotation process. The .lsn files are added at NIST and are used as input to NL-only systems and for scoring SPREC results.
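Because the DATA-FILES grammar above is fixed-width and regular, it can be checked mechanically. The following is a minimal Python sketch of a validator, not a NIST or MADCOW tool; the names DataFile and parse_data_file are hypothetical. It assumes lowercase base-36 digits, accepts the "v" speaking-mode code, and enforces only one cross-field rule: .log and .com files must carry query ID "00".

```python
import re
from dataclasses import dataclass

# One lowercase base-36 digit (0-9, a-z).
B36 = "[0-9a-z]"

# <XXX><UU><S><M><P>.<TYPE> as given in the grammar above (used with fullmatch).
DATA_FILE_RE = re.compile(
    rf"(?P<speaker>{B36}{{3}})"   # XXX: speaker ID
    rf"(?P<query>{B36}{{2}})"     # UU: within-scenario-session query ID
    r"(?P<session>[1-9a-z])"      # S: scenario-session ID (1..z)
    r"(?P<mode>[srcv])"           # M: speaking mode
    r"(?P<mic>[scx])"             # P: microphone
    r"\.(?P<ext>log|wav|sro|lsn|cat|win|sql|sq2|ref|rf2|squ|com)"  # TYPE
)


@dataclass(frozen=True)
class DataFile:
    speaker: str
    query: str
    session: str
    mode: str
    mic: str
    ext: str


def parse_data_file(name: str) -> DataFile:
    """Split a data-file name into its fields; raise ValueError if it does not conform."""
    m = DATA_FILE_RE.fullmatch(name)
    if m is None:
        raise ValueError(f"not a MADCOW data-file name: {name!r}")
    f = DataFile(**m.groupdict())
    # Session logs and comment files use the special query ID "00".
    if f.ext in ("log", "com") and f.query != "00":
        raise ValueError(f"{name!r}: .{f.ext} files must use query ID '00'")
    return f
```

For example, parse_data_file("b000e1ss.wav") returns DataFile(speaker='b00', query='0e', session='1', mode='s', mic='s', ext='wav'), while "b000e1ss.log" is rejected because a log file's query ID must be "00".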

Example:

b000e1ss.wav
(speaker b00, query 0e, scenario-session 1, spontaneous speaking mode, Sennheiser mic., waveform file)
Note: The MADCOW ATIS3 corpus will be identified by the database ID (corpus ID) "atis3". This ID should appear in the directory structure and in the waveform file headers.
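To connect a data-file name to the directory structure, the worked example can be mapped back to its full path. The following is a minimal Python sketch under stated assumptions: the helpers data_file_path and query_number are hypothetical. The pairing of the filename's speaking-mode letter (M) with a SPEAKING-MODE directory (s to spon, v to vspn, r and c to read) is also assumed, since the spec lists both sets but does not state the mapping. It follows the layout given here, not the CD-ROM layout.

```python
from pathlib import PurePosixPath

CORPUS = "atis3"

# ASSUMED pairing of the filename's speaking-mode letter (M) with the
# SPEAKING-MODE directory; the spec lists both sets but not the mapping.
MODE_DIR = {"s": "spon", "v": "vspn", "r": "read", "c": "read"}


def data_file_path(name: str) -> PurePosixPath:
    """Return /<CORPUS>/<SPEAKING-MODE>/<SPEAKER>/<SESSION>/<name>."""
    stem, _, ext = name.partition(".")
    if len(stem) != 8 or not ext:
        raise ValueError(f"expected <XXX><UU><S><M><P>.<TYPE>, got {name!r}")
    # Fixed positions: XXX = 0..2, UU = 3..4, S = 5, M = 6, P = 7.
    speaker, session, mode = stem[0:3], stem[5], stem[6]
    if mode not in MODE_DIR:
        raise ValueError(f"unknown speaking-mode code {mode!r} in {name!r}")
    return PurePosixPath("/", CORPUS, MODE_DIR[mode], speaker, session, name)


def query_number(name: str) -> int:
    """Decode the 2-character base-36 query ID (UU) to an integer."""
    return int(name[3:5], 36)
```

For b000e1ss.wav this yields /atis3/spon/b00/1/b000e1ss.wav, and query ID "0e" decodes to 14.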