Index:
This file currently contains two documents specifying the contents of ATIS categorization (.cat) files. The first, numbered "1.0", is an augmented-BNF specification of the syntax; the second, "2.0", is an algorithmic specification in English of the mapping between tags and evaluation classes. Each is delimited by a line of hyphens.
1.0 Specification of .cat file syntax:
; File cat_spec.bnf
; Categorization (.cat) File Contents Specification.
; (Comment lines start with ";")
;
; BASIC SYNTAX:
;
; Using standard BNF notation extended with these devices:
;   "(A)" means "A optionally";
;   "<A>*" means "zero or more A's";
;   "<A>+" means "one or more A's".
;
<cat_spec> ::= <eval_class>: <characteristics>
; A .cat file specification is an evaluation class followed by
; a colon and some characteristics, e.g.
;   "X: ill-formed"
; Additional constraints on the co-occurrence of the evaluation
; class values (e.g. "X") and the characteristics are stated
; in a different format in file eval_class_proc.txt.
<eval_class> ::= A | X | D1 | D
<characteristics> ::= <utt_chars> <interp_id_&_chars>*
; the characteristics are a set of whole-utterance characteristics
; followed by zero or more individual interpretation i.d.'s and
; characteristics
<utt_chars> ::= <basic_tags>* (<cd-tag>)
<basic_tags> ::= arithmetic | bad-db | book | cancelled | disallowed |
                 hopelessly-vague | ill-formed | multi-sentence |
                 presupposition-failure | responding | testably-ambiguous |
                 trunc-utt | uncooperative | unanswerable | underspecified |
                 ungrammatical | wh-question | wizard-error | yes/no
; Note that each aspect of <characteristics> may be null.
; This is allowed only for "class A" utterances (and
; interpretations), so that "A:" is a valid .cat expression,
; but "X:" is not.
<cd-tag> ::= context-dependent:<context-pointers>
; one kind of tag is the context-dependent tag, consisting of
; the phrase "context-dependent" followed by a colon and a set
; of pointers to context, e.g. "D1: context-dependent:Q2".
<context-pointers> ::= <ptr_field> | <disjunctive_pointer_field_string>
; the context-pointers field is either a single pointer field or a
; disjunctive string of pointer fields
<disjunctive_pointer_field_string> ::= <ptr_field> <alternate_ptr_field>+
; the disjunctive pointer field string is a pointer field followed by
; one or more alternate pointer fields, e.g.
;   "D: context-dependent: Q2 OR Q3".
<ptr_field> ::= <basic_ptr> | <conjunctive_pointer_string>
; a pointer field is either a single basic pointer or a conjunctive
; string of pointers
<alternate_ptr_field> ::= OR <ptr_field>
; an alternate pointer field is "OR" followed by a pointer field
<conjunctive_pointer_string> ::= <basic_ptr> <additional_basic_ptrs>+
; a conjunctive string of pointers is a basic pointer followed by
; one or more additional basic pointers, e.g.
;   "D: context-dependent: Q1 & Q2 OR Q3"
<basic_ptr> ::= <ctype><UTTNO> (-<interp_id>) | :? | :X
<ctype> ::= Q | A | Q/A
<additional_basic_ptrs> ::= & <basic_ptr>
<interp_id_&_chars> ::= <EOL> <interp_id>:<interp_chars>
; an interpretation i.d. plus characteristics is an interpretation
; i.d. followed by a colon followed by a set of characteristics
; for that interpretation, all on a new line, e.g.
;   "A: testably-ambiguous
;    interp#1:yes/no
;    interp#2:wh-question"
<interp_id> ::= interp#<INTEGER>
<interp_chars> ::= <interp_tags>* (<cd-tag>)
<interp_tags> ::= book | disallowed | presupposition-failure |
                  underspecified | wh-question | yes/no
; only a subset of the whole-utterance tags are allowed on individual
; interpretations; for instance, being ambiguous is a property of
; a whole utterance, not of one particular interpretation of an utterance.
; The above formulation takes <UTTNO>, <EOL> and <INTEGER> as primitives.
; <UTTNO> is the number identifying an utterance, as used in the
; name of its .sro file; <EOL> is something that causes a new line
; to begin; and <INTEGER> is any integer.
2.0 Mapping Between Tags and Evaluation Classes:
File eval_class_proc.txt
Procedure for assigning queries to evaluation classes:
These rules are to be applied to the utterance characteristics in the .cat file, in order as given. The first one that applies determines the evaluation class of the utterance.
Assign to class X ("X:") if the utterance is marked with any of the following tags:
arithmetic
bad-db
book
cancelled
disallowed
hopelessly-vague
ill-formed
presupposition-failure
responding
trunc-utt
unanswerable
uncooperative
underspecified
Assign to class A ("A:") if there is at least one interpretation that is not marked with the tag "context-dependent".
Assign to class D1 ("D1:") if the context pointer of each of its interpretations specifies just one prior query (":Qn") or query/answer (":Q/An") as context, that prior query ("Qn") is the same for each interpretation, and each query between Qn and this query is tagged as "unanswerable".
Otherwise, assign to class D ("D:").
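The procedure above can be sketched in Python. This is a minimal, non-authoritative sketch under several assumptions not stated in the specification: the function names (parse_pointers, eval_class) are hypothetical; utterance numbers are taken to be decimal integers that give session order; the special pointers ":?" and ":X" are simply treated as not naming a single prior query; only whole-utterance tags are checked for class X; and an utterance with no interp# lines is passed in as a single interpretation carrying its own context-dependent pointer string (or None).

```python
import re

# Whole-utterance tags that force class X (first rule of the procedure).
X_TAGS = frozenset({
    "arithmetic", "bad-db", "book", "cancelled", "disallowed",
    "hopelessly-vague", "ill-formed", "presupposition-failure",
    "responding", "trunc-utt", "unanswerable", "uncooperative",
    "underspecified",
})

# <basic_ptr> ::= <ctype><UTTNO> (-<interp_id>); UTTNO assumed decimal here.
_BASIC_PTR = re.compile(r"(Q/A|Q|A)(\d+)(?:-interp#\d+)?")


def parse_pointers(text):
    """Split a context-pointer field into OR-alternatives of &-joined pointers.

    Returns a list of alternatives, each a list of (ctype, uttno) pairs.
    The special pointers ":?" and ":X" are not handled and raise ValueError.
    """
    alternatives = []
    for alt in re.split(r"\s+OR\s+", text.strip()):
        conj = []
        for ptr in alt.split("&"):
            m = _BASIC_PTR.fullmatch(ptr.strip())
            if m is None:
                raise ValueError(f"unrecognized context pointer: {ptr!r}")
            conj.append((m.group(1), int(m.group(2))))
        alternatives.append(conj)
    return alternatives


def eval_class(utt_tags, interp_pointers, this_no, tags_of):
    """Apply the class-assignment rules in order; the first match wins.

    utt_tags:        set of whole-utterance tags
    interp_pointers: one entry per interpretation: the pointer string of its
                     context-dependent tag, or None if it has no such tag
    this_no:         position of this query in the session (assumed integer)
    tags_of:         dict mapping earlier query numbers to their tag sets
    """
    if utt_tags & X_TAGS:
        return "X"
    if any(p is None for p in interp_pointers):
        return "A"  # some interpretation is not context-dependent
    targets = set()
    for p in interp_pointers:
        try:
            alts = parse_pointers(p)
        except ValueError:
            return "D"  # special or unparseable pointer: not one prior query
        if len(alts) != 1 or len(alts[0]) != 1:
            return "D"  # disjunctive or conjunctive context
        ctype, uttno = alts[0][0]
        if ctype not in ("Q", "Q/A"):
            return "D"
        targets.add(uttno)
    if len(targets) != 1:
        return "D"  # interpretations point at different prior queries
    (n,) = targets
    # Every query strictly between Qn and this one must be unanswerable.
    if all("unanswerable" in tags_of.get(k, set()) for k in range(n + 1, this_no)):
        return "D1"
    return "D"
```

For example, a query 5 whose only interpretation points at Q2, with queries 3 and 4 both tagged unanswerable, comes out as D1; if query 4 lacked that tag it would fall through to D.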
MADCOW File and Directory Format Specifications
Directory and Filename Structures
All MADCOW data are organized into directory and filename structures as follows:
/<CORPUS>/<SPEAKING-MODE>/<PARTITION>/<SITE>/<SPEAKER>/<SESSION>/<DATA-FILES>
where,
CORPUS ::= atis2
This corpus is identified by the database ID (corpus ID) "atis2". This ID appears in the directory structure and in the waveform file headers.
SPEAKING-MODE ::= spon (waveforms) | text (logs, transcripts, etc)
PARTITION ::= test | train
SITE ::= [feb92 | nov92] | [att | bbn | cmu | mit | nist | sri]
SPEAKER ::= 001 | ... | zzz (3-character base-36 speaker ID)
SESSION ::= 1 | ... | z
DATA-FILES ::= <XXX><UU><S><M><P>.<TYPE>, where:
XXX ::= 001 | ... | zzz (3-character base-36 speaker ID)
UU ::= 01 | ... | zz (2-char. base-36 speaker-sentence ID)
S ::= 1 | ... | z (1-char. base-36 session ID)
M ::= s ("s" - spontaneous)
P ::= s | c | x ("s" - Sennheiser, "c"- Crown, or "x" - pertains to all microphones recorded)
TYPE ::= log | (session log file - special speaker-sentence ID of "000" is used in all log files)
         com | (session comment file - special speaker-sentence ID of "000" is used in all comment files)
         wav | (SPHERE-headered speech waveform file)
         sro | ("speech recognizer output" transcription)
         cat | (query categorization)
         win | (wizard input to NLParse)
         sql | (SQL query from NLParse to create min (.ref) answer)
         sq2 | (SQL query from NLParse to create max (.rf2) answer)
         ref | (min reference answer from (.sql) SQL query)
         rf2   (max reference answer from (.sq2) SQL query)
Note: Although other ATIS file types do exist, only three of the file types listed above (.log, .wav, .sro) were required as input from sites contributing initial (unannotated) data; the remaining file types (.cat, .win, .sql, .sq2, .ref, and .rf2) were added by the annotation process.
Example:
e000e1ss.wav
(speaker e00, utterance 0e, session 1, spontaneous speaking mode, Sennheiser mic., waveform file)
Given that speaker e00 was recorded at BBN and placed in the training partition, the directory path to this file is:
atis2/spon/train/bbn/e00/1/ (this happens to be on disc 12-2.1)
And the corresponding text files would be found in:
atis2/text/train/bbn/e00/1/ (all text data are on disc 12-1.1)
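A sketch of how the filename fields and directory path described above might be recovered programmatically. The names MadcowFile, parse_data_filename and data_directory are hypothetical. The sketch assumes lowercase base-36 characters, places every non-.wav file under "text" following the SPEAKING-MODE rule, and has not been checked against the special "000" ID used for .log and .com files. The partition and site cannot be derived from the filename and must be supplied by the caller.

```python
import re
from typing import NamedTuple


class MadcowFile(NamedTuple):
    speaker: str    # XXX: 3-char base-36 speaker ID
    sentence: str   # UU: 2-char base-36 speaker-sentence ID
    session: str    # S: 1-char base-36 session ID
    mode: str       # M: "s" = spontaneous
    mic: str        # P: "s" Sennheiser, "c" Crown, "x" all microphones
    file_type: str  # TYPE: wav, sro, cat, log, ...


_NAME = re.compile(r"([0-9a-z]{3})([0-9a-z]{2})([0-9a-z])(s)([scx])\.([a-z0-9]+)")


def parse_data_filename(name):
    """Split an XXXUUSMP.TYPE filename into its fields."""
    m = _NAME.fullmatch(name)
    if m is None:
        raise ValueError(f"not an XXXUUSMP.TYPE filename: {name!r}")
    return MadcowFile(*m.groups())


def data_directory(f, partition, site, corpus="atis2"):
    """Build <CORPUS>/<SPEAKING-MODE>/<PARTITION>/<SITE>/<SPEAKER>/<SESSION>/."""
    # Waveforms live under "spon"; logs, transcripts, etc. under "text".
    speaking_mode = "spon" if f.file_type == "wav" else "text"
    return f"{corpus}/{speaking_mode}/{partition}/{site}/{f.speaker}/{f.session}/"
```

Applied to e000e1ss.wav with partition "train" and site "bbn", this reproduces the waveform path given above; the same name with a .sro extension maps to the "text" tree.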
There are separate documentation files explaining the format and contents of some of the file types. In particular, refer to the files cat_spec.doc, log_spec.doc, sro_spec.doc, and wav_spec.doc for information on the .cat, .log, .sro and .wav files, respectively.
This document specifies version 2 of the DARPA/ATIS Common Answer Specification, as revised 11/1/90. The changes are to allow CAS forms with "OR" connecting alternates and to allow certain strings to not be enclosed in quotation marks.
BASIC SYNTAX IN BNF:
<answer> ::= <cas1> | (<cas1> <casn>+)
<casn> ::= OR <cas1>
<cas1> ::= <scalar-value> | <relation> | NO_ANSWER | no_answer
<scalar-value> ::= <boolean-value> | <number-value> | <string>
<boolean-value> ::= YES | yes | TRUE | true | NO | no | FALSE | false
<number-value> ::= <integer> | <real-number>
<integer> ::= <sign> <digit>+ | <digit>+
<sign> ::= + | -
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
<real-number> ::= <sign> <digit>+ . <digit>* | <digit>+ . <digit>*
<string> ::= <char_except_whitespace>+ | "<char>*"
<relation> ::= (<tuple>*)
<tuple> ::= (<value>+)
<value> ::= <scalar-value> | NIL
Standard BNF notation has been extended to include two other common devices: "<A>+" means "one or more A's" and "<A>*" means "zero or more A's".
The above formulation does not define <char_except_whitespace> and <char>. All of the standard ASCII characters count as members of <char>, and all but "white space" count as <char_except_whitespace>. Following ANSI "C", blanks, horizontal and vertical tabs, newlines, formfeeds, and comments are, collectively, "white space".
The only change in the syntax of CAS itself from the previous version is that now a string may be represented as either a sequence of characters not containing white space or as a sequence of any characters enclosed in quotation marks. Note that only non-exponential real numbers are allowed, and that empty tuples are not allowed (but empty relations are).
ADDITIONAL SYNTACTIC CONSTRAINTS
The syntactic classes <boolean-value>, <string>, and <number-value> define the types "boolean", "string", and "number", respectively. All the tuples in a relation must have the same number of values, and those values must be of the same respective types (boolean, string, or number).
If a token could represent either a string or a number, it will be taken to be a number; if it could represent either a string or a boolean, it will be taken to be a boolean. Interpretation as a string may be forced by enclosing a token in quotation marks.
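The token-typing rules above (a number wins over a string, a boolean wins over a string, quotation marks force a string) can be sketched as follows. classify_token is a hypothetical helper that assumes the token has already been split out of the CAS form. It separates integers from reals because the comparison rules treat them differently, rejects exponential notation as the grammar does, and accepts NIL only in the spellings NIL and nil.

```python
import re

_INTEGER = re.compile(r"[+-]?\d+")
_REAL = re.compile(r"[+-]?\d+\.\d*")   # non-exponential only; ".5" is not allowed
_TRUE = {"YES", "yes", "TRUE", "true"}
_FALSE = {"NO", "no", "FALSE", "false"}


def classify_token(tok):
    """Return (kind, value) for one CAS value token."""
    if len(tok) >= 2 and tok[0] == tok[-1] == '"':
        return ("string", tok[1:-1])       # quotes force a string
    if _INTEGER.fullmatch(tok):
        return ("integer", int(tok))       # number beats string
    if _REAL.fullmatch(tok):
        return ("real", float(tok))
    if tok in _TRUE:
        return ("boolean", True)           # boolean beats string
    if tok in _FALSE:
        return ("boolean", False)
    if tok in ("NIL", "nil"):
        return ("nil", None)               # missing value inside a tuple
    return ("string", tok)
```

Under these rules 42 is a number while "42" is a string, and 1e5 falls through to a string because exponential reals are not part of the grammar.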
In a tuple, NIL as the representation of missing data is allowed as a special case for any value, so a legal answer indicating the costs of ground transportation in Boston would be
(("L" 5.00 ) ("R" nil ) ("A" nil ) ("R" nil ))
ELEMENTARY RULES FOR CAS COMPARISONS
String comparison is case-sensitive, but the distinguished values (YES, NO, TRUE, FALSE, NO_ANSWER, and NIL) may be written in either upper or lower case.
Each indexical position for a value in a tuple (say, the ith) is assumed to represent the same field or variable in all the tuples in a given relation.
Answer relations must be derived from the existing relations in the database, either by subsetting and combining relations or by operations like averaging, summation, etc.
In matching a hypothesized (HYP) CAS form with a reference (REF) one, the order of values in the tuples is not important; nor is the order of tuples in a relation, nor the order of alternatives in a CAS form using "OR". The scoring algorithm will use the re-ordering that maximizes the indicated score. Extra values in a tuple are not counted as errors, but distinct extra tuples in a relation are. A tuple is not distinct if its values for the fields specified by the REF CAS are the same as another tuple in the relation; these duplicate tuples are ignored.
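A simplified sketch of the relation-matching rules: hypothesized tuples may carry extra values and may list values in any order, duplicate tuples are ignored, but missing or extra distinct tuples cause a mismatch. This is an illustration under stated assumptions, not the NIST scoring code: relations_match is hypothetical, compares values with plain equality (no real-number tolerance), returns a yes/no match rather than a score, and tries every column ordering by brute force, which is only practical for narrow relations.

```python
from itertools import permutations


def relations_match(ref, hyp):
    """Return True if hyp matches ref under some re-ordering of hyp's columns.

    ref, hyp: lists of equal-width tuples of hashable values.
    """
    if not ref or not hyp:
        return not ref and not hyp        # an empty relation only matches another
    width = len(ref[0])
    target = set(ref)
    # Choose `width` of hyp's columns, in every possible order.
    for cols in permutations(range(len(hyp[0])), width):
        # Projection drops extra values; the set drops duplicate tuples.
        projected = {tuple(row[c] for c in cols) for row in hyp}
        if projected == target:
            return True
    return False
```

Because projection happens before the set comparison, two HYP tuples that differ only in an extra column collapse into one and are not penalized, mirroring the duplicate rule above.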
CAS forms that include alternate CAS's connected with "OR" are intended to allow a single HYP form to match any one of several REF CAS forms. If the HYP CAS form contains alternates, the score is undefined.
In comparing two real number values, a tolerance will be allowed; the default is plus or minus .01%. No tolerance is allowed in the comparison of integers. In comparing two strings, initial and final sub-strings of white space are ignored. In comparing boolean values, "TRUE" and "YES" are equivalent, as are "FALSE" and "NO".
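The elementary value comparisons can be sketched as below. values_match is a hypothetical helper operating on (kind, value) pairs such as ("real", 5.0) or ("boolean", True), where YES/TRUE and NO/FALSE are assumed to have already been mapped to Python booleans. The sketch assumes the plus-or-minus .01% tolerance is relative to the REF value and applies it whenever either side is a real number; the specification does not say which value the percentage is taken from.

```python
def values_match(ref, hyp, rel_tol=1e-4):
    """Compare one REF value with one HYP value.

    ref, hyp: (kind, value) pairs, kind in
              {"integer", "real", "string", "boolean", "nil"}.
    rel_tol:  1e-4 == 0.01%, assumed relative to the REF value.
    """
    rkind, rval = ref
    hkind, hval = hyp
    numeric = ("integer", "real")
    if rkind in numeric and hkind in numeric:
        if rkind == hkind == "integer":
            return rval == hval                    # integers: no tolerance
        return abs(rval - hval) <= rel_tol * abs(rval)
    if rkind != hkind:
        return False
    if rkind == "string":
        return rval.strip() == hval.strip()        # case-sensitive, ends trimmed
    return rval == hval                            # booleans and nil
```

With these assumptions a REF of 100.0 accepts anything from 99.99 to 100.01, while string comparison stays case-sensitive and only leading and trailing white space is forgiven.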
CAS FILES
(NIST Standard ATIS Answer File Specification Ver. 1.0 6/5/90)
The answers to ATIS queries must be in an ASCII text file, each answer in CAS form, sequenced as in the associated "ndx" file. Any material following the first appearance of ";" on a line will be treated as comments. Blank lines will be ignored. The utterance i.d. should be on a comment line immediately preceding the answer CAS for the utterance.
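A minimal reader for the answer-file layout just described. read_answer_file is a hypothetical name, and the sketch assumes two things the specification does not state: that each CAS answer fits on a single line, and that the comment line before it contains only the utterance i.d. It pairs each answer with the most recent i.d. comment, treats everything after the first ";" as comment, and skips blank lines.

```python
def read_answer_file(lines):
    """Return (utterance_id, cas_text) pairs from an ATIS answer file."""
    answers = []
    pending_id = None
    for raw in lines:
        # Everything after the first ";" on a line is comment.
        text, sep, comment = raw.partition(";")
        text = text.strip()
        if text:
            answers.append((pending_id, text))
            pending_id = None
        elif sep and comment.strip():
            pending_id = comment.strip()   # comment-only line: assumed to hold the i.d.
        # blank lines fall through and are ignored
    return answers
```

An answer that is not preceded by an i.d. comment is returned with None as its i.d., which makes such gaps easy to spot.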
ATIS SR Output (".sro") Transcription Conventions
The transcription is intended to be an orthographic, lexical transcription with a few details included that represent audible acoustic events (speech and nonspeech) present in the corresponding waveform files. The SRO transcriptions will be automatically mapped to lexical SNOR conventions for scoring of recognition systems. The extra marks contained in the SRO transcription aid in interpreting the text form of the utterance. The SRO transcription will be stored in the query's auxiliary file of type ".sro".
The transcriptions are intended to be a quick and broad transcription; transcribers should not have to agonize over decisions, but rather realize that their transcription is intended to be a rough guide that others may examine further for details. Transcriptions should be made in two passes: one pass in which words are transcribed, and a second in which the additional details (extraneous noises, and prosodic marks) are added. Many phenomena (silences, noises, "uh"s) are easy to miss unless specifically attended to. It is recommended that transcribers have some background in phonetics and in linguistics, or that their training and preparation for the transcription task cover some basics in acoustic phonetics and dialect and style variations.
1. Markings Required for Scoring.
1.1 Case
Transcriptions are case insensitive and all case information will be lost in the translation to the all uppercase SNOR conventions. Using all lower case for SRO conventions is recommended so that SRO files are immediately recognizable from SNOR and lexical SNOR files.
1.2 Spelling
Normal lexical items will be represented by their spellings in the normal way. NIST maintains a common lexicon of spellings of words used in the ATIS corpus. It is available via remote FTP to ssi.ncsl.nist.gov and should be consulted when in doubt about the spelling of a word. The file is located in the directory "madcow/logs" and is named "lexicon.doc.DATE", where DATE represents the date of the latest update of this file.
Spellings which cannot be predicted from .sro conventions:
- "all right" will always be used in lieu of "alright"
- "traveling" will always be used in lieu of "travelling"
- "trans world" will always be transcribed as separate words when referring to the airline, TWA.
- "pan am" will always be transcribed as separate words when referring to the airline, Pan American Airlines.
- "okay" is spelled "okay" rather than any other spelling, and should not be in angle or square brackets, unless part of a sequence that is verbally deleted.
- hyphenation is addressed in a separate section below.
1.2.1 Number sequences
Number sequences (flight numbers, times, dates, aircraft types, dollar amounts, etc.) will be spelled out to reflect what was said ("flight six one three"; "seven thirty"; "august twenty first"; "seven forty seven"; "four hundred and ten dollars".)
Reminder: No hyphens will be used ("seven forty seven", not "seven forty-seven".)
Note: care should be taken to transcribe the digit "0" as "zero" or "oh", depending on what the speaker said.
1.2.2 Letter sequences
Letter sequences occur in acronyms and abbreviations ("d f w"; "a p slash eighty"; "p m"; "c o"; etc.) Letters should be in lower case, separated by a space. Note that the determiner "a" and the letter "a" in "t w a" are not distinguished in these conventions.
Previous conventions indicated an exception to the above rule for "washington dc" in which there was no space between the "d" and "c". This exception never made sense and has not been used consistently in practice. In all future transcriptions it should NOT be treated as an exception and should always be transcribed as "washington d c".
[NIST has changed all occurrences of "dc" to "d c" in the MADCOW data they have distributed, so the "dc" form has never been used in official MADCOW data. It may, however, exist in the ATIS0 data.]
The AM and PM of times (e.g., "five thirty p m") will be treated as examples of letter sequences, i.e., lower case and separated by a space, with no periods.
If a speaker pronounces an acronym or abbreviation as a word, for example "den" or "bos", then it should be spelled out as a word, rather than as "d e n" and "b o s".
1.3 Hyphenation
Hyphens will not generally be used. If the items on either side of a potential hyphen are both words, a space will be used instead of a hyphen: "round trip" should be used and NOT "round-trip"; "one way" should be used and NOT "one-way" or "oneway". If one or both of the items is NOT a lexical item, neither a space nor a hyphen will be used: "nonstop" should be used, NOT "non-stop" or "non stop"; "nonsmoking" should be used and NOT "non-smoking".
1.4 Punctuation
This transcription will not contain normal English punctuation and will consist of lowercase characters except for proper nouns and individual letters. Conventional punctuation, including commas, periods, and question marks, will not be used. Periods will be used to indicate silent pauses (see 2.2) within an utterance, and should only occur following a space. Commas are used to indicate intonational separation; exclamation points are used to indicate emphatic stress.
Periods, question marks or exclamation points should NOT be used to indicate the end of a sentence.
1.5 Mispronunciations
Obviously mispronounced words that are nevertheless intelligible will be marked with stars (e.g., *transportation* for "transportetation"). These include mispronunciations such as words with extra or omitted syllables, but asterisks should not be used to indicate pronunciations of words that represent normal dialectal variation (e.g., "warshed" for "washed" or "cah" for "car") or stylistic variation (e.g., "bout" for "about" or "wanna" for "want a" or for "want to"). If the speaker would not consider the pronunciation an error, the asterisk notation should not be used. Obviously, there may be some clear and some unclear cases; transcribers should use their best judgment. A background in phonetics is helpful for transcribers.
Similarly, glottalization at the onset or offset of a vowel is not transcribed.
1.6 Verbal Deletions
Words verbally deleted by the subject will be enclosed in angle brackets. Verbal deletion means words spoken by the user but which, in the opinion of the transcriber, are superseded by subsequent speech explicitly (e.g., "show <flights> <i> <mean> fares") or implicitly (e.g., "show me the <fares> flights to Boston").
Verbal deletions occur any time there is a repetition or restart. In repetitions, one or more words are repeated, and there may or may not be extra material inserted into the repetition, for example:
show me <the> <flights> the flights to boston
show me <the> <flights> the nonstop flights to boston
In restarts, words are not repeated, but the speaker changes direction, as in:
<show> <me> <the> how many flights go to boston
Note that EACH word in a verbal deletion should be enclosed in angle brackets.
1.7 Word Fragments
Word fragments, i.e. instances in which the speaker did not complete a word, will be marked with a hyphen. As much of the word as is audible will be transcribed, followed immediately by the hyphen:
please show fli- flights from dallas
Though these represent "verbal deletions" as described above, the hyphen occurring before (or after) a space is sufficient to cue this fact, so fragments should not be enclosed in angle brackets, as this just adds work for the transcribers. That is, the above example should NOT be "please show <fli-> flights from dallas".
Fragments include cases in which only an initial consonant or vowel is heard:
please show f- flights from dallas
This may sometimes be a judgement call on the part of the transcriber. Within-word hesitations may be transcribed as:
dall:as (indicating lengthening of the "l") (see section 2.4)
dal- [um] -las (indicating a within-word interruption - rare)
dal- . -as (indicating a silence interrupting a word)
The transcription will specify the intended word if such is obvious to the transcribers and is NOT obvious from context (this is of course a judgement call on the part of the transcriber). The completion of the presumed intended word will be enclosed in parentheses, BEFORE the hyphen, as in:
please show flights de(nver)- from dallas
1.8 Non-Speech Acoustic Events
Acoustic events enclosed in square brackets can come from the following set:
[chair_squeak]
[cough]
[cross_talk]
[door_slam]
[grunt]
[laughter]
[lip_smack]      (use ONLY if EXCEPTIONALLY loud!)
[loud_breath]    (do NOT mark audible but low-level breath noises)
[paper_rustle]
[phone_ring]
[sigh]           (only if the amplitude is comparable to surrounding speech)
[throat_clear]
[tongue_click]   (use ONLY if EXCEPTIONALLY loud!)
[unintelligible]
[sniff]
[tap]
[noise]
The following speech sounds (filled pauses) are also transcribed in the CSR and in the .sro transcriptions:
[er]
[mm]
[uh]
[um]
The following speech style markers are used in CSR, and are considered optional in the .sro transcriptions:
[loud]
[soft]
[whisper]
Note: Acoustic events such as inhalation, exhalation, tongue clicks, lip smacks, and breath noise will not be transcribed if they are low level and non-intrusive.
Note that any term can be used inside the brackets, but there should be no spaces inside brackets; use an underscore to connect words.
Note that the filled pauses represent acoustic events that are acoustically and phonetically similar to speech. If possible, try to limit these to the set on the list, so that those interested in these events can find them easily. If others occur, contact the MADCOW committee via your MADCOW representative.
For noise events that occur over a span of one or more words, the transcriber should:
"show the [door_slam>] flights to boston"
or "show the flights [<door_slam] to boston"
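A small check for the bracket rule above (no spaces inside square brackets; use underscores to connect words). check_bracket_terms is a hypothetical helper that works on one .sro line at a time and only reports problems with a suggested underscore form; it does not validate terms against the recommended lists.

```python
import re


def check_bracket_terms(line):
    """Report square-bracketed terms in one .sro line that contain white space."""
    problems = []
    for term in re.findall(r"\[([^\]]*)\]", line):
        if re.search(r"\s", term):
            fixed = "_".join(term.split())   # suggested underscore form
            problems.append(f"space inside [{term}]; use [{fixed}]")
    return problems
```

Span markers such as [door_slam>] pass unchanged, since they contain no white space.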
2. Markings Helpful for Interpretation.
These markings should be used when salient; transcribers should not assume that they are optional. However, the transcriber should not agonize over these decisions. If in doubt, leave it out. The transcription is basically at the lexical level, and should be done relatively quickly. The following are intended to be helpful markings that should be used when the phenomena are very clear.
2.1 Intonational Boundaries
A comma will be used to indicate an intonational separation. It is preceded and followed by a space.
i'd like to fly on delta , first class , july second
An intonational separation may be achieved by:
"show me the flights to boston: what are their fares"
(if there is neither a pause nor intonational marking, but
there is lengthening)
"show me the flights to boston what are their fares"
(ONLY if there is no lengthening, pause or intonational
indication of the separation between the two sentences).
2.2 Silent Pauses
Silent pauses will be marked with a period ("."). The use of the period indicates a significant silence, i.e., one that is clearly noticeable by listening, and which is significantly longer than a silence associated with a stop consonant closure for the rate of speech used by the speaker. Example:
show me the . flights to boston
Previous SRO conventions dictated that "." be used for a one-second pause, ". ." for a two-second pause, etc. This is no longer in effect: a "." may be used to indicate a significant duration of silence, without giving further information on its duration. The old convention was hard for transcribers to follow, was inconsistently applied, and duration measurement is more appropriately done by automatic methods. Thus in the above example, the silence could be 400 ms or one minute.
2.3 Emphatic Stress
An exclamation mark ("!") before a word or syllable indicates emphatic stress. This includes stress beyond what might normally occur based on lexical and syntactic factors. This is used sparingly and subjectively. Note that the "!" always precedes (never follows) the stressed item. Example:
show me only !delta flights
2.4 Lengthening
Lengthening, typically vowel lengthening, will be indicated by a colon (":") placed immediately after the lengthened sound. This is used sparingly and subjectively. Note that ":" always follows some sound; if it occurs within a word, it is not followed by a space. Examples:
show me the: flights to boston
which flights ha:ve economy fares
Lengthenings before silences are so often observed that hearing them is difficult, and marking them would make the transcribers' job much more difficult than it is intended to be. They therefore need not be marked before the end of the utterance or before a transcribed silence.
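To illustrate how the markings from sections 1 and 2 can be told apart mechanically, here is a sketch that labels each whitespace-separated .sro token. sro_markings and mark_line are hypothetical helpers. The sketch does not perform the actual SRO-to-lexical-SNOR mapping, whose rules are not given here, and it ignores the truncation marks of section 3.

```python
def sro_markings(tok):
    """Return the set of transcription markings carried by one .sro token."""
    if tok == ".":
        return {"silent-pause"}
    if tok == ",":
        return {"intonational-boundary"}
    if tok.startswith("[") and tok.endswith("]"):
        return {"acoustic-event"}        # e.g. [cough], [um], [<door_slam]
    if tok.startswith("<") and tok.endswith(">"):
        return {"verbal-deletion"}       # each deleted word is bracketed separately
    marks = set()
    if tok.startswith("!"):
        marks.add("emphatic-stress")
        tok = tok[1:]
    if len(tok) > 2 and tok.startswith("*") and tok.endswith("*"):
        marks.add("mispronunciation")
    if tok.startswith("-") or tok.endswith("-"):
        marks.add("fragment")            # e.g. fli-, -las
    if "(" in tok and ")" in tok:
        marks.add("completion")          # e.g. de(nver)-
    if ":" in tok:
        marks.add("lengthening")         # e.g. the:, ha:ve
    return marks or {"word"}


def mark_line(line):
    """Pair every token of an .sro line with its markings."""
    return [(tok, sro_markings(tok)) for tok in line.split()]
```

A token can carry several markings at once; de(nver)-, for instance, is both a fragment and a completion.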
3. Truncated Waveforms
3.1 Marking of transcription
If a .wav file is truncated due to a recording error by the system or by the failure of the subject to press or release the push-to-talk button at the proper times, the following notation is to be used in the corresponding .sro file:
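~ transcription       (waveform truncated at the beginning)
transcription ~       (waveform truncated at the end)
~ transcription ~     (waveform truncated at both the beginning and the end)
~~                    (utterance totally truncated)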
* If the wizard responded to a totally truncated utterance with
an error message, and this "empty" interchange is retained
in the .log file then the .sro transcription should consist of a
blank new-line and NOT a "~~". The utterance will then not
be annotated as a "trunc-utt". The purpose of this is to
distinguish those cases where dialogue coherence has been
maintained, from those cases where the system may have
gotten out of sync with what has been recorded in the .wav file.
However, the transcribers are typically not looking at the .log files, and hence do not know what the wizard did. Sites that still produce truncated utterances are strongly encouraged to correct the data collection mechanism to avoid this problem. In the meantime, transcribers at these sites may have to consult the .log files for resolution of some instances.
For cases in which the user pushed the button and then said nothing, the corresponding .sro file should be a blank line, with no indication of truncation.
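A sketch of reading the truncation marks back out of an .sro line. truncation is a hypothetical helper, and it assumes that the position of "~" (start, end, or both) records where the waveform was cut off. A blank line yields None, consistent with the rule that an empty button press, or an "empty" interchange retained in the .log file, carries no truncation mark.

```python
def truncation(line):
    """Classify the truncation marking of one .sro line (None if unmarked)."""
    s = line.strip()
    if s == "~~":
        return "total"                   # nothing usable was recorded
    at_start = s.startswith("~")
    at_end = s.endswith("~")
    if at_start and at_end:
        return "both"
    if at_start:
        return "beginning"
    if at_end:
        return "end"
    return None                          # includes the blank-line case
```
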
Speech style is considered a level of detail that need not be included in the SRO transcriptions. However, those sites who want to include it should use the conventions for these markers that are described in the documentation for the .dot files for the Wall Street Journal task. (See section 1.8).
5. Autocompletion
Autocompletion files, in conjunction with GNU Emacs tools, can greatly increase the transcriber's efficiency. SRI does this via a file that can be maintained and updated by the transcriber; it can be obtained by requesting this software from SRI via your MADCOW representative.
MADCOW Speech Waveform (.wav) File Type Specifications
ATIS MADCOW speech waveform files have been formatted using the NIST SPHERE header structure. They are stored on CD-ROM in compressed form, using a version of Tony Robinson's "shorten" algorithm for waveform data compression. Source code (in "C") for the SPHERE Library and Utilities is available via anonymous ftp from NIST (see below for instructions on downloading the software). Users without access to Internet ftp file transfers may contact the Linguistic Data Consortium to obtain the source code by mail (see instructions at the end of this file).
The NIST SPHERE header is an object-oriented, 1024-byte blocked, ASCII structure which is prepended to the waveform data. The header is composed of a fixed-format portion followed by an object-oriented variable portion. The fixed portion is as follows:
NIST_1A<new-line>
1024<new-line>
The first line specifies the header type and the second line specifies the header length. Each of these lines is 8 bytes long (including the new-line), and together they identify the header and allow programs that do not need the subsequent header information to skip over it.
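Because both preamble lines are fixed at 8 bytes, a reader can validate the header and jump to the sample data without parsing anything else. read_sphere_preamble and skip_sphere_header are hypothetical helpers, not part of the SPHERE library; the sketch assumes the length field may be space-padded to fill its 8 bytes. For the ATIS-2 files the bytes after the header are shorten-compressed, so they still need decoding before use.

```python
import io


def read_sphere_preamble(f):
    """Check the 16-byte fixed SPHERE preamble and return the header length."""
    magic = f.read(8)                 # expected: b"NIST_1A\n"
    if magic != b"NIST_1A\n":
        raise ValueError(f"not a NIST_1A header: {magic!r}")
    length_line = f.read(8)           # e.g. b"   1024\n"
    return int(length_line.decode("ascii").strip())


def skip_sphere_header(f):
    """Seek past the whole header; f is left at the first byte of sample data."""
    header_len = read_sphere_preamble(f)
    f.seek(header_len)                # header length counts from the start of the file
    return header_len


if __name__ == "__main__":
    # In-memory stand-in for a file: 1024-byte header, then two data bytes.
    demo = io.BytesIO(b"NIST_1A\n   1024\n" + b" " * 1008 + b"\x00\x01")
    print(skip_sphere_header(demo), demo.read())
```
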
The remaining object-oriented variable portion is composed of object-type-value "triple" lines which have the following format:
<LINE> ::= <TRIPLE><new-line> | <COMMENT><new-line> | <TRIPLE><COMMENT><new-line>
<TRIPLE> ::= <OBJECT><space><TYPE><space><VALUE><OPT-SPACES>
<OBJECT> ::= <PRIMARY-SUBOBJECT> | <PRIMARY-SUBOBJECT><SECONDARY-SUBOBJECT>
<PRIMARY-SUBOBJECT> ::= <ALPHA> | <ALPHA><ALPHA-NUM-STRING>
<SECONDARY-SUBOBJECT> ::= _<ALPHA-NUM-STRING> | _<ALPHA-NUM-STRING><SECONDARY-SUBOBJECT>
<TYPE> ::= -<INTEGER-FLAG> | -<REAL-FLAG> | -<STRING-FLAG>
<INTEGER-FLAG> ::= i
<REAL-FLAG> ::= r
<STRING-FLAG> ::= s<DIGIT-STRING>
<VALUE> ::= <INTEGER> | <REAL> | <STRING> (depending on object type)
<INTEGER> ::= <SIGN><DIGIT-STRING>
<REAL> ::= <SIGN><DIGIT-STRING>.<DIGIT-STRING>
<OPT-SPACES> ::= <SPACES> | NULL
<COMMENT> ::= ;<STRING> (excluding embedded new-lines)
<ALPHA-NUM-STRING> ::= <ALPHA-NUM> | <ALPHA-NUM><ALPHA-NUM-STRING>
<ALPHA-NUM> ::= <DIGIT> | <ALPHA>
<ALPHA> ::= a | ... | z | A | ... | Z
<DIGIT-STRING> ::= <DIGIT> | <DIGIT><DIGIT-STRING>
<DIGIT> ::= 0 | ... | 9
<SIGN> ::= + | - | NULL
<SPACES> ::= <space> | <SPACES><space>
<STRING> ::= <CHARACTER> | <CHARACTER><STRING>
<CHARACTER> ::= char(0) | char(1) | ... | char(255)
Note: The grammar does not impose any limit on the number of objects.
The single object "end_head" marks the end of the active header and the remaining unused header space is undefined.
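A sketch of parsing the object-type-value lines into a dictionary, stopping at end_head. parse_sphere_header is hypothetical (the NIST SPHERE library provides its own header-reading routines). It assumes the caller has read the whole header block, whose length is given on the second line, and decoded it as Latin-1, since string values may contain any byte 0-255. String values are cut to their declared -sN length, so embedded spaces survive and trailing comments are dropped.

```python
def parse_sphere_header(text):
    """Parse the object-type-value lines of a SPHERE header into a dict."""
    fields = {}
    for line in text.split("\n")[2:]:      # skip "NIST_1A" and the length line
        if line.strip() == "end_head":
            break                          # the rest of the block is undefined
        if not line.strip() or line.startswith(";"):
            continue                       # blank or comment-only line
        obj, typ, rest = line.split(" ", 2)
        if typ == "-i":
            fields[obj] = int(rest.split(";", 1)[0].strip())
        elif typ == "-r":
            fields[obj] = float(rest.split(";", 1)[0].strip())
        elif typ.startswith("-s"):
            fields[obj] = rest[: int(typ[2:])]   # declared length keeps spaces
        else:
            raise ValueError(f"unknown type flag {typ!r} in line {line!r}")
    return fields
```

Applied to the SRI example header, this yields, among others, sample_rate as the integer 16000 and microphone as the 18-character string Sennheiser HMD-414.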
The MADCOW headers include the following fields:
Field                     Type     Description - Probable defaults marked in ()
------------------------  -------  ---------------------------------------------
speaker_id                string   3-char. speaker ID from filename
speaking_mode             string   speaking mode ("spontaneous" or "read")
recording_date            string   beginning of recording date stamp of the form DD-MMM-YYYY.
                                   Should contain the string "unknown" if this info is not available.
recording_time            string   beginning of recording time stamp of the form HH:MM:SS.HH.
                                   Should contain the string "unknown" if this info is not available.
microphone                string   microphone description ("Sennheiser HMD-410" or "Crown PCC-160")
utterance_id              string   utterance ID from filename of the form XXXUUSMP as described
                                   in the filenames section above.
database_id               string   database (corpus) identifier ("atis2")
database_version          string   database (corpus) revision ("1.0")
channel_count             integer  number of channels in waveform ("1")
speaker_session_number    string   1-char. session ID from filename
sample_count              integer  number of samples in waveform
sample_max                integer  maximum sample value in waveform
sample_min                integer  minimum sample value in waveform
sample_rate               integer  waveform sampling rate ("16000")
sample_n_bytes            integer  number of bytes per sample ("2")
sample_byte_format        string   byte order (MSB/LSB -> "10" or LSB/MSB -> "01")
sample_sig_bits           integer  number of significant bits in each sample ("16")
session_utterance_number  integer  number of utterance within session (base 10) starting at "1"
speaker_sentence_number   string   number of utterance within session (base 36)
end_head                  none     end of header identifier
In addition to the fields listed above, there are two header entries pertaining to the use of the "shorten" compression algorithm:
sample_coding             string   "pcm,embedded-shorten-v1.09"
sample_checksum           integer  value provided by compression routine
Example ATIS header from SRI data:
NIST_1A
1024
database_id -s5 atis2
database_version -s3 1.0
utterance_id -s8 r80062ss
channel_count -i 1
sample_count -i 74010
sample_rate -i 16000
sample_min -i -3570
sample_max -i 3856
sample_n_bytes -i 2
sample_byte_format -s2 10
sample_sig_bits -i 16
speaker_id -s3 r80
speaking_mode -s11 spontaneous
recording_date -s11 18-Nov-1991
recording_time -s11 14:01:26.00
microphone -s18 Sennheiser HMD-414
speaker_session_number -s1 2
session_utterance_number -i 6
speaker_sentence_number -s2 06
sample_coding -s26 pcm,embedded-shorten-v1.09
sample_checksum -i 11939
end_head
Instructions for obtaining and using SPHERE
NIST has developed the SPHERE Library and Utilities package to provide an easy-to-use programming interface and essential command-line operations for manipulating speech files. The ATIS-2 waveform data were prepared for publication using SPHERE version 2.0 "beta". The current release of SPHERE is available for free via anonymous FTP from NIST, as follows:
Connect to host:    jaguar.ncsl.nist.gov
Go to directory:    pub
Set transfer mode:  binary
Get file:           sphere_2.0_Beta2.tar.Z
(Note that the file shown represents the version that is current as of publication of ATIS-2; as subsequent releases are made available, the file name will change accordingly. In general, only one version of SPHERE is present on the ftp server, and that will be the most recent release.)
For those who do not have access to the Internet FTP service, the SPHERE package may be obtained for free from:
Linguistic Data Consortium
441 Williams Hall
University of Pennsylvania
Philadelphia, PA 19104
You may also send a request by e-mail to "ldc@unagi.cis.upenn.edu" or call the LDC at (215) 898-0464.
After obtaining and installing the SPHERE package, you should refer to the on-line manual pages included with the release for instructions on usage. The relevant utility program for decompressing waveform data is "w_decode".