ATIS2 *_spec.doc files.

Index :

cat_spec.doc
dir_spec.doc
ref_spec.doc
sro_spec.doc
wav_spec.doc

File cat-specs.doc.910826, last modified 8/26/91

This file currently contains two documents specifying the contents of ATIS categorization (.cat) files. The first, numbered "1.0", is an augmented-BNF specification of the syntax; the second, "2.0", is an algorithmic specification in English of the mapping between tags and evaluation classes. Each is delimited by a line of hyphens.

1.0 Specification of .cat file syntax:


; File cat_spec.bnf
; Categorization (.cat) File Contents Specification.
; (Comment lines start with ";")

; BASIC SYNTAX:
;
; Using standard BNF notation extended with these devices:
; "(A)" means "A optionally";
; "<A>*" means "zero or more A's";
; "<A>+" means "one or more A's".
; 
 <cat_spec> ::= <eval_class>: <characteristics>
; A .cat file specification is an evaluation class followed by
; a colon and some characteristics, e.g.
;  "X: ill-formed"
; Additional constraints between the co-occurrence of the evaluation
; class values (e.g. "X") and the characteristics are stated
; in a different format in file eval_class_proc.txt.


    <eval_class> ::= A | X | D1 | D

    <characteristics> ::= <utt_chars> <interp_id_&_chars>*
; the characteristics are a set of whole-utterance characteristics
; followed by zero or more individual interpretation i.d.'s and
; characteristics

       <utt_chars> ::= <basic_tags>* (<cd-tag>)
          <basic_tags> ::= arithmetic | bad-db | book | cancelled
              | disallowed | hopelessly-vague | ill-formed
              | multi-sentence | presupposition-failure
              | responding | testably-ambiguous | trunc-utt
              | uncooperative | unanswerable | underspecified
              | ungrammatical | wh-question | wizard-error | yes/no

; Note that each aspect of <characteristics> may be null.
; This is allowed only for "class A" utterances (and
; interpretations), so that "A:" is a valid .cat expression,
; but "X:" is not.


          <cd-tag> ::= context-dependent:<context-pointers>
; one kind of tag is the context-dependent tag, consisting of
; the phrase "context-dependent" followed by a colon and a set
; of pointers to context, e.g. "D1: context-dependent:Q2".

<context-pointers> ::= <ptr_field> | <disjunctive_pointer_field_string>
; the context-pointers field is a disjunctive string of pointer fields

<disjunctive_pointer_field_string> ::= <ptr_field> <alternate_ptr_field>+
; the disjunctive pointer field string is a pointer field followed by
; one or more alternate pointer fields, e.g.
; "D: context-dependent: Q2 OR Q3".

<ptr_field> ::= <basic_ptr> | <conjunctive_pointer_string>
; a pointer field is a conjunctive string of pointers

 <alternate_ptr_field> ::= OR <ptr_field>
; an alternate pointer field is "OR" followed by a pointer field

 <conjunctive_pointer_string> ::= <basic_ptr> <additional_basic_ptrs>+
; a conjunctive string of pointers is a basic pointer followed by
; one or more additional basic pointers, e.g.
; "D: context-dependent: Q1 & Q2 OR Q3"

     <basic_ptr> ::= <ctype><UTTNO> (-<interp_id>) | :? | :X
         <ctype> ::= Q | A | Q/A 
     <additional_basic_ptrs> ::= & <basic_ptr>


<interp_id_&_chars> ::= <EOL> <interp_id>:<interp_chars>
; an interpretation i.d. plus characteristics is an interpretation
; i.d. followed by a colon followed by a set of characteristics
; for that interpretation, all on a new line, e.g.
; "A: testably-ambiguous
;    interp#1:yes/no
;    interp#2:wh-question"

<interp_id> ::= interp#<INTEGER>
<interp_chars> ::= <interp_tags>* (<cd-tag>)
<interp_tags> ::= book | disallowed | presupposition-failure
                | underspecified | wh-question | yes/no
; only a subset of the whole-utterance tags are allowed on individual
; interpretations; for instance, being ambiguous is a property of
; a whole utterance, not one particular interpretation of an utterance.

; The above formulation takes <UTTNO>, <EOL> and <INTEGER> as primitives.
; <UTTNO> is the number identifying an utterance, as used in the
; name of its .sro file; <EOL> is something that causes a new line
; to begin; and <INTEGER> is any integer.
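
The context-pointer part of the grammar above can be exercised with a small parser. The following Python sketch is an illustration only and is not part of the original specification: the name parse_context_pointers and the tuple layout are hypothetical, <UTTNO> is assumed to consist of digits and lower-case letters, and the ":?" and ":X" pointer forms are not handled.

```python
import re

# One basic pointer: <ctype><UTTNO> with an optional "-interp#<INTEGER>" suffix.
# Assumption: <UTTNO> is made of digits and lower-case letters.
_BASIC_PTR = re.compile(r"(Q/A|Q|A)([0-9a-z]+)(?:-(interp#[0-9]+))?")


def parse_context_pointers(text):
    """Return a list of alternatives (joined by OR in the source text).

    Each alternative is a list of (ctype, uttno, interp_id) tuples that
    were joined by "&"; interp_id is None when no suffix is present.
    """
    alternatives = []
    for field in re.split(r"\s+OR\s+", text.strip()):
        pointers = []
        for item in field.split("&"):
            match = _BASIC_PTR.fullmatch(item.strip())
            if match is None:
                raise ValueError(f"not a basic pointer: {item!r}")
            pointers.append(match.groups())
        alternatives.append(pointers)
    return alternatives


print(parse_context_pointers("Q1 & Q2 OR Q3"))
```

The nested list mirrors the grammar: the outer level holds the disjunctive pointer fields, and the inner level holds the conjunctive basic pointers within each field.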

2.0 Mapping Between Tags and Evaluation Classes:

File eval_class_proc.txt

Procedure for assigning queries to evaluation classes:

These rules are to be applied to the utterance characteristics in the .cat file, in order as given. The first one that applies determines the evaluation class of the utterance.

Assign to class X ("X:") if:

There are more than 6 individual interpretations.
Any of these tags occur on the utterance or any of its interpretations:
arithmetic
bad-db
book
cancelled
disallowed
hopelessly-vague
ill-formed
presupposition-failure
responding
trunc-utt
unanswerable
uncooperative
underspecified

Assign to class A ("A:") if the tag "context-dependent" does not occur on the utterance or on any of its interpretations.

Assign to class D ("D:") if there is one interpretation that is not marked with the tag "context-dependent".

Assign to class D1 ("D1:") if

the context pointer of each of its interpretations specifies just one prior query (":Qn") or query/answer (":Q/An") as context, that prior query ("Qn") is the same for each interpretation, and each query between Qn and this query is tagged as "unanswerable".

otherwise assign to class D ("D:").
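
The ordered rules above can be sketched as a single function. This is a minimal Python illustration under stated assumptions, not the official procedure: the Reading class, the function name eval_class, and the use of integer utterance numbers are all hypothetical, the context field is assumed to list every prior utterance named by the "context-dependent" tag, and an utterance without interp# lines is treated as its own single reading.

```python
from dataclasses import dataclass, field

# Tags that force class X when present on the utterance or any interpretation.
X_TAGS = {
    "arithmetic", "bad-db", "book", "cancelled", "disallowed",
    "hopelessly-vague", "ill-formed", "presupposition-failure",
    "responding", "trunc-utt", "unanswerable", "uncooperative",
    "underspecified",
}


@dataclass
class Reading:
    """Tags plus context pointers for an utterance or one interpretation."""
    tags: set = field(default_factory=set)
    # Utterance numbers named by the "context-dependent" tag; empty if absent.
    context: list = field(default_factory=list)


def eval_class(utt_no, utt, interps, unanswerable):
    """Apply the rules in order; the first rule that applies wins.

    unanswerable: set of earlier utterance numbers tagged "unanswerable".
    """
    readings = [utt] + list(interps)
    if len(interps) > 6 or any(r.tags & X_TAGS for r in readings):
        return "X"
    if not any(r.context for r in readings):
        return "A"
    # Assumption: with no interp# lines, the utterance is its own sole reading.
    targets = list(interps) or [utt]
    if any(not r.context for r in targets):
        return "D"
    pointers = {tuple(r.context) for r in targets}
    if len(pointers) == 1:
        (ctx,) = pointers
        between = range(ctx[0] + 1, utt_no) if len(ctx) == 1 else None
        if between is not None and all(n in unanswerable for n in between):
            return "D1"
    return "D"
```

For example, under these assumptions an utterance numbered 4 whose only context pointer is Q2 falls in class D1 only when utterance 3 is tagged "unanswerable"; otherwise it falls through to class D.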

File: filename-specs.doc, updated 04/15/92 (modified 10/07/93 for cdrom publication)

MADCOW File and Directory Format Specifications

Directory and Filename Structures

All MADCOW data are organized into directory and filename structures as follows:

/<CORPUS>/<SPEAKING-MODE>/<PARTITION>/<SITE>/<SPEAKER>/<SESSION>/<DATA-FILES>

where,

CORPUS ::= atis2
SPEAKING-MODE ::= spon (waveforms) | text (logs, transcripts, etc.)
PARTITION ::= test | train
SITE ::= [feb92 | nov92] | [att | bbn | cmu | mit | nist | sri]
SPEAKER ::= 001 | ... | zzz (3-character base-36 speaker ID)
SESSION ::= 1 | ... | z
DATA-FILES ::= <XXX><UU><S><M><P>.<TYPE>

where,

XXX ::= 001 | ... | zzz (3-character base-36 speaker ID)
UU ::= 01 | ... | zz (2-char. base-36 speaker-sentence ID)
S ::= 1 | ... | z (1-char. base-36 session ID)
M ::= s ("s" - spontaneous)
P ::= s | c | x ("s" - Sennheiser, "c" - Crown, or "x" - pertains to all microphones recorded)
and,
TYPE ::= log | (session log file - special speaker-sentence ID
                of "000" is used in all log files) 
         com | (session comment file - special speaker-sentence
                ID of "000" is used in all comment files) 
         wav | (SPHERE-headered speech waveform file)
         sro | ("speech recognizer output" transcription)
         cat | (query categorization)
         win | (wizard input to NLParse)
         sql | (SQL query from NLParse to create min (.ref) 
                answer)
         sq2 | (SQL query from NLParse to create max (.rf2) 
               answer)
         ref | (min reference answer from (.sql) SQL query)
         rf2 | (max reference answer from (.sq2) SQL query)
Note: Although other ATIS file types do exist, only three of the file types listed above (.log, .wav, .sro) were required as input from sites contributing initial (unannotated) data; the remaining file types (.cat, .win, .sql, .sq2, .ref, and .rf2) were added by the annotation process.
example:
e000e1ss.wav
(speaker e00, utterance 0e, session 1, spontaneous speaking mode, Sennheiser mic., waveform file)
Given that speaker e00 was recorded at BBN, and placed in the training partition, the directory path to this file is:
atis2/spon/train/bbn/e00/1/ (this happens to be on disc 12-2.1)
And the corresponding text files would be found in:
atis2/text/train/bbn/e00/1/ (all text data are on disc 12-1.1)
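
The naming scheme above can be checked mechanically. The following Python sketch is an illustration only; the function names parse_filename and directory_for are hypothetical, and the partition and site are passed in by the caller because they cannot be derived from the filename itself.

```python
import re

# <XXX><UU><S><M><P>.<TYPE>, with all ID fields in lower-case base 36.
_NAME = re.compile(
    r"(?P<speaker>[0-9a-z]{3})(?P<utterance>[0-9a-z]{2})(?P<session>[0-9a-z])"
    r"(?P<mode>s)(?P<mic>[scx])"
    r"\.(?P<type>log|com|wav|sro|cat|win|sql|sq2|ref|rf2)"
)


def parse_filename(name):
    """Split a data filename into its named fields."""
    match = _NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"not a MADCOW data filename: {name!r}")
    return match.groupdict()


def directory_for(name, partition, site):
    """Build the directory path; waveforms live under spon, the rest under text."""
    fields = parse_filename(name)
    mode_dir = "spon" if fields["type"] == "wav" else "text"
    return (f"atis2/{mode_dir}/{partition}/{site}/"
            f"{fields['speaker']}/{fields['session']}/")


print(parse_filename("e000e1ss.wav"))
print(directory_for("e000e1ss.wav", "train", "bbn"))
```

Applied to the example above, the sketch recovers speaker e00, utterance 0e and session 1, and places the .wav file under atis2/spon/train/bbn/e00/1/.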

This corpus is identified by the database ID (corpus ID) "atis2". This ID appears in the directory structure and in the waveform file headers.

There are separate documentation files explaining the format and contents of some of the file types. In particular, refer to the files cat_spec.doc, log_spec.doc, sro_spec.doc, and wav_spec.doc for information on the .cat, .log, .sro and .wav files, respectively.

File cas_spec.doc.910731 Last Modified 7/31/91

This document specifies version 2 of the DARPA/ATIS Common Answer Specification, as revised 11/1/90. The changes are to allow CAS forms with "OR" connecting alternates and to allow certain strings to not be enclosed in quotation marks.

BASIC SYNTAX IN BNF:

       <answer>  ::=  <cas1> | (<cas1> <casn>+)
         <casn>  ::=  OR <cas1>
         <cas1>  ::=  <scalar-value> | <relation> | NO_ANSWER | no_answer
  <scalar-value> ::= <boolean-value> | <number-value> | <string>
 <boolean-value> ::=  YES | yes | TRUE | true | NO | no | FALSE | false
  <number-value> ::= <integer> | <real-number>
      <integer>  ::= <sign> <digit>+ | <digit>+ 
          <sign> ::= + | -
         <digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
   <real-number> ::= <sign> <digit>+ . <digit>* | <digit>+ . <digit>*
        <string> ::= <char_except_whitespace>+ | "<char>*"
      <relation> ::= (<tuple>*)
         <tuple> ::= (<value>+)
         <value> ::= <scalar-value> | NIL

Standard BNF notation has been extended to include two other common devices: "<A>+" means "one or more A's" and "<A>*" means "zero or more A's".

The above formulation does not define <char_except_whitespace> and <char>. All of the standard ASCII characters count as members of <char>, and all but "white space" count as <char_except_whitespace>. Following ANSI "C", blanks, horizontal and vertical tabs, newlines, formfeeds, and comments are, collectively, "white space".

The only change in the syntax of CAS itself from the previous version is that now a string may be represented as either a sequence of characters not containing white space or as a sequence of any characters enclosed in quotation marks. Note that only non-exponential real numbers are allowed, and that empty tuples are not allowed (but empty relations are).

ADDITIONAL SYNTACTIC CONSTRAINTS

The syntactic classes <boolean-value>, <string>, and <number-value> define the types "boolean", "string", and "number", respectively. All the tuples in a relation must have the same number of values, and those values must be of the same respective types (boolean, string, or number).

If a token could represent either a string or a number, it will be taken to be a number; if it could represent either a string or a boolean, it will be taken to be a boolean. Interpretation as a string may be forced by enclosing a token in quotation marks.
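
The precedence rule above (number or boolean before string, unless quoted) can be sketched as follows. This Python fragment is an illustration only; the name scalar_type is hypothetical, it accepts the boolean spellings exactly as listed in the BNF, and it does not treat NIL or NO_ANSWER, which are not scalar values.

```python
import re

_INTEGER = re.compile(r"[+-]?[0-9]+")
# Non-exponential reals only; at least one digit is required before the ".".
_REAL = re.compile(r"[+-]?[0-9]+\.[0-9]*")
_BOOLEANS = {"YES", "yes", "TRUE", "true", "NO", "no", "FALSE", "false"}


def scalar_type(token):
    """Return "boolean", "number" or "string" for one CAS scalar token."""
    if len(token) >= 2 and token.startswith('"') and token.endswith('"'):
        return "string"          # quotation marks force a string reading
    if token in _BOOLEANS:
        return "boolean"
    if _INTEGER.fullmatch(token) or _REAL.fullmatch(token):
        return "number"
    return "string"


for tok in ["5.00", '"5.00"', "yes", "L", "1e5"]:
    print(tok, scalar_type(tok))
```

Under this reading, a token such as 1e5 or .5 falls through to "string", because the grammar admits neither exponents nor a real number without leading digits.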

In a tuple, NIL as the representation of missing data is allowed as a special case for any value, so a legal answer indicating the costs of ground transportation in Boston would be

(("L" 5.00 ) ("R" nil ) ("A" nil ) ("R" nil ))

ELEMENTARY RULES FOR CAS COMPARISONS

String comparison is case-sensitive, but the distinguished values (YES, NO, TRUE, FALSE, NO_ANSWER, and NIL) may be written in either upper or lower case.

Each indexical position for a value in a tuple (say, the ith) is assumed to represent the same field or variable in all the tuples in a given relation.

Answer relations must be derived from the existing relations in the database, either by subsetting and combining relations or by operations like averaging, summation, etc.

In matching an hypothesized (HYP) CAS form with a reference (REF) one, the order of values in the tuples is not important; nor is the order of tuples in a relation, nor the order of alternatives in a CAS form using "OR". The scoring algorithm will use the re-ordering that maximizes the indicated score. Extra values in a tuple are not counted as errors, but distinct extra tuples in a relation are. A tuple is not distinct if its values for the fields specified by the REF cas are the same as another tuple in the relation; these duplicate tuples are ignored.

CAS forms that include alternate CAS's connected with "OR" are intended to allow a single HYP form to match any one of several REF CAS forms. If the HYP CAS form contains alternates, the score is undefined.

In comparing two real number values, a tolerance will be allowed; the default is plus or minus .01%. No tolerance is allowed in the comparison of integers. In comparing two strings, initial and final sub-strings of white space are ignored. In comparing boolean values, "TRUE" and "YES" are equivalent, as are "FALSE" and "NO".
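
The value-level comparison rules above can be sketched in Python. This is an illustration only, not the scoring software: the name values_match is hypothetical, values are assumed to have already been converted to Python bool, int, float or str (so "TRUE" and "YES" both arrive as True), the tolerance is assumed to be relative to the REF value, and an integer compared with a real is assumed to be treated as a real comparison.

```python
def values_match(hyp, ref, rel_tol=0.0001):
    """Compare one HYP value with one REF value.

    rel_tol=0.0001 corresponds to the default tolerance of .01%.
    """
    # bool must be tested first because bool is a subclass of int in Python.
    if isinstance(hyp, bool) or isinstance(ref, bool):
        return isinstance(hyp, bool) and isinstance(ref, bool) and hyp == ref
    if isinstance(hyp, int) and isinstance(ref, int):
        return hyp == ref                            # integers: no tolerance
    if isinstance(hyp, (int, float)) and isinstance(ref, (int, float)):
        return abs(hyp - ref) <= rel_tol * abs(ref)  # assumption: relative to REF
    if isinstance(hyp, str) and isinstance(ref, str):
        return hyp.strip() == ref.strip()            # case-sensitive
    return False


print(values_match(100.005, 100.0))
print(values_match(" BOS ", "BOS"))
```

Matching whole tuples and relations, including the re-ordering described earlier, is a separate step that this sketch does not attempt.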

CAS FILES

(NIST Standard ATIS Answer File Specification Ver. 1.0 6/5/90)

The answers to ATIS queries must be in an ASCII text file, each answer in CAS form, sequenced as in the associated "ndx" file. Any material following the first appearance of ";" on a line will be treated as comments. Blank lines will be ignored. The utterance i.d. should be on a comment line immediately preceding the answer CAS for the utterance.
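
The comment and blank-line rules above can be sketched as a small filter. This Python fragment is an illustration only; the name answer_lines is hypothetical, and the utterance i.d. used in the example is made up.

```python
def answer_lines(text):
    """Return the non-blank content of each line, with comments removed.

    Everything from the first ";" on a line onward is dropped, as the
    specification describes.
    """
    kept = []
    for line in text.splitlines():
        body = line.split(";", 1)[0].strip()
        if body:
            kept.append(body)
    return kept


sample = '; e00011ss (hypothetical utterance i.d.)\n(("L" 5.00))  ; cost\n\nNO_ANSWER\n'
print(answer_lines(sample))
```

Because the rule is stated in terms of the first ";" on a line, the sketch makes no exception for a ";" inside a quoted string.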

File sro-specs.doc.

originally drawn from a memo by C. Hemphill of TI (4/18/90).
amended 07/91.
revised by L. Shriberg (11/10/91).
revised by Patti Price (12/09/91),
revised 01/21/92.
minor revisions made by J. Garofolo on 02/21/92.

Please note: in the transcription data provided in this publication, no effort has been made to enforce consistency in the prosodic and non-speech markings as described below. Such markings are known to be somewhat subjective, and there may have been some evolutionary changes in usage over the period of data collection for this corpus. These issues, however, do not affect the lexical or referential content of the transcriptions.

ATIS SR Output (".sro") Transcription Conventions

The transcription is intended to be an orthographic, lexical transcription with a few details included that represent audible acoustic events (speech and nonspeech) present in the corresponding waveform files. The SRO transcriptions will be automatically mapped to lexical SNOR conventions for scoring of recognition systems. The extra marks contained in the SRO transcription aid in interpreting the text form of the utterance. The SRO transcription will be stored in the query's auxiliary file of type ".sro".

The transcriptions are intended to be a quick and broad transcription; transcribers should not have to agonize over decisions, but rather realize that their transcription is intended to be a rough guide that others may examine further for details. Transcriptions should be made in two passes: one pass in which words are transcribed, and a second in which the additional details (extraneous noises, and prosodic marks) are added. Many phenomena (silences, noises, "uh"s) are easy to miss unless specifically attended to. It is recommended that transcribers have some background in phonetics and in linguistics, or that their training and preparation for the transcription task cover some basics in acoustic phonetics and dialect and style variations.

1. Markings Required for Scoring.

1.1 Case

Transcriptions are case insensitive and all case information will be lost in the translation to the all uppercase SNOR conventions. Using all lower case for SRO conventions is recommended so that SRO files are immediately recognizable from SNOR and lexical SNOR files.

1.2 Spelling

Normal lexical items will be represented by their spellings in the normal way. NIST maintains a common lexicon of spellings of words used in the ATIS corpus. It is available via remote FTP to ssi.ncsl.nist.gov and should be consulted when in doubt on spellings of words. The file is located in the directory, "madcow/logs" and is named, "lexicon.doc.DATE", where DATE represents the latest date of update of this file.

Spellings which cannot be predicted from .sro conventions:

- "all right" will always be used in lieu of "alright"

- "traveling" will always be used in lieu of "travelling"

- "trans world" will always be transcribed as separate words when referring to the airline, TWA.

- "pan am" will always be transcribed as separate words when referring to the airline, Pan American Airlines.

- "okay" is spelled "okay" rather than any other spellings, and should not be in angled or square brackets, unless part of a sequence that is verbally deleted.

- hyphenation is addressed in a separate section below.

1.2.1 Number sequences

Number sequences (flight numbers, times, dates, aircraft types, dollar amounts, etc.) will be spelled out to reflect what was said ("flight six one three"; "seven thirty"; "august twenty first"; "seven forty seven"; "four hundred and ten dollars".)

Reminder: No hyphens will be used ("seven forty seven", not "seven forty-seven".)

Note: care should be taken to transcribe the digit "0" as "zero" or "oh", depending on what the speaker said.

1.2.2 Letter sequences

Letter sequences occur in acronyms and abbreviations ("d f w"; "a p slash eighty"; "p m"; "c o"; etc.) Letters should be in lower case, separated by a space. Note that the determiner "a" and the letter "a" in "t w a" are not distinguished in these conventions.

Previous conventions indicated an exception to the above rule for "washington dc" in which there was no space between the "d" and "c". This exception never made sense and has not been used consistently in practice. In all future transcriptions it should NOT be treated as an exception and should always be transcribed as "washington d c".

[NIST has changed all occurrences of "dc" to "d c" in the MADCOW data they have distributed, so the "dc" form has never been used in official MADCOW data. It may, however, exist in the ATIS0 data.]

The AM and PM of times (e.g., "five thirty p m") will be treated as examples of letter sequences, i.e., lower case and separated by a space, with no periods.

If a speaker pronounces an acronym or abbreviation as a word, for example "den" or "bos", then these should be spelled out as words, rather than as "d e n" and "b o s".

1.3 Hyphenation

Hyphens will not generally be used; if the items on either side of a potential hyphen are both words, a space will be used instead of a hyphen. If one or both of the items is NOT a lexical item, neither a space nor a hyphen will be used, e.g., "nonstop" should be used, NOT "non-stop" or "non stop"; "round trip" should be used and NOT "round-trip"; "one way" should be used and NOT "one-way" or "oneway"; "nonsmoking" should be used and NOT "non-smoking".

1.4 Punctuation

This transcription will not contain normal English punctuation and will consist of lowercase characters except for proper nouns and individual letters. Conventional punctuation, including commas, periods, and question marks, will not be used. Periods will be used to indicate silent pauses (see 2.2) within an utterance, and should only occur following a space. Commas are used to indicate intonational separation; exclamation points are used to indicate emphatic stress.

Periods, question marks or exclamation points should NOT be used to indicate the end of a sentence.

1.5 Mispronunciations

Obviously mispronounced words that are nevertheless intelligible will be marked with stars (e.g., *transportation* for ``transportetation''). These include mispronunciations such as words with extra or omitted syllables, but asterisks should not be used to indicate pronunciations of words that represent normal dialectal (e.g., "warshed" for "washed" or "cah" for "car") or stylistic variation (e.g., "bout" for "about" or "wanna" for "want a" or for "want to"). If the speaker would not consider the pronunciation an error, the asterisk notation should not be used. Obviously, there may be some clear and some unclear cases; transcribers should use their best judgment. A background in phonetics is helpful for transcribers.

Similarly, glottalization at onset or offset of a vowel is not transcribed.

1.6 Verbal Deletions

Words verbally deleted by the subject will be enclosed in angle brackets. Verbal deletion means words spoken by the user but which, in the opinion of the transcriber, are superseded by subsequent speech explicitly (e.g., "show <flights> <i> <mean> fares") or implicitly (e.g., "show me the <fares> flights to Boston").

Verbal deletions occur any time there is a repetition or restart. In repetitions, one or more words are repeated, and there may or may not be extra material inserted into the repetition, for example:

show me <the> <flights> the flights to boston
show me <the> <flights> the nonstop flights to boston

In restarts, words are not repeated, but the speaker changes direction, as in:

<show> <me> <the> how many flights go to boston

Note that EACH word in a verbal deletion should be enclosed in angle brackets.

1.7 Word Fragments

Word fragments, i.e. instances in which the speaker did not complete a word, will be marked with a hyphen. As much of the word as is audible will be transcribed, followed immediately by the hyphen:

please show fli- flights from dallas

Though these represent "verbal deletions" as described above, the hyphen occurring before (or after) a space is sufficient to cue this fact, and should not be enclosed in angle brackets, as this just adds work for the transcribers. That is, the above example should NOT be "please show <fli-> flights from dallas"

Fragments include cases in which only an initial consonant or vowel is heard:

please show f- flights from dallas

This may sometimes be a judgement call on the part of the transcriber. Within-word hesitations may be transcribed as:

dall:as (indicating lengthening of the "l") (see section 2.4)
dal- [um] -las (indicating a within word interruption - rare)
dal- . -as (indicating a silence interrupting a word)

The transcription will specify the intended word if such is obvious to the transcribers and is NOT obvious from context (this is of course a judgement call on the part of the transcriber). The completion of the presumed intended word will be enclosed in parentheses, BEFORE the hyphen, as in:

please show flights de(nver)- from dallas

1.8 Non-Speech Acoustic Events

Acoustic events enclosed in square brackets can come from the following set:

Filled Pause ([uh], [um], [er], [ah], [mm])
Speaker other ([laughter], [cough], [grunt], [throat_clear], [mumbling], [unintelligible])
Nonspeaker other ([phone], [paper_rustle], [door_slam])

Note that while the exact specification of the type of acoustic event is subjective, these events MUST be marked in the correct location in a transcribed utterance. It is often difficult to localize these events; transcribing the utterance first, and listening for these events in a second pass is the correct procedure.

Note that any term can be used inside the brackets, but there should be no spaces inside brackets; use an underscore to connect words.

Note that the filled pauses represent acoustic events similar acoustically and phonetically to speech. If possible, try to limit these to the set on the list, so that those interested in these events can find them easily. If others occur, contact the MADCOW committee via your MADCOW representative.

For noise events that occur over a span of one or more words, the transcriber should:

- indicate the beginning and ending of the noise, to the nearest word: "show the [paper_rustle/] flights to boston [/paper_rustle]", or
- indicate that the sound overlaps one word, e.g. a door slam during the word "flights" could be transcribed either:
  "show the [door_slam>] flights to boston"
  or "show the flights [<door_slam] to boston"

These guidelines are compatible with those used for the DOT files associated with the Wall Street Journal task. That task specifies the following set of non-speech markers, which for compatibility, transcribers of .sro files are encouraged to use:

      [chair_squeak]
      [cough]
      [cross_talk]
      [door_slam]
      [grunt]
      [laughter]
      [lip_smack] (use ONLY if EXCEPTIONALLY loud!)
      [loud_breath] (do NOT mark audible but low-level breath noises)
      [paper_rustle]
      [phone_ring]
      [sigh] (only if the amplitude is comparable to surrounding speech)
      [throat_clear]
      [tongue_click] (use ONLY if EXCEPTIONALLY loud!)
      [unintelligible]
      [sniff]
      [tap]
      [noise]

(The following speech sounds are also transcribed in the CSR and in the .sro transcriptions)

      [er]
      [mm]
      [uh]
      [um]

(The following speech style markers are used in CSR, and considered optional in the .sro transcriptions).

      [loud]
      [soft]
      [whisper]

Note: Acoustic events such as inhalation, exhalation, tongue clicks, lip smacks, and breath noise will not be transcribed if they are low level and non-intrusive.

2. Markings Helpful for Interpretation.

These markings should be used when salient; transcribers should not assume that they are optional. However, the transcriber should not agonize over these decisions. If in doubt, leave it out. The transcription is basically at the lexical level, and should be done relatively quickly. The following are intended to be helpful markings that should be used when the phenomena are very clear.

2.1 Intonational Boundaries

A comma will be used to indicate an intonational separation. It is preceded and followed by a space.

i'd like to fly on delta , first class , july second

An intonational separation may be achieved by:

- changing the pitch range, as in parentheticals (typically the parenthetical material is said in a reduced pitch range, i.e., with less pitch variation).
- by use of boundary tones, i.e., one of the following:
  1. a dramatic fall in pitch (for example as when concluding a statement, or the answer to a question)
  2. a dramatic rise in pitch (for example as when concluding a yes-no question)
  3. a continuation rise (as used for example in a list of items, e.g., "x , y , and z")

Boundary tones at the ends of the transcribed complete utterance or before a significant silence, as indicated in 2.2 and when transcribed, are so often redundant that they need not be transcribed. The use of the comma is intended to disambiguate and to make more interpretable utterances that would otherwise be either difficult or ambiguous, e.g.,

to make more interpretable:
"show me the flights to boston . what are their fares"
(if there is a pause between the two sentences), or
"show me the flights to boston , what are their fares"
(if there is an intonational indication but no pause)
"show me the flights to boston: what are their fares"
(if there is neither a pause nor intonational marking, but there is lengthening)
"show me the flights to boston what are their fares"
(ONLY if there is no lengthening, pause or intonational indication of the separation between the two sentences).
to disambiguate:
"what are the restrictions for fare codes q , x , and y"
The above example means something different from:
"what are the restrictions for fare codes q x and y"

2.2 Silent Pauses

Silent pauses will be marked with a period (``.''). The use of the period indicates a significant silence, i.e., one that is clearly noticeable by listening, and which is significantly longer than a silence associated with a stop consonant closure for the rate of speech used by the speaker. Example:

show me the . flights to boston

Previous SRO conventions dictated that "." be used for a one-second pause, ". ." for a two-second pause, etc. This is no longer in effect: a "." may be used to indicate a significant duration of silence, without giving further information on its duration. Marking durations was hard for transcribers to do, was inconsistently applied, and is more appropriately done by automatic methods. Thus in the above example, the silence could be 400 ms or one minute.

2.3 Emphatic Stress

An exclamation mark (``!'') before a word or syllable indicates emphatic stress. This includes stress beyond what might normally occur based on lexical and syntactic factors. This is used sparingly and subjectively. Note that the "!" only precedes a word. Example:

show me only !delta flights

2.4 Lengthening

Lengthening, typically vowel lengthening, will be indicated by a colon (``:'') placed immediately after the lengthened sound. This is used sparingly and subjectively. Note that ":" always follows some sound; if it occurs within a word, it is not followed by a space. Examples:

show me the: flights to boston
which flights ha:ve economy fares

Lengthenings before silences are so often observed that hearing them is difficult and would make the transcriber's job much more difficult than it is intended to be. They therefore need not be marked before the end of the utterance or before a transcribed silence.
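
The marks described in sections 1.5 through 2.4 are all recognizable token by token, since no mark contains a space. The following Python sketch is an illustration only and is not the official SRO-to-SNOR mapping: the names sro_token_kind and tag_sro and the category labels are hypothetical, and words carrying "!" or ":" are simply left in the "word" category.

```python
def sro_token_kind(token):
    """Classify one whitespace-separated token of an .sro transcription."""
    if token == ".":
        return "pause"                      # silent pause (2.2)
    if token == ",":
        return "boundary"                   # intonational separation (2.1)
    if token.startswith("[") and token.endswith("]"):
        return "event"                      # non-speech acoustic event (1.8)
    if token.startswith("<") and token.endswith(">"):
        return "deleted"                    # verbal deletion (1.6)
    if len(token) > 1 and token.startswith("*") and token.endswith("*"):
        return "mispronounced"              # mispronunciation (1.5)
    if len(token) > 1 and (token.endswith("-") or token.startswith("-")):
        return "fragment"                   # word fragment (1.7)
    return "word"


def tag_sro(line):
    """Pair every token of a transcription line with its category."""
    return [(tok, sro_token_kind(tok)) for tok in line.split()]


for pair in tag_sro("show <the> fli- flights [uh] . to !delta , bos:ton"):
    print(pair)
```

A downstream tool could, for instance, drop every token that is not in the "word" category as a first step toward a lexical form, although the exact mapping rules are not given in this document.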

3. Truncated Waveforms

3.1 Marking of transcription

If a .wav file is truncated due to a recording error by the system or by the failure of the subject to press/depress the push-to-talk button at the proper times, the following notation in the corresponding .sro file is to be used:

- Beginning of utterance truncation:
  ~ transcription
- End of utterance truncation:
  transcription ~
- Beginning and end of utterance truncation:
  ~ transcription ~
- Null waveform AND the wizard did not respond with an error message to the subject:
  ~~
- If the wizard responded to a totally truncated utterance with an error message, and this "empty" interchange is retained in the .log file, then the .sro transcription should consist of a blank new-line and NOT a "~~". The utterance will then not be annotated as a "trunc-utt". The purpose of this is to distinguish those cases where dialogue coherence has been maintained from those cases where the system may have gotten out of sync with what has been recorded in the .wav file.

However, the transcribers are typically not looking at the .log files, and hence do not know what the wizard did. Sites that still produce truncated utterances are strongly encouraged to correct the data collection mechanism to avoid this problem. In the meantime, transcribers at these sites may have to consult the .log files for resolution of some instances.

For cases in which the user pushed the button and then said nothing, the corresponding .sro file should be a blank line, with no indication of truncation.

4. Speech style.

Speech style is considered a level of detail that need not be included in the SRO transcriptions. However, those sites who want to include it should use the conventions for these markers that are described in the documentation for the .dot files for the Wall Street Journal task. (See section 1.8).

5. Autocompletion

Autocompletion files, in conjunction with gnuemacs tools, can greatly increase the transcriber's efficiency. SRI does this via a file that can be maintained and updated by the transcriber, and can be obtained by requesting this software from SRI, via your MADCOW representative.

File: wav-specs.doc, updated 11/03/92

MADCOW Speech Waveform (.wav) File Type Specifications

ATIS MADCOW speech waveform files have been formatted using the NIST SPHERE header structure. They are stored on cd-rom in compressed form, using a version of Tony Robinson's "shorten" algorithm for waveform data compression. Source code (in "C") for the SPHERE Library and Utilities is available via anonymous ftp from NIST (see below for instructions on downloading the software). Users without access to Internet ftp file transfers may contact the Linguistic Data Consortium to obtain the source code by mail (see instructions at the end of this file).

The NIST SPHERE header is an object-oriented, 1024-byte blocked, ASCII structure which is prepended to the waveform data. The header is composed of a fixed-format portion followed by an object-oriented variable portion. The fixed portion is as follows:

NIST_1A<new-line>
1024<new-line>

The first line specifies the header type and the second line specifies the header length. Each of these lines is 8 bytes long (including new-line) and is structured to identify the header as well as allow those who do not wish to read the subsequent header information to programmatically skip over it.

The remaining object-oriented variable portion is composed of object-type-value "triple" lines which have the following format:

<LINE> ::= <TRIPLE><new-line> |
           <COMMENT><new-line> | 
           <TRIPLE><COMMENT><new-line>

  <TRIPLE> ::= <OBJECT><space><TYPE><space><VALUE><OPT-SPACES>

    <OBJECT> ::= <PRIMARY-SUBOBJECT> | 
                 <PRIMARY-SUBOBJECT><SECONDARY-SUBOBJECT>

    <PRIMARY-SUBOBJECT> ::= <ALPHA> | <ALPHA><ALPHA-NUM-STRING>
    <SECONDARY-SUBOBJECT> ::= _<ALPHA-NUM-STRING> | 
                              _<ALPHA-NUM-STRING><SECONDARY-SUBOBJECT>

    <TYPE> ::= -<INTEGER-FLAG> | -<REAL-FLAG> | -<STRING-FLAG>

      <INTEGER-FLAG> ::= i
      <REAL-FLAG> ::= r
      <STRING-FLAG> ::= s<DIGIT-STRING>
      
    <VALUE> ::= <INTEGER> | <REAL> | <STRING>  (depending on object type)

      <INTEGER> ::= <SIGN><DIGIT-STRING>
      <REAL> ::= <SIGN><DIGIT-STRING>.<DIGIT-STRING> 

    <OPT-SPACES> ::= <SPACES> | NULL

  <COMMENT> ::= ;<STRING>  (excluding embedded new-lines)

<ALPHA-NUM-STRING> ::= <ALPHA-NUM> | <ALPHA-NUM><ALPHA-NUM-STRING>
<ALPHA-NUM> ::= <DIGIT> | <ALPHA>
<ALPHA> ::= a | ... | z | A | ... | Z
<DIGIT-STRING> ::= <DIGIT> | <DIGIT><DIGIT-STRING>
<DIGIT> ::= 0 | ... | 9
<SIGN> ::= + | - | NULL
<SPACES> ::= <space> | <SPACES><space>
<STRING> ::=  <CHARACTER> | <CHARACTER><STRING>
<CHARACTER> ::= char(0) | char(1) | ... | char(255)

Note: The grammar does not impose any limit on the number of objects.

The single object "end_head" marks the end of the active header; the remaining unused header space is undefined.
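As an illustration of the grammar, the following Python sketch reads the variable portion of a header into a dictionary. The function name parse_sphere_fields is hypothetical, and the sketch makes two simplifying assumptions: the header text has already been decoded to a string, and a ";" never appears inside an integer or real value. String values are cut to the length given by the -s flag, so they may themselves contain ";".

```python
def parse_sphere_fields(header_text):
    """Return {object: value} for every triple line up to "end_head"."""
    fields = {}
    lines = header_text.split("\n")
    for line in lines[2:]:                      # skip the two fixed-format lines
        if not line.strip() or line.startswith(";"):
            continue                            # blank line or pure comment line
        name, _, rest = line.partition(" ")
        if name == "end_head":
            break                               # remaining header space is undefined
        type_flag, _, raw = rest.partition(" ")
        if type_flag == "-i":
            fields[name] = int(raw.split(";")[0])
        elif type_flag == "-r":
            fields[name] = float(raw.split(";")[0])
        elif type_flag.startswith("-s"):
            length = int(type_flag[2:])         # declared string length
            fields[name] = raw[:length]
        else:
            raise ValueError("unknown type flag: " + type_flag)
    return fields


sample_header = (
    "NIST_1A\n"
    "   1024\n"
    "database_id -s5 atis2\n"
    "sample_count -i 74010 ; trailing comment\n"
    "sample_min -i -3570\n"
    "end_head\n"
)
parsed = parse_sphere_fields(sample_header)
```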

The MADCOW headers include the following fields:

Field                    Type     Description - Probable defaults marked in ()
-----------------------  -------  ---------------------------------------------
speaker_id               string   3-char. speaker ID from filename
speaking_mode            string   speaking mode ("spontaneous" or "read")
recording_date           string   beginning of recording date stamp of the
                                  form DD-MMM-YYYY.  Should contain the string
                                  "unknown" if this info is not available.
recording_time           string   beginning of recording time stamp of the
                                  form HH:MM:SS.HH.  Should contain the string
                                  "unknown" if this info is not available.
microphone               string   microphone description ("Sennheiser HMD-410"
                                  or "Crown PCC-160")
utterance_id             string   utterance ID from filename of the form
                                  XXXUUSMP as described in the filenames 
                                  section above.
database_id              string   database (corpus) identifier ("atis2")
database_version         string   database (corpus) revision ("1.0")
channel_count            integer  number of channels in waveform ("1")
speaker_session_number   string   1-char. session ID from filename
sample_count             integer  number of samples in waveform
sample_max               integer  maximum sample value in waveform
sample_min               integer  minimum sample value in waveform
sample_rate              integer  waveform sampling rate ("16000")
sample_n_bytes           integer  number of bytes per sample ("2")
sample_byte_format       string   byte order (MSB/LSB -> "10" or 
                                  LSB/MSB -> "01")
sample_sig_bits          integer  number of significant bits in each sample
                                  ("16")
session_utterance_number integer  number of utterance within session (base 10)
                                  starting at "1"
speaker_sentence_number  string   number of utterance within session (base 36)
end_head                 none     end of header identifier
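The sample_byte_format and sample_n_bytes fields determine how decompressed waveform bytes map to sample values. The Python sketch below shows one way to apply them. The function name unpack_samples is hypothetical, and the code assumes 2-byte signed samples (the negative sample_min in the example header suggests signed values) and input that has already been decompressed.

```python
import struct


def unpack_samples(raw, sample_byte_format, sample_n_bytes=2):
    """Convert decompressed waveform bytes to a list of signed integers."""
    if sample_n_bytes != 2:
        raise ValueError("this sketch handles 2-byte samples only")
    if sample_byte_format == "10":      # MSB/LSB: most significant byte first
        order = ">"
    elif sample_byte_format == "01":    # LSB/MSB: least significant byte first
        order = "<"
    else:
        raise ValueError("unexpected sample_byte_format: " + sample_byte_format)
    count = len(raw) // 2
    return list(struct.unpack(order + str(count) + "h", raw[:count * 2]))


# The same two sample values stored in each byte order.
msb_first = unpack_samples(b"\x0f\x10\xf2\x0e", "10")
lsb_first = unpack_samples(b"\x10\x0f\x0e\xf2", "01")
```

Both calls return [3856, -3570], the sample_max and sample_min values of the example header.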

In addition to the fields listed above, there are two header entries pertaining to the use of the "shorten" compression algorithm:

sample_coding            string   "pcm,embedded-shorten-v1.09"
sample_checksum          integer  value provided by compression routine
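A program can consult sample_coding before reading waveform data to decide whether decompression is needed. The Python sketch below is illustrative only. The function name needs_decompression is hypothetical, and two things are assumptions rather than statements of this specification: that the value is a comma-separated list, and that a header without the field holds plain "pcm" data.

```python
def needs_decompression(fields):
    """Return True if the header fields indicate "shorten"-compressed data."""
    coding = fields.get("sample_coding", "pcm")   # assumed default when absent
    parts = coding.split(",")                     # assumed comma-separated list
    return any(part.startswith("embedded-shorten") for part in parts)


compressed = needs_decompression({"sample_coding": "pcm,embedded-shorten-v1.09"})
plain = needs_decompression({"sample_coding": "pcm"})
```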

Example ATIS header from SRI data:

NIST_1A
   1024
database_id -s5 atis2
database_version -s3 1.0
utterance_id -s8 r80062ss
channel_count -i 1
sample_count -i 74010
sample_rate -i 16000
sample_min -i -3570
sample_max -i 3856
sample_n_bytes -i 2
sample_byte_format -s2 10
sample_sig_bits -i 16
speaker_id -s3 r80
speaking_mode -s11 spontaneous
recording_date -s11 18-Nov-1991
recording_time -s11 14:01:26.00
microphone -s18 Sennheiser HMD-414
speaker_session_number -s1 2
session_utterance_number -i 6
speaker_sentence_number -s2 06
sample_coding -s26 pcm,embedded-shorten-v1.09
sample_checksum -i 11939
end_head
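Several useful quantities follow directly from the fields in this example. The Python sketch below computes three of them from values copied from the header above. The function names are hypothetical, and the size calculation assumes that sample_count is counted per channel, which makes no difference here because channel_count is 1.

```python
# Values copied from the example header above.
example = {
    "channel_count": 1,
    "sample_count": 74010,
    "sample_rate": 16000,
    "sample_n_bytes": 2,
    "speaker_sentence_number": "06",
}


def duration_seconds(h):
    """Length of the utterance in seconds."""
    return h["sample_count"] / h["sample_rate"]


def uncompressed_data_bytes(h):
    """Size of the waveform data after decompression."""
    return h["sample_count"] * h["channel_count"] * h["sample_n_bytes"]


def sentence_number(h):
    """Decode the base-36 speaker_sentence_number to an integer."""
    return int(h["speaker_sentence_number"], 36)


seconds = duration_seconds(example)          # 4.625625
data_bytes = uncompressed_data_bytes(example)  # 148020
sentence = sentence_number(example)          # 6
```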

Instructions for obtaining and using SPHERE

NIST has developed the SPHERE Library and Utilities package to provide an easy-to-use programming interface and essential command-line operations for manipulating speech files. The ATIS-2 waveform data were prepared for publication using SPHERE version 2.0 "beta". The current release of SPHERE is available for free via anonymous FTP from NIST, as follows:

	Connect to host:	jaguar.ncsl.nist.gov
	Go to directory:	pub
	Set transfer mode:	binary
	Get file:		sphere_2.0_Beta2.tar.Z

(Note that the file shown represents the version that is current as of publication of ATIS-2; as subsequent releases are made available, the file name will change accordingly. In general, only one version of SPHERE is present on the ftp server, and that will be the most recent release.)

For those who do not have access to the Internet FTP service, the SPHERE package may be obtained for free from:

	Linguistic Data Consortium
	441 Williams Hall
	University of Pennsylvania
	Philadelphia, PA 19104

You may also send a request by e-mail to "ldc@unagi.cis.upenn.edu" or call the LDC at (215) 898-0464.

After obtaining and installing the SPHERE package, you should refer to the on-line manual pages included with the release for instructions on usage. The relevant utility program for decompressing waveform data is "w_decode".