File cas_spec.doc, Last Modified 7/31/91

This document specifies version 2 of the DARPA/ATIS Common Answer Specification, as revised 11/1/90. The changes allow CAS forms with "OR" connecting alternates and allow certain strings to appear without enclosing quotation marks.

BASIC SYNTAX IN BNF:

       <answer>  ::=  <cas1> | (<cas1> <casn>+)
         <casn>  ::=  OR <cas1>
         <cas1>  ::=  <scalar-value> | <relation> | NO_ANSWER | no_answer
  <scalar-value> ::= <boolean-value> | <number-value> | <string>
 <boolean-value> ::=  YES | yes | TRUE | true | NO | no | FALSE | false
  <number-value> ::= <integer> | <real-number>
      <integer>  ::= <sign> <digit>+ | <digit>+ 
          <sign> ::= + | -
         <digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
   <real-number> ::= <sign> <digit>+ . <digit>* | <digit>+ . <digit>*
        <string> ::= <char_except_whitespace>+ | "<char>*"
      <relation> ::= (<tuple>*)
         <tuple> ::= (<value>+)
         <value> ::= <scalar-value> | NIL

Standard BNF notation has been extended to include two other common devices: "<A>+" means "one or more A's" and "<A>*" means "zero or more A's".

The above formulation does not define <char_except_whitespace> and <char>. All of the standard ASCII characters count as members of <char>, and all but "white space" count as <char_except_whitespace>. Following ANSI "C", blanks, horizontal and vertical tabs, newlines, formfeeds, and comments are, collectively, "white space".

The only change in the syntax of CAS itself from the previous version is that now a string may be represented as either a sequence of characters not containing white space or as a sequence of any characters enclosed in quotation marks. Note that only non-exponential real numbers are allowed, and that empty tuples are not allowed (but empty relations are).
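
As an illustration of the two string forms, the following is a minimal tokenizer sketch in Python. It is not part of the specification. The name tokenize is hypothetical, and the sketch assumes that parentheses always act as delimiters and that a quoted string contains no embedded quotation marks; the specification is silent on both points. Comments are not handled here.

```python
import re

# Assumptions (not stated in the specification): parentheses always act as
# delimiters, and a quoted string cannot contain a quotation mark.
_TOKEN = re.compile(r'"[^"]*"|[()]|[^\s()"]+')

def tokenize(text):
    """Split CAS text into quoted strings, parentheses, and bare tokens."""
    return _TOKEN.findall(text)
```

Under these assumptions a quoted string such as "New York" stays a single token, while the same characters without quotation marks would be read as two tokens.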

ADDITIONAL SYNTACTIC CONSTRAINTS

The syntactic classes <boolean-value>, <string>, and <number-value> define the types "boolean", "string", and "number", respectively. All the tuples in a relation must have the same number of values, and those values must be of the same respective types (boolean, string, or number).

If a token could represent either a string or a number, it will be taken to be a number; if it could represent either a string or a boolean, it will be taken to be a boolean. Interpretation as a string may be forced by enclosing a token in quotation marks.
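
The precedence just described (number first, then boolean, then string, with quotation marks forcing a string) can be sketched in Python as follows. The function name classify and the returned (type, value) pairs are illustrative assumptions. Following the BNF, the sketch accepts distinguished values only when written entirely in upper case or entirely in lower case.

```python
import re

_INTEGER = re.compile(r'[+-]?[0-9]+')
_REAL = re.compile(r'[+-]?[0-9]+\.[0-9]*')   # non-exponential reals only
_BOOLEANS = {"YES": True, "TRUE": True, "NO": False, "FALSE": False}

def classify(token):
    """Return a (type, value) pair for one CAS token."""
    # Quotation marks force interpretation as a string.
    if len(token) >= 2 and token[0] == '"' and token[-1] == '"':
        return ("string", token[1:-1])
    if _INTEGER.fullmatch(token):
        return ("number", int(token))
    if _REAL.fullmatch(token):
        return ("number", float(token))
    # Assumption: mixed-case spellings such as "Yes" are ordinary strings.
    if token in (token.upper(), token.lower()):
        upper = token.upper()
        if upper in _BOOLEANS:
            return ("boolean", _BOOLEANS[upper])
        if upper in ("NIL", "NO_ANSWER"):
            return (upper.lower(), None)
    return ("string", token)
```

For example, the bare token 5.00 is classified as a number, while "5.00" in quotation marks is classified as a string.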

In a tuple, NIL as the representation of missing data is allowed as a special case for any value, so a legal answer indicating the costs of ground transportation in Boston would be

   (("L"   5.00 ) ("R"    nil ) ("A"    nil ) ("R"    nil ))
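
The constraints on tuples can be checked mechanically. The following Python sketch is illustrative only: it assumes a relation has already been parsed into a list of tuples of Python values (bool, int or float, str), with None standing in for NIL. The names value_type and check_relation are hypothetical.

```python
def value_type(value):
    """Map a parsed value to its CAS type name, or None for NIL."""
    if value is None:
        return None
    if isinstance(value, bool):      # test bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    raise TypeError("not a CAS value: %r" % (value,))

def check_relation(relation):
    """Return True if all tuples have the same length and column types."""
    if not relation:
        return True                  # empty relations are allowed
    width = len(relation[0])
    if width == 0:
        return False                 # empty tuples are not allowed
    column_types = [None] * width
    for row in relation:
        if len(row) != width:
            return False
        for i, value in enumerate(row):
            kind = value_type(value)
            if kind is None:
                continue             # NIL is allowed in any position
            if column_types[i] is None:
                column_types[i] = kind
            elif column_types[i] != kind:
                return False
    return True
```

With this representation the ground transportation example above becomes [("L", 5.00), ("R", None), ("A", None), ("R", None)], which passes the check.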

ELEMENTARY RULES FOR CAS COMPARISONS

String comparison is case-sensitive, but the distinguished values (YES, NO, TRUE, FALSE, NO_ANSWER, and NIL) may be written in either upper or lower case.

Each indexical position for a value in a tuple (say, the ith) is assumed to represent the same field or variable in all the tuples in a given relation.

Answer relations must be derived from the existing relations in the database, either by subsetting and combining relations or by operations like averaging, summation, etc.

In matching a hypothesized (HYP) CAS form with a reference (REF) one, the order of values in the tuples is not important; nor is the order of tuples in a relation, nor the order of alternatives in a CAS form using "OR". The scoring algorithm will use the re-ordering that maximizes the indicated score. Extra values in a tuple are not counted as errors, but distinct extra tuples in a relation are. A tuple is not distinct if its values for the fields specified by the REF CAS are the same as those of another tuple in the relation; these duplicate tuples are ignored.
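
One way to picture the matching rule for relations is the brute-force Python sketch below. It is not the official scoring algorithm. It returns only a yes/no match rather than a score, compares values by exact equality (no tolerance), and assumes relations are lists of tuples of parsed values. The names relations_match and _key are hypothetical.

```python
from itertools import permutations

def _key(value):
    # Tag booleans so that True is not conflated with the number 1.
    return ("boolean", value) if isinstance(value, bool) else value

def relations_match(hyp, ref):
    """Return True if some choice of HYP columns reproduces the REF tuples."""
    if not hyp or not ref:
        return not hyp and not ref
    ref_rows = {tuple(_key(v) for v in row) for row in ref}
    ref_width = len(ref[0])
    hyp_width = len(hyp[0])
    if hyp_width < ref_width:
        return False
    # Try every ordered selection of HYP columns to stand for the REF fields.
    for columns in permutations(range(hyp_width), ref_width):
        # Building a set ignores tuple order and drops HYP tuples that
        # duplicate one another on the selected fields.
        projected = {tuple(_key(row[c]) for c in columns) for row in hyp}
        if projected == ref_rows:
            return True
    return False
```

The search over column selections grows factorially with the number of columns, which is acceptable for a sketch but would need a smarter search for wide relations.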

CAS forms that include alternate CASs connected with "OR" are intended for use as REF forms, so that a single HYP form can match any one of several REF CAS forms. If the HYP CAS form itself contains alternates, the score is undefined.
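
A minimal Python sketch of this rule follows. It assumes each CAS form has been parsed into a list of alternatives (a list of length one when no "OR" is present) and that a comparison function for single CAS forms is supplied by the caller. The name hyp_matches_ref and its parameters are hypothetical, and raising an error is only one possible way to treat the undefined case.

```python
import operator

def hyp_matches_ref(hyp_alternatives, ref_alternatives, matches=operator.eq):
    """Return True if the single HYP form matches any REF alternative."""
    if len(hyp_alternatives) != 1:
        # The specification leaves this case undefined.
        raise ValueError("score is undefined when HYP contains alternates")
    hyp = hyp_alternatives[0]
    return any(matches(hyp, ref) for ref in ref_alternatives)
```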

In comparing two real number values, a tolerance will be allowed; the default is plus or minus 0.01%. No tolerance is allowed in the comparison of integers. In comparing two strings, initial and final substrings of white space are ignored. In comparing boolean values, "TRUE" and "YES" are equivalent, as are "FALSE" and "NO".
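
These comparison rules can be sketched in Python as shown below. Several points are assumptions rather than statements of the specification: the tolerance is taken relative to the REF value, a comparison counts as a real-number comparison when either side is a real, and NIL (represented as None) matches only NIL. Booleans are assumed to have been parsed already, so that TRUE and YES are both the Python value True. The name values_match is hypothetical.

```python
def values_match(hyp, ref, rel_tol=0.0001):
    """Compare two parsed scalar values; rel_tol of 0.0001 is 0.01%."""
    if hyp is None or ref is None:
        return hyp is None and ref is None      # assumption: NIL matches only NIL
    if isinstance(hyp, bool) or isinstance(ref, bool):
        return isinstance(hyp, bool) and isinstance(ref, bool) and hyp == ref
    if isinstance(hyp, str) or isinstance(ref, str):
        # Case-sensitive; leading and trailing white space is ignored.
        return (isinstance(hyp, str) and isinstance(ref, str)
                and hyp.strip() == ref.strip())
    if isinstance(hyp, int) and isinstance(ref, int):
        return hyp == ref                       # no tolerance for integers
    # Assumption: tolerance is measured relative to the REF value.
    return abs(hyp - ref) <= rel_tol * abs(ref)
```

For a REF value of 100.005 the permitted deviation is about 0.01, so a HYP value of 100.0 matches, while a HYP value of 100.0 against a REF of 100.02 does not.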

CAS FILES

(NIST Standard ATIS Answer File Specification Ver. 1.0 6/5/90)

The answers to ATIS queries must be in an ASCII text file, each answer in CAS form, sequenced as in the associated "ndx" file. Any material following the first appearance of ";" on a line will be treated as a comment. Blank lines will be ignored. The utterance ID should be on a comment line immediately preceding the answer CAS for the utterance.
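
A reader for such files might be sketched in Python as follows. The sketch assumes that each answer occupies a single line and that the whole text of the preceding comment line serves as the utterance ID; neither is stated in the specification. It follows the comment rule literally, so a ";" inside a quoted string would also start a comment. The name read_answers is hypothetical.

```python
def read_answers(lines):
    """Return (utterance_id, answer_text) pairs from the lines of a CAS file."""
    answers = []
    pending_id = None
    for line in lines:
        # Everything after the first ";" on a line is a comment.
        content, separator, comment = line.partition(";")
        content = content.strip()
        if content:
            answers.append((pending_id, content))
            pending_id = None
        elif separator:
            # A comment-only line: remember it as a candidate utterance ID.
            pending_id = comment.strip()
        # Blank lines fall through and are ignored.
    return answers
```

In practice the lines would come from an open file object, for example read_answers(open(path)) with a caller-supplied path.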