File class1.doc [extracted from class_a_crit2.txt, msg from B. Moore dated 5/10/90]

DEFINITION OF CLASS A QUERIES

Class A queries will be identified by exception. Class A queries will be
those that are none of the following:

1. context dependent

   There seem to be two broad subcategories here:

   a. queries containing explicit reference to a preceding answer or
      question, such as "What classes of service are available on those
      flights?"

   b. queries whose scope is implicitly assumed to be limited by a
      preceding answer or question, such as "Which flights go to
      Dulles?" in a context that limits attention to some particular set
      of flights to Washington DC.

   It is noted that some queries in the second subcategory could, in
   isolation, also receive a reasonable context-independent
   interpretation. For example, in context, "Please list an
   interpretation of the classes," is likely to mean the classes
   displayed in the preceding answer, and thus is context dependent. It
   also could have a reasonable use referring to all classes. Such
   queries will be specially marked in the process of selecting class A
   queries.

2. vague, ambiguous, disambiguated only by context, or otherwise failing
   to yield a single canonical database answer.

   Some of the particular cases noted so far include:

   a. attachment ambiguities. These will be excluded ONLY if it is not
      possible for an ordinary person to pick the preferred reading
      (without resort to context).

   b. "What does X mean?" These are out, unless X is an abbreviation
      code that has a table that expands the code into a descriptive
      word, phrase, or set of attributes. The query is not acceptable,
      however, if X is a code that has more than one possible meaning
      according to what field it appears in, unless there is
      disambiguating context WITHIN the query, such as "What does fare
      code X mean?"

   c. "Give me information about X."
      These queries could be allowed, if someone will produce a table of
      allowable instances of X together with what information should be
      provided. Pending that happening, these queries are out.

3. grossly ill-formed.

   As long as the query is interpretable, only utterances that appear
   not to be attempts to speak normal conversational English will be
   excluded. For example, we should exclude attempts to speak some
   imagined form of "computerese" rather than normal English: "Origin
   Dallas, destination Boston, list flights."

4. other unanswerable queries

   Some subcases:

   a. queries not given a database answer by the wizard. This may
      include some queries that pass all our tests, but if the wizard
      did not generate a DB query, then we don't have anything to
      evaluate on.

   b. utterances that cannot be interpreted as queries, or that are
      incoherent.

   c. queries that request information not in the database.

   d. queries that refer to the way that information is presented.

   e. "meta queries" about system capabilities or structure or limits of
      the database.

5. queries from a noncooperative subject

   Utterances that are clearly designed to try to break the system
   should be excluded: "Given that city A is Oakland and city B is Fort
   Worth show me all flights from A to B."

NOTES:

Minor syntactic or semantic ill-formedness -- if the query is
interpretable, it will be accepted, unless it is so ill-formed that it
is clear that it is not intended to be normal conversational English.

Presupposition failures -- all presuppositions about the number of
answers (either existence or uniqueness) will be ignored. These are the
only types of presupposition failures noted to date. Any other types of
presupposition failure that make the query truly unanswerable will
presumably result in the wizard being unable to generate a database
query, and will be ruled out on those grounds.

Multi-sentence utterances -- These will not automatically be ruled out.
The examples cited so far are clearly interpretable as expressing
multiple constraints that can be combined into a single query.

PROCEDURE FOR CLASSIFYING SENTENCES:

There are five general categories of non-class-A utterances, with an
important special subcase of context-dependent utterances. We will
therefore use the following code:

   C  -- Hopelessly context dependent; COULD NOT reasonably be uttered
         with an unambiguous context-independent reading.

   C1 -- Context dependent, but COULD reasonably be uttered with an
         unambiguous context-independent reading.

   V  -- vague, ambiguous, etc.

   I  -- ill-formed (grossly).

   U  -- unanswerable (for other reasons).

   N  -- noncooperative subject.

NOTE: There are some context-dependent queries that could be forced to
have a context-independent interpretation, but it would be unreasonable
to do so, because of the large amount of data that would be retrieved;
for example, "Where are connections made?" Such queries will be
classified as C rather than C1.
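
For anyone recording these annotations in machine-readable form, the
code scheme above could be represented roughly as follows. This is only
a sketch and not part of the agreed procedure: the names NonClassACode,
parse_code, is_class_a and is_specially_marked are hypothetical, and the
convention that an empty annotation means "class A" is an assumption
made here for illustration.

```python
from enum import Enum
from typing import Optional


class NonClassACode(Enum):
    """The six exclusion codes; a class A query carries none of them."""
    C = "hopelessly context dependent"
    C1 = "context dependent, but could have a context-independent reading"
    V = "vague, ambiguous, etc."
    I = "grossly ill-formed"
    U = "unanswerable for other reasons"
    N = "noncooperative subject"


def parse_code(label: str) -> Optional[NonClassACode]:
    """Turn an annotator's label into a code.

    Assumption: an empty label means no exclusion applies (class A).
    Unknown labels raise KeyError rather than being guessed at.
    """
    label = label.strip().upper()
    if label == "":
        return None
    return NonClassACode[label]


def is_class_a(code: Optional[NonClassACode]) -> bool:
    """Class A is defined by exception: no exclusion code was assigned."""
    return code is None


def is_specially_marked(code: Optional[NonClassACode]) -> bool:
    """C1 is the special subcase of context-dependent queries."""
    return code is NonClassACode.C1
```

With this representation, a query such as "Where are connections made?"
would be stored as parse_code("C"), for which is_class_a returns False
and is_specially_marked returns False.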