File: log-specs.doc, updated 06/22/93.

Session logfile (.log) specification.

MADCOW
April 11, 1993

Document originally prepared by Alexander Rudnicky at CMU 412 268 2622
air@cs.cmu.edu

Updated 06/17/93 by JSG at NIST to:

  1. Change syntax to identify different types of wizards in "Mode" field. If wizards are used, certain timestamps become optional. (Section 3.1)
  2. Add header flag "[End_to_end:]" to identify end-to-end data collection mode. If [End_to_end:] mode is used, then the timestamp "Answer Found" is required; otherwise it is optional. (Sections 3.1 and 5)

Updated 06/22/93 by JSG at NIST to:

  1. Add syntax to allow utterance-independent timestamps. (Sections 2.3, 3.6, and 5)

Updated 11/17/94 by WMF at NIST to:

  1. Allow the QUERY field to be optional rather than required, and clear up what "optional" means. (Section 3.3)

ATIS training and testing data are collected in the course of human interaction with real or simulated spoken language systems. Data collection is divided into ``scenario'' units which correspond to a given travel problem presented to a human subject. In addition to waveform (.wav) and transcription files (.sro), a logfile (.log) is collected. The logfile provides a detailed record of user inputs and system actions during the course of a scenario session and provides a source of data for system development as well as for end-to-end evaluation. This document describes the contents of a logfile.

1 File format

Each file will contain only the data from one speaker completing one scenario. If multiple scenarios are completed during a session, each scenario will have its own logfile. Blank lines in a logfile will be ignored, though all other contents must conform to the specifications presented in this document. The purpose of this is to simplify automatic processing of logfiles.

1.1 Numbering convention

Each utterance in the .log files will be identified using a 2-digit base-36 [0-9a-z] number, so as to correspond to the names of the associated files (.wav, .sro, etc). Although base-36 numbers begin at 00, all scenarios should begin with utterance 01. The logfile will have the (dummy) utterance number 00. Further information about file naming and other conventions not detailed here can be found in the relevant specification documents maintained by NIST.
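
The numbering convention can be illustrated with a short sketch. The function names below (utt_id_to_int, int_to_utt_id) are hypothetical helpers introduced for illustration only; they are not part of this specification or of any NIST tool. The sketch assumes lowercase letters, as given by the [0-9a-z] range above.

```python
import string

# The 36 symbols of the utterance numbering scheme: 0-9 followed by a-z.
DIGITS = string.digits + string.ascii_lowercase


def utt_id_to_int(utt_id: str) -> int:
    """Convert a 2-character base-36 utterance ID (e.g. '0a') to an integer."""
    if len(utt_id) != 2 or any(c not in DIGITS for c in utt_id):
        raise ValueError(f"not a 2-digit base-36 ID: {utt_id!r}")
    return int(utt_id, 36)


def int_to_utt_id(n: int) -> str:
    """Convert an integer in 0..1295 to its 2-character base-36 utterance ID."""
    if not 0 <= n < 36 * 36:
        raise ValueError(f"out of range for 2 base-36 digits: {n}")
    high, low = divmod(n, 36)
    return DIGITS[high] + DIGITS[low]
```

Under this scheme a single scenario can hold at most 1295 real utterances (01 through zz), since 00 is reserved for the logfile itself.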

2 Content format

2.1 Fixed length entries

A fixed-length entry will be indicated by the keyword in square brackets, followed by the entry, on a single line, e.g.,

[Utterance:] 04

2.2 Variable length fields

For variable length fields, the beginning of the field will be flagged by [Begin <keyword>: XX] (where XX is the base-36 utterance ID) on one line, followed by zero or more lines of the entry, followed by [End <keyword>: XX] on a single line, that is:

[Begin Utterance: XX]
This is what the person said.
[End Utterance: XX]
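
The two entry formats of sections 2.1 and 2.2 could be read by a small line-oriented parser along the following lines. This is only a sketch: parse_entries and the three regular expressions are hypothetical names, not part of the specification. The utterance ID on Begin/End lines is treated as optional here, an assumption made so that clauses written without an ID are also accepted. Timestamp lines and lines starting with "[*" are deliberately left to other code.

```python
import re

# [<keyword>:] value   (excluding Begin/End markers, timestamps, and [* ...] lines)
FIXED_RE = re.compile(r"^\[(?!Begin |End |Timestamp:|\*)([^\]:]+):\]\s*(.*)$")
# [Begin <keyword>: XX] and [End <keyword>: XX]; the ID is optional (assumption)
BEGIN_RE = re.compile(r"^\[Begin ([^\]:]+):(?: ([0-9a-z]{2}))?\]$")
END_RE = re.compile(r"^\[End ([^\]:]+):(?: ([0-9a-z]{2}))?\]$")


def parse_entries(lines):
    """Return (keyword, utterance_id, value) tuples.

    Fixed-length entries yield a string value and utterance_id None;
    variable-length entries yield a list of body lines.
    """
    entries = []
    it = iter(lines)
    for raw in it:
        line = raw.strip()
        if not line:  # blank lines are ignored
            continue
        begin = BEGIN_RE.match(line)
        if begin:
            keyword, utt = begin.groups()
            body = []
            for inner in it:  # consume lines up to the matching End marker
                end = END_RE.match(inner.strip())
                if end and end.groups() == (keyword, utt):
                    break
                body.append(inner.rstrip("\n"))
            else:
                raise ValueError(f"missing End marker for {keyword!r}")
            entries.append((keyword, utt, body))
            continue
        fixed = FIXED_RE.match(line)
        if fixed:
            entries.append((fixed.group(1), None, fixed.group(2)))
    return entries
```

A variable-length field with zero body lines (a Begin line immediately followed by its End line) comes back with an empty list, which matches the "zero or more lines" wording above.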

2.3 Timestamps

Timestamps allow the recovery of information about the time course of interaction, such as the duration of a scenario, average duration of a transaction, etc. The timestamp has one of the following four basic forms:


[Timestamp: <type> for utterance XX at HH:MM:SS]     \
[Timestamp: <type> for utterance XX at HH:MM:SS.MMM]  >- Utt.-dependent
[Timestamp: <type> for utterance XX at NA]           /

[Timestamp: <type> at HH:MM:SS(.MMM)|NA]             - Utt.-independent

The first form can be used by sites to identify an event, "<type>", which is associated with a particular utterance, "XX", at time, "HH:MM:SS". The second form can be used by sites that wish to maintain a finer-grained record of event times. The third form will be used in cases for which a time is unavailable, for whatever reason; it will typically be encountered in older data. The fourth form can be used by sites to provide timestamps for events which are not associated with particular utterances such as to indicate the completion of a scenario.

Specifications for "<type>" are provided in section 3.6. Some timestamps may require additional information to be placed at the end of the line; see section 3.6.1 for an example.
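
The four timestamp forms can be recognized with a single regular expression. The sketch below is illustrative only: Timestamp, TS_RE, parse_time, and parse_timestamp are hypothetical names, and representing the time of day as a timedelta since midnight is a choice made here, not something the specification prescribes. The time value NA is returned as None.

```python
import re
from datetime import timedelta
from typing import NamedTuple, Optional

TS_RE = re.compile(
    r"^\[Timestamp: (?P<type>.+?)"                      # event type, e.g. Start Speech
    r"(?: for utterance (?P<utt>[0-9a-z]{2}))?"         # absent if utterance-independent
    r" at (?P<time>NA|\d{2}:\d{2}:\d{2}(?:\.\d{3})?)\]"  # HH:MM:SS, HH:MM:SS.MMM, or NA
    r"\s*(?P<extra>.*)$"                                # trailing information, if any
)


class Timestamp(NamedTuple):
    type: str
    utt: Optional[str]          # None for utterance-independent timestamps
    time: Optional[timedelta]   # time since midnight; None when the time is NA
    extra: str


def parse_time(text: str) -> Optional[timedelta]:
    if text == "NA":
        return None
    hms, _, millis = text.partition(".")
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds,
                     milliseconds=int(millis or 0))


def parse_timestamp(line: str) -> Optional[Timestamp]:
    """Return a Timestamp, or None if the line is not a timestamp line."""
    m = TS_RE.match(line.strip())
    if m is None:
        return None
    return Timestamp(m["type"], m["utt"], parse_time(m["time"]), m["extra"])
```

Because the type is matched lazily, a type that itself contains the words " at " followed by a valid time would be split incorrectly; the sketch assumes type names do not do that.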

2.4 Other information

Individual sites may wish to log additional information. Any such ``meta-information'' should be indicated by an asterisk (*) immediately following the opening square bracket, e.g., [* Listening at ...]. Such markers are meant to be ignored for purposes of analysis. See section 3.4 for additional conventions governing the inclusion of site-specific information.
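
An analysis program might drop meta-information and blank lines before any further processing. The following is a minimal sketch, assuming a hypothetical helper named analysis_lines; it looks at each line in isolation and does not check whether a line beginning with "[*" happens to sit inside a variable-length field.

```python
def analysis_lines(lines):
    """Return the lines relevant for analysis, stripped of surrounding whitespace.

    Blank lines and meta-information lines (opening bracket immediately
    followed by an asterisk) are skipped.
    """
    kept = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("[*"):
            continue
        kept.append(line)
    return kept
```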

3 Specification of logged information

A logfile generally consists of a preamble (header and scenario information) followed by a variable number of transaction units each corresponding to an input utterance.

3.1 Header information

The following information should be inserted only once per logfile, at the beginning of the file.

[Site:] <site> # a token to identify the recording site
[System:] <system> # a token to identify the site's system
[Mode:] XXX[YYY] # XXX = WIZ[rec|nl|rec/nl] or WIZ or AUT
[End_to_end:] # Flag to identify data collected/logged in end-to-end evaluation mode.
[MixedInitiative:] XXX # XXX = YES or NO
[Speaker:] XXX # three character speaker ID
[Date:] DDMMYY # day month year, 2 digits each
[RDBVersion:] rdbX.X # X.X is the version number

The following identifiers are permitted as arguments to the "Mode" field to identify the type of wizardry employed:
AUT - system is entirely automatic
WIZ[rec] - recognition wizard
WIZ[nl] - natural language wizard
WIZ[rec/nl] - recognition and natural language wizard
WIZ - default is recognition wizard only. Although this form is allowed for backward compatibility, the explicit form, WIZ[rec], is preferred.
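
The permitted "Mode" arguments can be checked and normalized as sketched below. MODE_RE and parse_mode are hypothetical names, and returning the set of wizard roles is just one convenient representation; the sketch folds the backward-compatible bare WIZ into the same result as WIZ[rec].

```python
import re

# AUT, WIZ, WIZ[rec], WIZ[nl], or WIZ[rec/nl]
MODE_RE = re.compile(r"^(AUT|WIZ)(?:\[(rec|nl|rec/nl)\])?$")


def parse_mode(value: str) -> frozenset:
    """Return the set of wizard roles in use; an empty set means fully automatic."""
    m = MODE_RE.match(value.strip())
    if m is None or (m.group(1) == "AUT" and m.group(2)):
        raise ValueError(f"unrecognized Mode value: {value!r}")
    if m.group(1) == "AUT":
        return frozenset()
    roles = m.group(2) or "rec"  # bare WIZ defaults to a recognition wizard
    return frozenset(roles.split("/"))
```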

If the system is being run in wizard mode, the information about the individual who is acting as the wizard may be included in the following recommended (but optional) clause:

[Wizard:] <name> # if Mode=WIZ, name of person or other identifier

3.2 Scenario information

The following information should be inserted only once per logfile, at the beginning of the file following the header.

[ScenarioType:] <site><iii>
[Begin ScenarioDescr:]
<text>
[End ScenarioDescr:]

<site><iii> is a unique identifier for each scenario in the ATIS domain, where <site> is a site descriptor (CMU, SRI, MIT, etc.) and <iii> is a local identifier specifying the scenario. Scenarios are to be submitted to NIST, who will maintain a ftp-accessible archive of scenarios used for data collection and end-to-end evaluation. Note that NIST may also supply additional scenarios for purposes such as end-to-end evaluation. The nomenclature for such scenarios is left to NIST's discretion.

<text> is an ascii version of the problem text actually presented to the subject as the current scenario.

3.3 Transaction information

Each utterance will have the following keyed entries associated with it. Variable length entries are denoted by [Begin/End <keyword>: XX], while fixed-length entries are denoted by [<keyword>:], as described in section 2. Entries are marked as required (REQ) and optional (OPT); optional entries may either be missing altogether, along with their keywords, or just have null contents. The order of entries in a transaction should (within reason) correspond to chronological order (as implied in the following list).

[UtteranceID:] XX -- REQ
  XX is the base-36 utterance id, beginning at 1 for ``real'' queries. This must be the first item in a transaction group.

[Begin/End Utterance: XX] -- REQ
  Rapid transcription of the utterance or inserted .sro transcription. By definition, there is always one utterance per transaction, even if no speech was captured (~~).

[Begin/End Sentence: XX] -- OPT (WIZ[nl] mode)
  Sentence used as input to the sub-system which accesses the database, typically a paraphrase of the subject's utterance. Note that the Sentence clause is optional (sites that work directly from the transcription do not produce such an intermediate form).

[Begin/End Recognized: XX] -- REQ (AUT mode)
  Recognizer output (what the recognizer THOUGHT the person said). This clause is only required for wizardless ([Mode:] AUT) systems.

[Begin/End Query: XX] -- OPT
  SQL or other query representation generated by the natural language sub-system and applied to the database to produce the answer shown in the display.

[Begin/End Result: XX] -- REQ
  Display of database output shown to the subject. Should also include any other ascii text displayed to the subject.

[Begin/End Synthesis: XX] -- OPT
  Text of any speech played to the subject (by means of text-to-speech synthesis or as a pre-recorded message).

[Begin/End Error: XX] -- OPT
  Error message displayed to the subject. Sites do not have to produce a special error message form, although this has been the custom at certain sites. It is permissible to simply include error feedback in the Result clause.

Rapid transcription of utterances may be a problem in wizardless mode. If no wizard is available to provide the rapid transcription, the [Begin/End Utterance: XX] fields should be inserted into the logfile as the data are collected but left blank. The collecting site should then insert the .sro files into the appropriate place in the logfile before submitting it to NIST. It is understood that the logfile entry retains the status of a ``rapid transcription'', and that the .sro remains the official repository of the (definitive) transcription.

3.4 Site-specific information

Optional site-specific entries may be defined, observing the conventions used for standard entries: a fixed length field uses [<site> <keyword>:] and a variable length field uses [Begin/End <site> <keyword>: XX].
For example, [MIT ScenarioComplete:] might be used by MIT to indicate that the subject has completed a scenario. Note the existence of an alternate convention, described in section 2.4. The meaning of such entries should be defined in the site documentation submitted to NIST.

3.5 Special log file fields for end-to-end evaluation

End-to-end evaluation requires insertion of a special clause to capture the answer supplied by the user. This may be typed in by the user, or it may be added to the logfile manually after data collection by the experimenter transcribing information provided by the subject. The answer clause follows the general format of a fixed-length entry, e.g.:

[Answer:] DL 291, breakfast; DL 301, lunch

3.6 Time Stamps

The following is the MINIMAL set of timestamp types:

Start Speech
  The point in time at which the subject engages the speech input process (e.g., the key-down or button-down event in a push-and-hold input protocol).

Recognition Done
  The point in time at which the recognition stage completes its processing. (OPT for [Mode:] WIZ[rec])

Display Done
  The point in time at which the screen display is complete.

Answer Found
  The point in time at which the subject has decided that he or she has found a solution to the scenario problem and is ready to disclose it. (Required only for [End_to_end:] mode)

Additional site-specific timestamps are permitted, though these should be identified as such in the following manner:

[Timestamp: <site> <localtype> for utterance XX at HH:MM:SS]

or

[Timestamp: <site> <localtype> at HH:MM:SS]

for utterance-independent events.

Any additional site-specific timestamps should be defined in the site documentation submitted to NIST.

3.6.1 Other user actions

Individual ATIS systems may provide the user with the opportunity to interact with the system in other than speech mode (for example, scrolling the results display or switching the query context).
In order to track the user's focus of attention during data collection, sites should record user actions by timestamps of the form:

[Timestamp: User Action for utterance XX at HH:MM:SS] <site> <action>

where <action> provides a description of the user's action (typically a non-speech interaction with the ATIS system). The timestamp should correspond to the initiation of an action. If recording the temporal extent of an action is deemed important, a separate timestamp should be recorded at the conclusion of the action. Any action that changes what the user sees on the screen should be recorded in a user action timestamp. The system documentation submitted to NIST should define the meaning of each action type logged.

4 Treatment of old logfiles

Insofar as this is possible or desired by the consumers of logfile data, logfiles in older formats will be converted to the new specification, provided that this can be performed algorithmically. This will avoid maintaining two sets of software and data extraction procedures.

5 Sample logfile

Below is an example of a logfile generated according to the specifications given in this document. Note that a non-existent speaker number (000) has been used for illustration.

[Site:] CMU
[System:] EtE2.0
[Mode:] AUT
[End_to_end:]
[MixedInitiative:] NO
[Speaker:] 000
[Date:] 090493
[RDBVersion:] rdb4.0
[Timestamp: CMU Event for utterance 01 at 14:30:12.087] Automatic Context Clear
[ScenarioType:] practice.scenario
[Begin ScenarioDescr:]
Practice Scenario

You have won a free vacation to Hawaii. The package includes transportation on a charter plane that departs from San Francisco International Airport. You must arrange your own transportation from your home in Baltimore to San Francisco. Keep in mind that you must allow at least an hour for changing planes in San Francisco. The charter plane leaves San Francisco on a Friday night at 9 pm. Find a flight that will minimize your wait in San Francisco.
(This is a practice scenario to get you acquainted with the system, so you don't have to find a return flight.)

TEMPLATE: Airline(s) and Flight Number(s) from Baltimore to San Francisco:
ANSWER: TW389/TW183 or DL911/DL755 or AA1533/AA1287 or UA332/UA367
[End ScenarioDescr:]
[Timestamp: User Action for utterance 01 at 14:30:12.092] CMU Start Scenario
[UtteranceID:] 01
[Timestamp: Start Speech for utterance 01 at 14:30:16.800]
[Timestamp: CMU Speech input complete for utterance 01 at 14:30:20.246]
[Timestamp: Recognition Done for utterance 01 at 14:30:23.217]
[Begin Recognized: 01]
SHOW ME FLIGHTS TO HONOLULU
[End Recognized: 01]
[Begin Utterance: 01]
<sro> /Net/wiz2/users/data/atis3/spon/000/0/000010sx.sro
[End Utterance: 01]
[Timestamp: Display Done for utterance 01 at 14:30:24.936]
[Begin Query: 01]
[End Query: 01]
[Begin Result: 01]
UnknownCity:
- To Airport : ???
----------
(null pointer)
[End Result: 01]
[Begin Error: 01]
Sorry, I don't know about HONOLULU.
[End Error: 01]
[Timestamp: User Action for utterance 02 at 14:30:30.738] CMU Clearing Query Context
[UtteranceID:] 02
[Timestamp: Start Speech for utterance 02 at 14:30:40.564]
[Timestamp: CMU Speech input complete for utterance 02 at 14:30:49.027]
[Timestamp: Recognition Done for utterance 02 at 14:30:56.352]
[Begin Recognized: 02]
SHOW ME FLIGHTS FROM BALTIMORE TO SAN FRANCISCO THAT ARRIVE BETWEEN FIVE P M AND EIGHT P M
[End Recognized: 02]
[Begin Utterance: 02]
<sro> /Net/wiz2/users/data/atis3/spon/000/0/000020sx.sro
[End Utterance: 02]
[Timestamp: Display Done for utterance 02 at 14:31:03.461]
[Begin Query: 02]
select distinct flight.departure_time, flight.arrival_time, flight.airline_flight, flight.stops, flight.from_airport, flight.to_airport
from flight
where ((flight.from_airport = 'BWI' and flight.to_airport in ('OAK', 'SFO'))) and (((flight.arrival_time between 1700 and 2000)))
order by 1, 2
[End Query: 02]
[Begin Result: 02]
Displaying Flight Info:
- From Airport : BALTIMORE
- To Airport : SAN FRANCISCO
- Arrive Time : 5:00 PM - 8:00 PM
----------
Leave  Arrive  Flight         Stops  From  To
1145   1707    HP73/HP603     2      BWI   SFO
1240   1735    DL911/DL755    1      BWI   SFO
1300   1736    TW389/TW183    1      BWI   SFO
1330   1738    UA332/UA367    1      BWI   OAK
1330   1752    UA332/UA121    1      BWI   SFO
1346   1740    AA1533/AA1287  1      BWI   SFO
[End Result: 02]
[Timestamp: Answer Found at 14:31:30.199]
[* generic] CMU Tallies r1:2 r2:0
[Answer:] tw389/tw183
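
As an illustration of how the timestamps in the sample logfile might be used, the sketch below computes, for each utterance, the elapsed time from "Start Speech" to "Display Done". The timestamp lines are copied from the sample; the names TS_RE, SAMPLE, and response_times_ms are hypothetical and not part of this specification. The sketch skips utterance-independent and NA timestamps and assumes a session does not cross midnight.

```python
import re

# Utterance-dependent timestamps with a real time (NA and utterance-independent
# forms do not match and are skipped).
TS_RE = re.compile(
    r"^\[Timestamp: (?P<type>.+?) for utterance (?P<utt>[0-9a-z]{2})"
    r" at (?P<h>\d{2}):(?P<m>\d{2}):(?P<s>\d{2})(?:\.(?P<ms>\d{3}))?\]"
)

SAMPLE = """\
[Timestamp: Start Speech for utterance 01 at 14:30:16.800]
[Timestamp: Recognition Done for utterance 01 at 14:30:23.217]
[Timestamp: Display Done for utterance 01 at 14:30:24.936]
[Timestamp: Start Speech for utterance 02 at 14:30:40.564]
[Timestamp: Recognition Done for utterance 02 at 14:30:56.352]
[Timestamp: Display Done for utterance 02 at 14:31:03.461]
[Timestamp: Answer Found at 14:31:30.199]
"""


def response_times_ms(text):
    """Map utterance ID -> milliseconds from Start Speech to Display Done."""
    events = {}
    for line in text.splitlines():
        match = TS_RE.match(line.strip())
        if match is None:
            continue
        seconds = (int(match["h"]) * 60 + int(match["m"])) * 60 + int(match["s"])
        millis = seconds * 1000 + int(match["ms"] or 0)
        events[(match["utt"], match["type"])] = millis
    result = {}
    for (utt, kind), start in events.items():
        done = events.get((utt, "Display Done"))
        if kind == "Start Speech" and done is not None:
            result[utt] = done - start
    return result


print(response_times_ms(SAMPLE))  # {'01': 8136, '02': 22897}
```

Working in integer milliseconds avoids floating-point rounding when timestamps are subtracted.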