DARPA Air Travel Information System (ATIS0)

Read Versions of Spontaneous Data and Adaptation Data

NIST Disc CD5-2.1

This disc is the second in a series of CD-ROMs containing recordings of "natural speech" in the Air Travel Information System (ATIS) domain. The original spontaneous queries collected for these corpora are spoken, without scripts or other constraints, to a computerized simulation of a database system that includes a simplified version of the Official Airline Guide (OAG)(CR). A human "wizard" stands in for the speech recognizer of the future, giving the impression of a speech-recognizing computer system.

This CD-ROM complements the first disc in the series, CD5-1.1, which contained the spontaneous utterances and relational database. This disc contains two types of speech data for 20 of the speakers on CD5-1.1: (1) read versions of spontaneous utterances, in which the speakers read the transcriptions of their original spontaneous queries; and (2) 40 additional read "adaptation" training utterances from the same 20 speakers. The read versions of the spontaneous utterances will be useful for scientific studies of the differences between read and spontaneous speech. The additional adaptation utterances can be used by developers of trainable speech recognizers to adapt a recognizer to particular voices. For the original spontaneous speech and more documentation, see the first disc in this series, CD5-1.1, NTIS order no. PB91-505354.

Speaker-Dependent Subset

A 10-speaker subset of the 20 speakers on this disc has been selected for speaker-dependent training and testing. These speakers are identified by the speaker-ID codes "b0", "b2", "b5", "bd", "bf", "bg", "bl", "bn", "bp", and "bq". Additional read training material was recorded for these speakers and can be found on discs CD5-3.1 and CD5-4.1, NTIS order no. PB89-505370.

Directory Structure

The directories and files under "/atis0" are structured as follows:

      doc/: Online documentation.  This directory contains the following
            files:

            adp_prmp.txt:  List of the 40 adaptation prompts.
            sd_spkrs.txt:  List of the 10 speaker-dependent speakers.
            spkrinfo.txt:  Table of speaker codes and their sex, age, and
                            dialect region.

      read_adp/: Directory containing the 40 read adaptation utterances from 
                 each of the 20 speakers.  See below for a description of the 
                 directory and file structures.  NOTE: only the 10 speakers 
                 specified above should be used for speaker-dependent systems.

      read_spn/: Directory containing the read versions of spontaneous 
                 utterances from each of the 20 speakers.  See below for a
                 description of the directory and file structures.  NOTE: 
                 only the 10 speakers specified above should be used for 
                 speaker-dependent systems.

     readme.doc: This file.

ATIS File Names and Types

The directories "read_adp" and "read_spn" under "/atis0" are divided into speaker subdirectories, each identified by a two-character speaker code. The speaker directories are further divided into speaker-session subdirectories. Because the "read_adp" and "read_spn" data were collected in a single session per speaker, each speaker directory contains only one speaker-session subdirectory, identified by the character "1". The speaker-session directories contain the speech waveform and prompt files, whose names have the following format:

ATIS-FILE ::= <UTTERANCE-ID>.<TYPE>

    where,
    UTTERANCE-ID ::= <AA><BBB><C><D><E>

        where,
        AA ::= "01" to "zz" (speaker identification code)
        BBB ::= "000" to "zzz" (sentence text code)
        C ::= "1" to "z" (session code)
        D ::= (speaking mode code)
              "s" (for spontaneous productions) |
              "r" (for read versions of spontaneous productions) |
              "c" (for common read productions)
        E ::= (microphone code)
              "s" (for Sennheiser) |
              "c" (for Crown) |
              "x" (does not apply)

    TYPE ::= (file type)
             "ptx" (prompting text) |
             "wav" (SPHERE-headered speech waveform)

The waveform and prompt data for each utterance are therefore located in separate files that share a common utterance ID. The prompt (.ptx) files are ASCII text files.
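
The file-name grammar above can be checked mechanically. The following is a minimal Python sketch, not part of the disc's software: the names parse_atis_name, AtisFile, and relative_path are hypothetical, the character classes are one reading of the ranges given in the grammar, and the file name "b00011rs.wav" is a made-up example that is not guaranteed to exist on the disc.

```python
import re
from pathlib import PurePosixPath
from typing import NamedTuple

# One reading of the grammar: <AA><BBB><C><D><E>.<TYPE>
# (character classes are assumptions based on the stated ranges)
_ATIS_NAME = re.compile(
    r"(?P<speaker>[0-9a-z]{2})"    # AA:  speaker identification code
    r"(?P<sentence>[0-9a-z]{3})"   # BBB: sentence text code
    r"(?P<session>[1-9a-z])"       # C:   session code
    r"(?P<mode>[src])"             # D:   speaking mode code
    r"(?P<mic>[scx])"              # E:   microphone code
    r"\.(?P<file_type>ptx|wav)"    # TYPE: prompt text or waveform
)


class AtisFile(NamedTuple):
    """Fields of an ATIS file name (hypothetical helper type)."""
    speaker: str
    sentence: str
    session: str
    mode: str
    mic: str
    file_type: str


def parse_atis_name(name: str) -> AtisFile:
    """Split a file name such as 'b00011rs.wav' into its grammar fields."""
    match = _ATIS_NAME.fullmatch(name.lower())
    if match is None:
        raise ValueError(f"not an ATIS file name: {name!r}")
    return AtisFile(**match.groupdict())


def relative_path(f: AtisFile, data_dir: str) -> str:
    """Build <data_dir>/<speaker>/<session>/<file name>, following the
    speaker and speaker-session subdirectory layout described above."""
    name = f"{f.speaker}{f.sentence}{f.session}{f.mode}{f.mic}.{f.file_type}"
    return str(PurePosixPath(data_dir) / f.speaker / f.session / name)


if __name__ == "__main__":
    info = parse_atis_name("b00011rs.wav")  # made-up example name
    print(info)
    print(relative_path(info, "read_spn"))
```

Under these assumptions, swapping the file_type field between "wav" and "ptx" gives the waveform and prompt files for the same utterance ID.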