DARPA Air Travel Information System (ATIS0)

Speaker-Dependent Training Data

NIST Speech Discs 5-3.1, 5-4.1, 5-5.1, and 5-6.1

This set of discs is the third release in a series of CD-ROMs containing recordings of "natural speech" in the Air Travel Information System (ATIS) domain. The original spontaneous queries collected for these corpora are spoken, without scripts or other constraints, to a computerized simulation of a database system that includes a simplified version of the Official Airline Guide (OAG)(CR). A human "wizard" simulating the speech recognizer of the future gives the impression of a speech-recognizing computer system.

This set of CD-ROMs augments material in the first disc in the series, CD5-1.1, which contained the spontaneous utterances and relational database. These discs contain extensive read-speech training data for ten of the speakers on CD5-1.1 and can be used in training speaker-dependent speech recognition systems. For the original spontaneous speech and more documentation, see the first disc in this series, CD5-1.1, NTIS order no. PB91-505354.

Speaker-Dependent Set

The 10 speakers on these discs have been selected for speaker-dependent training and testing. They are identified by the speaker-ID codes "b0", "b2", "b5", "bd", "bf", "bg", "bl", "bn", "bp", and "bq". Additional read speech for these speakers can be found on the second disc in this series, CD5-2.1, NTIS order no. PB91-505362, which contains adaptation data and read versions of the spontaneous utterances on CD5-1.1.

Directory Structure:

The directories and files under "/atis0" are structured as follows:

doc/: Online documentation. This directory contains the following
      files:

  trn_prmp.txt:  Collated list of the 2884
                 speaker-dependent training prompts.
  sd_spkrs.txt:  List of the 10 speaker-dependent speakers.
  spkrinfo.txt:  Table of ATIS0 speaker codes and their sex, 
                 age, and dialect region.

read_trn/: Directory containing the approximately 300 read training
           utterances from each of the 10 speaker-dependent speakers.  
           The speech for the 10 speakers is organized across the
           four discs as follows:
  CD5-3.1: Sennheiser microphone data from speakers "b0",
           "b2", "b5", "bd", and "bf".
  CD5-4.1: Sennheiser microphone data from speakers "bg", 
           "bl", "bn", "bp", and "bq".
  CD5-5.1: Crown microphone data from speakers "b0", "b2", 
           "b5", "bd", and "bf".
  CD5-6.1: Crown microphone data from speakers "bg", "bl", 
           "bn", "bp", and "bq".
  
Note that the speech for speaker "b5" was recorded in two sessions and is split between two "session" subdirectories.

    readme.doc: This file.
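
As a rough illustration, the disc layout listed under read_trn/ can be held in a small lookup table. The Python sketch below is not part of the corpus; the names DISC_CONTENTS and disc_for are hypothetical, and it assumes the Crown microphone data is on CD5-5.1 and CD5-6.1, the remaining two discs of the four-disc set.

```python
# Hypothetical lookup table: disc label -> (microphone, speakers on that disc).
# Assumption: Crown data is on CD5-5.1 and CD5-6.1.
DISC_CONTENTS = {
    "CD5-3.1": ("Sennheiser", ("b0", "b2", "b5", "bd", "bf")),
    "CD5-4.1": ("Sennheiser", ("bg", "bl", "bn", "bp", "bq")),
    "CD5-5.1": ("Crown", ("b0", "b2", "b5", "bd", "bf")),
    "CD5-6.1": ("Crown", ("bg", "bl", "bn", "bp", "bq")),
}


def disc_for(speaker: str, microphone: str) -> str:
    """Return the disc label holding the given speaker/microphone pair."""
    for disc, (mic, speakers) in DISC_CONTENTS.items():
        if mic == microphone and speaker in speakers:
            return disc
    raise KeyError(f"no disc for speaker {speaker!r}, microphone {microphone!r}")
```

For example, disc_for("b5", "Sennheiser") returns "CD5-3.1" under these assumptions.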

ATIS File Names and Types

The directory "read_trn" under "ATIS0" is divided into speaker subdirectories. The speaker directories (identified by two-character speaker codes) are further divided into speaker-session subdirectories. The speaker-session directories contain the speech waveform and prompt files, whose names have the following format:

ATIS-FILE ::= <UTTERANCE-ID>.<TYPE>

    where,
    UTTERANCE-ID ::= <AA><BBB><C><D><E>

        where,
        AA ::= "01" to "zz" (speaker identification code)
        BBB ::= "000" to "zzz" (sentence text code)
        C ::= "1" to "z" (session code)
        D ::= (speaking mode code)
              "s" (for spontaneous productions) |
              "r" (for read versions of spontaneous productions) |
              "c" (for common read productions)
        E ::= (microphone code)
              "s" (for Sennheiser) |
              "c" (for Crown) |
              "x" (does not apply)

    TYPE ::= (file type)
             "ptx" (prompting text) |
             "wav" (SPHERE-headered speech waveform) 

The waveform and prompt data for each utterance are, therefore, located in separate files with common utterance IDs. The prompt (.ptx) files are ASCII text files.
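
The file name format above can be checked mechanically. The following Python sketch is one possible reading of it, not a tool supplied with the discs. It assumes each field has the width suggested by its letters (two, three, one, one, and one characters, giving an eight-character utterance ID), that all codes are drawn from digits and lower-case letters, and that names may be lower-cased before matching. The names AtisFile and parse_atis_name, and the example file name, are hypothetical.

```python
import re
from typing import NamedTuple


class AtisFile(NamedTuple):
    speaker: str      # AA: speaker identification code
    sentence: str     # BBB: sentence text code
    session: str      # C: session code
    mode: str         # D: "s", "r", or "c"
    microphone: str   # E: "s", "c", or "x"
    file_type: str    # "ptx" or "wav"


# Assumption: codes use digits and lower-case letters only.
_ATIS_NAME = re.compile(
    r"(?P<speaker>[0-9a-z]{2})"
    r"(?P<sentence>[0-9a-z]{3})"
    r"(?P<session>[1-9a-z])"
    r"(?P<mode>[src])"
    r"(?P<microphone>[scx])"
    r"\.(?P<file_type>ptx|wav)"
)


def parse_atis_name(name: str) -> AtisFile:
    """Split an ATIS file name into its fields, or raise ValueError."""
    match = _ATIS_NAME.fullmatch(name.lower())
    if match is None:
        raise ValueError(f"not an ATIS file name: {name!r}")
    return AtisFile(**match.groupdict())
```

Under these assumptions, a name such as "b00011cs.wav" would be read as speaker "b0", sentence "001", session "1", common read speech, Sennheiser microphone, waveform file.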

Please note that because these utterances were collected in a "marathon" recording session, some speaker fatigue occurred. To expedite the collection process, poorly spoken utterances were discarded and not re-recorded. Therefore, some "gaps" appear in this corpus.
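
Users who need to know which prompts are missing for a speaker could compare the waveform files present against a list of expected sentence codes. The Python sketch below is only an illustration: the function name missing_sentence_codes is hypothetical, the list of expected codes must be supplied by the caller, and it assumes the sentence text code occupies the third through fifth characters of the utterance ID.

```python
from pathlib import PurePosixPath


def missing_sentence_codes(file_names, expected_codes):
    """Return the expected sentence codes that have no waveform file."""
    recorded = set()
    for name in file_names:
        path = PurePosixPath(name)
        if path.suffix == ".wav":
            # Assumption: characters 3-5 of the utterance ID are the
            # sentence text code (BBB).
            recorded.add(path.stem[2:5])
    return sorted(set(expected_codes) - recorded)
```

Given the hypothetical files "b00011cs.wav" and "b00031cs.wav" and expected codes "001", "002", and "003", the function would report "002" as a gap.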