ATIS0 Complete

Item Name: ATIS0 Complete
Author(s): Charles T. Hemphill, John J. Godfrey, George R. Doddington, John S. Garofolo, Jonathan G. Fiscus
LDC Catalog No.: LDC93S4A
ISBN: 1-58563-001-2
ISLRN: 101-041-175-695-3
Member Year(s): 1993
DCMI Type(s): Sound, Text
Sample Type: pcm
Sample Rate: 16000
Data Source(s): microphone speech
Project(s): ATIS
Application(s): speech recognition, spoken dialogue systems
Language(s): English
Language ID(s): eng
License(s): LDC User Agreement for Non-Members
Online Documentation: LDC93S4A Documents
Licensing Instructions: Subscription & Standard Members, and Non-Members
Citation: Hemphill, Charles T., et al. ATIS0 Complete LDC93S4A. DVD. Philadelphia: Linguistic Data Consortium, 1993.


The ATIS0 corpus comprises spontaneous data from 36 speakers; read versions of the data from 20 of those speakers, along with some adaptation material; and extensive speaker-dependent material from the ATIS domain, read by ten of the same speakers.

LDC has also released LDC93S4B - ATIS0 Pilot, LDC93S4B-2 - ATIS0 Read, and LDC93S4B-3 - ATIS0 SD-Read.


All ATIS speech data was recorded at a 16 kHz sample rate with 16-bit quantization, using two different microphones: a close-talking model (Sennheiser HMD414) and a desk-top model (Crown PCC-160).
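As a rough illustration of what these recording parameters imply for storage, the following sketch computes the PCM data rate and converts a sample-data size into a duration. It assumes one microphone channel per waveform file and counts only the raw sample bytes, excluding any file header; the function names are illustrative and are not part of the corpus documentation.

```python
# Recording parameters stated in the corpus description.
SAMPLE_RATE_HZ = 16_000
BITS_PER_SAMPLE = 16
# Assumption: each waveform file holds a single microphone channel.
CHANNELS = 1


def bytes_per_second(sample_rate_hz=SAMPLE_RATE_HZ,
                     bits_per_sample=BITS_PER_SAMPLE,
                     channels=CHANNELS):
    """Return the raw PCM data rate in bytes per second."""
    return sample_rate_hz * (bits_per_sample // 8) * channels


def duration_seconds(num_sample_bytes,
                     sample_rate_hz=SAMPLE_RATE_HZ,
                     bits_per_sample=BITS_PER_SAMPLE,
                     channels=CHANNELS):
    """Return the duration of num_sample_bytes of raw PCM data (header excluded)."""
    return num_sample_bytes / bytes_per_second(sample_rate_hz, bits_per_sample, channels)


if __name__ == "__main__":
    print(bytes_per_second())        # 32000
    print(duration_seconds(64_000))  # 2.0
```

Under these assumptions, one second of speech from one microphone occupies 32,000 bytes of sample data.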

ATIS0 Pilot contains spontaneous utterances elicited in a "Wizard-of-Oz" simulation, along with the relational database containing the travel information (excluding connecting flights). The 36 speakers produced a total of 912 utterances.

ATIS0 Read contains "read" versions of the spontaneous utterances for 20 of the 36 speakers above, for a total of 478 productions. This is supplemented by a set of 40 "adaptation" sentences read by each of the 20 speakers.

ATIS0 SD-Read contains "read" speech in the ATIS domain for ten of the speakers in ATIS0 Pilot. They read a total of 3,171 utterances, or approximately 317 utterances per speaker. This data was collected to train speaker-dependent speech recognition systems for the ATIS0 domain. The section contains both the close-talking (Sennheiser) microphone data and the corresponding desk-top (Crown PCC-160) microphone data, so there are 6,342 waveform files in this section, two per utterance.
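The SD-Read figures can be checked with a short calculation. This is a minimal sketch using only the counts quoted in the description; the function names are illustrative, and the one-file-per-microphone-per-utterance layout is inferred from the stated total rather than taken from the corpus documentation.

```python
# Counts quoted in the ATIS0 SD-Read description.
SD_READ_UTTERANCES = 3171
SD_READ_SPEAKERS = 10
MICROPHONES = 2  # close-talking and desk-top


def mean_utterances_per_speaker(utterances=SD_READ_UTTERANCES,
                                speakers=SD_READ_SPEAKERS):
    """Return the average number of utterances read by each speaker."""
    return utterances / speakers


def waveform_file_count(utterances=SD_READ_UTTERANCES,
                        microphones=MICROPHONES):
    """Return the file count, assuming one file per microphone per utterance."""
    return utterances * microphones


if __name__ == "__main__":
    print(mean_utterances_per_speaker())  # 317.1
    print(waveform_file_count())          # 6342
```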




