ATIS0 Complete

Item Name: ATIS0 Complete
Author(s): Charles T. Hemphill, John J. Godfrey, George R. Doddington, John S. Garofolo, Jonathan G. Fiscus
LDC Catalog No.: LDC93S4A
ISBN: 1-58563-001-2
ISLRN: 101-041-175-695-3
Member Year(s): 1993
DCMI Type(s): Sound, Text
Sample Type: pcm
Sample Rate: 16000
Data Source(s): microphone speech
Project(s): ATIS
Application(s): speech recognition, spoken dialogue systems
Language(s): English
Language ID(s): eng
License(s): LDC User Agreement for Non-Members
Online Documentation: LDC93S4A Documents
Licensing Instructions: Subscription & Standard Members, and Non-Members
Citation: Hemphill, Charles T., et al. ATIS0 Complete LDC93S4A. DVD. Philadelphia: Linguistic Data Consortium, 1993.

Introduction

The ATIS0 corpus comprises three parts: spontaneous speech from 36 speakers; read versions of that speech from 20 of those speakers, along with some adaptation material; and extensive speaker-dependent material from the ATIS domain, read by ten of the same speakers.

LDC also released: LDC93S4B - ATIS0 Pilot, LDC93S4B-2 - ATIS0 Read, and LDC93S4B-3 - ATIS0 SD-Read

Data

All ATIS speech data was recorded at a 16 kHz sample rate with 16-bit quantization, using two different microphones: a close-talking model (Sennheiser HMD414) and a desk-top model (Crown PCC-160).
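As a rough illustration of what these recording parameters imply for storage, the sketch below estimates the size of raw, uncompressed single-channel PCM audio. The function name and the example durations are hypothetical, and the estimate ignores any file header or compression that the distributed files may use.

```python
# Recording parameters quoted in the corpus description.
SAMPLE_RATE_HZ = 16_000
BITS_PER_SAMPLE = 16


def raw_pcm_bytes(duration_s: float) -> int:
    """Estimate the size of uncompressed single-channel PCM audio.

    Assumption: no header and no compression; this is only the sample payload.
    """
    bytes_per_sample = BITS_PER_SAMPLE // 8
    n_samples = int(duration_s * SAMPLE_RATE_HZ)
    return n_samples * bytes_per_sample


if __name__ == "__main__":
    # Hypothetical durations, chosen only to show the scale.
    for seconds in (1.0, 5.0):
        print(f"{seconds:.1f} s -> {raw_pcm_bytes(seconds)} bytes")
```

One second of audio from one microphone therefore corresponds to 32,000 bytes of sample data, so a recording of the same utterance from both microphones takes roughly twice that.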

ATIS0 Pilot contains spontaneous utterances elicited in a "Wizard-of-Oz" simulation, along with the relational database containing the travel information (excluding connecting flights). The 36 speakers produced a total of 912 utterances.

ATIS0 Read contains "read" versions of the spontaneous utterances for 20 of the 36 speakers above, for a total of 478 productions. This is supplemented by a set of 40 "adaptation" sentences read by each of the 20 speakers.

ATIS0 SD-Read contains "read" speech in the ATIS domain for ten of the speakers in ATIS0 Pilot. They read a total of 3,171 utterances, or approximately 317 utterances per speaker. This data was collected for the purpose of training speaker-dependent speech recognition systems for the ATIS0 domain. This section contains both the close-talking (Sennheiser HMD414) microphone data and the corresponding desk-top (Crown PCC-160) microphone data, so there are 6,342 waveform files in this section (3,171 utterances recorded on each of the two microphones).
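The SD-Read figures can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers quoted in the description; the function names are hypothetical and exist purely for illustration.

```python
# Figures quoted in the ATIS0 SD-Read description.
SD_READ_UTTERANCES = 3171
SD_READ_SPEAKERS = 10
MICROPHONES = 2  # close-talking and desk-top


def waveform_file_count(utterances: int, microphones: int) -> int:
    """One waveform file per utterance per microphone."""
    return utterances * microphones


def mean_utterances_per_speaker(utterances: int, speakers: int) -> float:
    """Average number of utterances read by each speaker."""
    return utterances / speakers


if __name__ == "__main__":
    print(waveform_file_count(SD_READ_UTTERANCES, MICROPHONES))           # 6342
    print(mean_utterances_per_speaker(SD_READ_UTTERANCES, SD_READ_SPEAKERS))  # 317.1
```

The results match the stated 6,342 waveform files and the approximate figure of 317 utterances per speaker.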

Samples

Please view this audio sample and transcript sample.

Updates

None at this time.
