ATIS0 Pilot
Item Name: | ATIS0 Pilot |
Author(s): | Charles T. Hemphill, John J. Godfrey, George R. Doddington, John S. Garofolo, Jonathan G. Fiscus, Nancy Dahlgren, William Fisher, Brett Tjaden, David Pallett |
LDC Catalog No.: | LDC93S4B |
ISBN: | 1-58563-002-0 |
ISLRN: | 477-521-980-972-9 |
DOI: | https://doi.org/10.35111/4t8c-r397 |
Member Year(s): | 1993 |
DCMI Type(s): | Sound |
Sample Type: | 1-channel pcm |
Sample Rate: | 16000 |
Data Source(s): | microphone speech |
Project(s): | ATIS |
Application(s): | spoken dialogue systems, speech recognition |
Language(s): | English |
Language ID(s): | eng |
License(s): | LDC User Agreement for Non-Members |
Online Documentation: | LDC93S4B Documents |
Licensing Instructions: | Subscription & Standard Members, and Non-Members |
Citation: | Hemphill, Charles T., et al. ATIS0 Pilot LDC93S4B. Web Download. Philadelphia: Linguistic Data Consortium, 1993. |
Related Works: | View |
LDC93S4A - Complete ATIS0 corpus
LDC93S4B - ATIS0 Pilot
LDC93S4B-2 - ATIS0 Read
LDC93S4B-3 - ATIS0 SD-Read
The ATIS0 Corpus comprises six parts: one with spontaneous data from 36 speakers; one with read versions of the data from 20 of those speakers, along with some adaptation material; and four with extensive speaker-dependent material from the ATIS domain, read by ten of the same speakers.
All ATIS speech data was recorded at a 16 kHz sample rate with 16-bit quantization, using two different microphones: a close-talking model (Sennheiser HMD414) and a desk-top model (Crown PCC-160).
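As a rough illustration of what these recording parameters imply, the sketch below computes the raw data rate of 16 kHz, 16-bit, single-channel PCM and converts a byte count back into a duration. The function names and the 5-second utterance are hypothetical, and the calculation covers sample data only; it assumes nothing about file headers or compression in the distributed waveform files.

```python
def pcm_bytes_per_second(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Raw PCM data rate in bytes per second (sample data only, no header)."""
    return sample_rate_hz * (bits_per_sample // 8) * channels


def pcm_duration_seconds(num_data_bytes: int, sample_rate_hz: int = 16000,
                         bits_per_sample: int = 16, channels: int = 1) -> float:
    """Duration of a block of raw PCM samples, given its size in bytes."""
    return num_data_bytes / pcm_bytes_per_second(sample_rate_hz, bits_per_sample, channels)


# 16 kHz sample rate, 16-bit quantization, one channel (values from the description)
rate = pcm_bytes_per_second(16000, 16)

# Hypothetical 5-second utterance, used only to illustrate the conversion
five_seconds_of_audio = 5 * rate

print(rate)                                         # 32000
print(five_seconds_of_audio)                        # 160000
print(pcm_duration_seconds(five_seconds_of_audio))  # 5.0
```

At this rate, one minute of single-microphone speech occupies a little under 2 MB of sample data.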
The first disc (ATIS0 Pilot) contains spontaneous utterances elicited in a "Wizard-of-Oz" simulation, along with the relational database containing the travel information (excluding connecting flights). The 36 speakers produced a total of 912 utterances.
The second disc (ATIS0 Read) contains "read" versions of the spontaneous utterances for 20 of the 36 speakers above, for a total of 478 productions. This is supplemented by a set of 40 "adaptation" sentences read by each of the 20 speakers.
The third through sixth discs (ATIS0 SD-Read) contain "read" speech in the ATIS domain for ten of the speakers on the first disc. They read a total of 3,171 utterances, or approximately 317 utterances per speaker. This data was collected to train speaker-dependent speech recognition systems for the ATIS0 domain. Two of these four discs contain the close-talking (Sennheiser HMD414) microphone data, and the other two contain the corresponding data for the desk-top (Crown PCC-160) microphone. Because each utterance is present once per microphone, the four discs hold 6,342 waveform files in total.
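The counts quoted in the disc descriptions can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check on the published figures; the variable names are illustrative, and the adaptation total is a derived value that assumes each of the 20 speakers read all 40 sentences, as the description states.

```python
# Figures taken from the disc descriptions
PILOT_SPEAKERS = 36
PILOT_UTTERANCES = 912

READ_SPEAKERS = 20
ADAPTATION_SENTENCES_PER_SPEAKER = 40

SD_SPEAKERS = 10
SD_UTTERANCES = 3171
MICROPHONES = 2  # close-talking and desk-top

# Derived values
pilot_avg_per_speaker = PILOT_UTTERANCES / PILOT_SPEAKERS
adaptation_readings = READ_SPEAKERS * ADAPTATION_SENTENCES_PER_SPEAKER
sd_avg_per_speaker = SD_UTTERANCES / SD_SPEAKERS
sd_waveform_files = SD_UTTERANCES * MICROPHONES

print(round(pilot_avg_per_speaker, 1))  # 25.3
print(adaptation_readings)              # 800
print(sd_avg_per_speaker)               # 317.1
print(sd_waveform_files)                # 6342
```

The last two values agree with the "approximately 317 utterances per speaker" and "6,342 waveform files" stated for the SD-Read discs.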