ATIS3 Test Data

Item Name: ATIS3 Test Data
Author(s): Deborah A. Dahl, Madeleine Bates, Michael Brown, William Fisher, Kate Hunicke-Smith, David Pallett, Christine Pao, Alexander Rudnicky, Elizabeth Shriberg, John S. Garofolo, Jonathan G. Fiscus, Denise Danielson, Enrico Bocchieri, Bruce Buntschuh, Beverly Schwartz, Sandra Peters, Robert Ingria, Robert Weide, Yuzong Chang, Eric Thayer, Lynette Hirschman, Joe Polifroni, Bruce Lund, Goh Kawai, Tom Kuhn, Lew Norton
LDC Catalog No.: LDC95S26
ISBN: 1-58563-043-8
ISLRN: 847-846-823-557-6
Member Year(s): 1995
DCMI Type(s): Sound
Sample Type: 1-channel PCM, compressed
Sample Rate: 16000
Data Source(s): microphone speech
Project(s): ATIS
Application(s): spoken dialogue systems, speech recognition
Language(s): English
Language ID(s): eng
License(s): LDC User Agreement for Non-Members
Online Documentation: LDC95S26 Documents
Licensing Instructions: Subscription & Standard Members, and Non-Members
Citation: Dahl, Deborah A., et al. ATIS3 Test Data LDC95S26. Web Download. Philadelphia: Linguistic Data Consortium, 1995.

Introduction

This release contains a corpus of speech and natural language data collected under the auspices of the Advanced Research Projects Agency Spoken Language Systems (ARPA-SLS) technology development program. The corpus, which contains data in the Air Travel Information Services (ATIS) domain, was designed by the ARPA-SLS Multi-site ATIS Data COllection Working (MADCOW) group and was collected by five sites across the U.S.:

  • BBN Systems & Technologies, Cambridge, MA
  • Carnegie Mellon University, Pittsburgh, PA
  • MIT Laboratory for Computer Science, Cambridge, MA
  • National Institute of Standards and Technology, Gaithersburg, MD
  • SRI International, Menlo Park, CA

The corpus is part of the third phase of ATIS data collection (ATIS3) and comprises the development test material (NIST Speech Disc 17-4.2) and the evaluation test material (NIST Speech Disc 17-5.1) used in the December 1994 ARPA SLS Benchmark Tests. As in the previous ATIS corpora, the speech was elicited by presenting subjects with various hypothetical travel planning scenarios to solve. The resulting spontaneous spoken queries were recorded as the subjects interacted with partially or completely automated ATIS systems to solve the scenarios. Note that the ATIS3 training data is available on NIST Speech Discs 17-1.1 through 17-3.1.

Data

The recorded speech has been transcribed and annotated with categorizations and canonical reference answers. All of the utterances were recorded using a close-talking, noise-canceling, head-mounted Sennheiser microphone. For some subjects, secondary (noisier) microphone data was also recorded simultaneously.

This release also contains the ATIS3 46-city/52-airport relational database, a revised Principles of Interpretation document, test implementation and scoring instructions, and other general documentation.

The ATIS3 corpus was verified, collated, and documented by the National Institute of Standards and Technology (NIST) in cooperation with MADCOW, and is distributed by the Linguistic Data Consortium (LDC).

Updates

None at this time.
