ATIS - Seven Languages

Item Name: ATIS - Seven Languages
Author(s): Saab Mansour, Batool Haider
LDC Catalog No.: LDC2021T04
ISBN: 1-58563-954-0
ISLRN: 713-838-074-718-6
DOI: https://doi.org/10.35111/g9h5-0p74
Release Date: January 15, 2021
Member Year(s): 2021
DCMI Type(s): Text
Data Source(s): microphone speech
Project(s): ATIS
Application(s): discourse analysis, machine translation, speech recognition, spoken dialogue systems
Language(s): English, Spanish, German, French, Portuguese, Japanese, Chinese
Language ID(s): eng, spa, deu, fra, por, jpn, zho
License(s): LDC User Agreement for Non-Members
Online Documentation: LDC2021T04 Documents
Licensing Instructions: Subscription & Standard Members, and Non-Members
Citation: Mansour, Saab, and Batool Haider. ATIS - Seven Languages LDC2021T04. Web Download. Philadelphia: Linguistic Data Consortium, 2021.

Introduction

ATIS - Seven Languages was developed by Amazon Web Services, Inc. It consists of 5,871 English utterances drawn from three ATIS (Air Travel Information Services) corpora, ATIS2 (LDC93S5), ATIS3 Training Data (LDC94S19), and ATIS3 Test Data (LDC95S26). The utterances were translated into six languages: Spanish, German, French, Portuguese, Chinese, and Japanese.

The ATIS collection was developed to support research and development of speech understanding systems. Participants were given hypothetical travel planning scenarios and asked to solve them by interacting with partially or fully automated ATIS systems. Their utterances were recorded and transcribed. Data was collected in the early 1990s at five US sites: Raytheon BBN, Carnegie Mellon University, MIT Laboratory for Computer Science, National Institute of Standards and Technology, and SRI International.

Data

Following the original ATIS division, the data is split into a training set of 4,978 utterances and a test set of 893 utterances. The training utterances were selected from the Class A (context-independent) training data in the ATIS2 and ATIS3 corpora. The test utterances come from the November 1993 and December 1994 ATIS3 test sets.

The original English utterances were manually translated into each of the six languages, and this release also includes the original English utterances. Each utterance is annotated with named entities via table lookup; the marked entity types include city, airline, and airport names, as well as dates.
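The release states only that entities were marked via table lookup. As an illustration of what such lookup can look like, the sketch below tags an utterance by greedy longest match against a dictionary (gazetteer). The table contents and the tag_entities function are hypothetical; this is not the procedure used to build the corpus, and dates are omitted for brevity.

```python
# Hypothetical lookup table mapping lowercased surface forms to entity types.
# The actual tables used for ATIS - Seven Languages are not documented here.
ENTITY_TABLE = {
    "new york": "city",
    "boston": "city",
    "delta": "airline",
    "logan": "airport",
}


def tag_entities(utterance, table=ENTITY_TABLE):
    """Return (start, end, surface, type) token spans found by greedy longest match.

    start is inclusive and end is exclusive, both as whitespace-token indices.
    """
    tokens = utterance.lower().split()
    # Longest table entry, in tokens, bounds how far ahead we try to match.
    max_len = max((len(key.split()) for key in table), default=1)
    spans = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate first so "new york" wins over a shorter entry.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in table:
                match = (i, i + n, candidate, table[candidate])
                break
        if match:
            spans.append(match)
            i = match[1]  # skip past the matched span
        else:
            i += 1
    return spans
```

For example, tag_entities("show delta flights from boston to new york") returns spans for delta (airline), boston (city), and new york (city). Greedy longest match is a common gazetteer choice because it prefers multi-word names over their shorter prefixes.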

Data is stored in UTF-8 encoded tab-separated value (TSV) files.
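A minimal sketch of reading one of these files with Python's standard csv module follows. It assumes each file has a header row naming its columns; the column names, file names, and the load_tsv helper are hypothetical, so check the release documentation for the actual layout.

```python
import csv


def load_tsv(path):
    """Read a UTF-8 tab-separated file into a list of dicts keyed by the header row.

    Assumes the first line is a header. QUOTE_NONE keeps quotation marks that
    appear inside utterances as literal text instead of treating them as CSV quoting.
    """
    with open(path, encoding="utf-8", newline="") as f:
        reader = csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        return list(reader)
```

Usage, with a hypothetical file name: rows = load_tsv("train_ES.tsv"). Opening the file explicitly as UTF-8 matters here because the Chinese, Japanese, and accented European text would otherwise depend on the platform's default encoding.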


Updates

None at this time.
