Air Traffic Control Complete


Item Name: Air Traffic Control Complete
Authors: John J. Godfrey
LDC Catalog No.: LDC94S14A
NIST Catalog No.: 16-1.1 through 16-8.1
ISBN: 1-58563-024-1
Data Type: speech
Sample Rate: 8000 Hz
Sampling Format: 1-channel pcm
Data Source(s): field recordings
Application(s): speech recognition
Language(s): English
Language ID(s): eng
Distribution: 1 DVD
Member fee: $0 for 1994, 1997 members
Non-member Fee: US $1150.00
Reduced-License Fee: US $575.00
Extra-Copy Fee: US $200.00
Non-member License: yes
Readme File: yes
Online documentation: yes
Licensing Instructions: Subscription & Standard Members, and Non-Members
Citation: John J. Godfrey. 1994. Air Traffic Control Complete. Linguistic Data Consortium, Philadelphia.

LDC94S14A - Complete ATC0 corpus
LDC94S14B - ATC0 Logan International
LDC94S14C - ATC0 Washington National
LDC94S14D - ATC0 Dallas Fort Worth

Introduction

The Air Traffic Control Corpus (ATC0) is an eight-disc set of recorded speech intended to support research and development in robust speech recognition for domains similar to air traffic control (several speakers, noisy channels, relatively small vocabulary, constrained language, etc.). The audio data on these discs consists of voice communication traffic between various controllers and pilots.

Data

The audio files are 8 kHz, 16-bit linear sampled data, representing continuous monitoring, without squelch or silence elimination, of a single FAA frequency for one to two hours. Additional files record the amplitude of the received AM carrier signal at 10 ms intervals.
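The figures above imply some simple arithmetic that is useful when budgeting storage or aligning the carrier-amplitude files with the audio. The following is a minimal sketch, assuming uncompressed single-channel data and ignoring any file header; the function names are illustrative and not part of the corpus tooling.

```python
# Values taken from the corpus description above.
SAMPLE_RATE_HZ = 8000   # 8 kHz sampling
BYTES_PER_SAMPLE = 2    # 16-bit linear samples
CHANNELS = 1            # single channel
FRAME_MS = 10           # carrier amplitude is reported every 10 ms


def samples_per_amplitude_frame(rate_hz=SAMPLE_RATE_HZ, frame_ms=FRAME_MS):
    """Number of audio samples covered by one carrier-amplitude value."""
    return rate_hz * frame_ms // 1000


def audio_bytes(duration_s, rate_hz=SAMPLE_RATE_HZ,
                bytes_per_sample=BYTES_PER_SAMPLE, channels=CHANNELS):
    """Uncompressed size in bytes of duration_s seconds of audio (no header)."""
    return int(duration_s * rate_hz * bytes_per_sample * channels)


def amplitude_frames(duration_s, frame_ms=FRAME_MS):
    """Number of carrier-amplitude values expected for duration_s seconds."""
    return int(duration_s * 1000) // frame_ms


if __name__ == "__main__":
    print(samples_per_amplitude_frame())  # samples per 10 ms interval
    print(audio_bytes(3600))              # bytes in one hour of audio
    print(amplitude_frames(3600))         # amplitude values in one hour
```

Under these assumptions, each amplitude value spans 80 audio samples, and one hour of monitoring corresponds to 57,600,000 bytes of raw audio. Sizes on disc may differ if the files are compressed.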

Full transcripts, including the start and end times of each transmission, are provided for each audio file. Each flight is identified by its flight number.
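Because each transmission carries start and end times, a single transmission can be cut out of the continuous recording by converting those times to sample indices. The sketch below assumes the times are expressed in seconds from the start of the audio file and that the samples have already been loaded into a sequence; the actual transcript layout is described in the corpus documentation, and the helper names here are hypothetical.

```python
SAMPLE_RATE_HZ = 8000  # from the corpus description


def transmission_sample_range(start_s, end_s, rate_hz=SAMPLE_RATE_HZ):
    """Convert start/end times in seconds to (first, last) sample indices.

    The returned range is half-open: samples[first:last].
    """
    if start_s < 0 or end_s < start_s:
        raise ValueError("expected 0 <= start_s <= end_s")
    return round(start_s * rate_hz), round(end_s * rate_hz)


def slice_transmission(samples, start_s, end_s, rate_hz=SAMPLE_RATE_HZ):
    """Return the samples belonging to one transmission."""
    first, last = transmission_sample_range(start_s, end_s, rate_hz)
    return samples[first:last]


if __name__ == "__main__":
    # Stand-in for five seconds of decoded audio; real data would come
    # from one of the corpus audio files.
    fake_audio = list(range(5 * SAMPLE_RATE_HZ))
    segment = slice_transmission(fake_audio, 1.5, 4.0)
    print(len(segment) / SAMPLE_RATE_HZ, "seconds extracted")
```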

ATC0 consists of three subcorpora, one for each airport at which the transmissions were collected -- Dallas Fort Worth (DFW), Logan International (BOS) and Washington National (DCA). The complete set contains approximately 70 hours of controller and pilot transmissions collected via antennas and radio receivers located in the vicinity of the respective airports.

Detailed information regarding the collection process and the equipment used can be found on each disc in the file "atc.doc" in the "doc" directory.

The ATC0 Corpus was collected by Texas Instruments under contract to DARPA. It was produced on CD-ROM by the National Institute of Standards and Technology for distribution by the Linguistic Data Consortium.

Samples

For an example of the data in this corpus, please examine the following files. The audio sample is in NIST SPHERE format. Users should save this file rather than trying to display it in the browser.

Updates

Relative to the CD-ROMs produced in 1994 by NIST, the SPHERE files were renamed to use the .sph extension instead of the .wav extension.
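Since a file extension alone does not guarantee the format, it can help to inspect the SPHERE header directly. A NIST SPHERE file begins with an ASCII header: the line "NIST_1A", a line giving the header size in bytes, then "name -type value" fields terminated by "end_head". The following is a minimal sketch of a header reader under that layout; it does not decode the audio itself, and the field names in the demonstration header are illustrative rather than copied from an ATC0 file.

```python
import io


def parse_sphere_header(stream):
    """Parse the ASCII header of a NIST SPHERE file opened in binary mode.

    Returns (header_size_in_bytes, dict_of_fields).
    """
    if stream.readline().strip() != b"NIST_1A":
        raise ValueError("not a NIST SPHERE file")
    header_size = int(stream.readline().strip())
    stream.seek(0)
    header = stream.read(header_size).decode("ascii", errors="replace")

    fields = {}
    for line in header.splitlines()[2:]:
        line = line.strip()
        if line == "end_head":
            break
        parts = line.split(None, 2)
        if len(parts) != 3 or line.startswith(";"):
            continue  # skip blank, comment, or malformed lines
        name, type_code, value = parts
        if type_code == "-i":
            fields[name] = int(value)
        elif type_code == "-r":
            fields[name] = float(value)
        else:  # string fields are tagged -sN, where N is the length
            fields[name] = value
    return header_size, fields


def make_demo_header():
    """Build a synthetic 1024-byte header for demonstration only."""
    text = (b"NIST_1A\n   1024\n"
            b"channel_count -i 1\n"
            b"sample_rate -i 8000\n"
            b"sample_n_bytes -i 2\n"
            b"end_head\n")
    return text.ljust(1024, b" ")


if __name__ == "__main__":
    size, fields = parse_sphere_header(io.BytesIO(make_demo_header()))
    print(size, fields)
```

With a real corpus file, the same function could be called on a file object opened with mode "rb"; the audio data begins at the byte offset given by the returned header size.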
