Air Travel Information Service Phase III (ATIS3)
Speech and Natural Language Understanding Corpora

1994 Development Test Material

NIST Speech Disc 17-4.2
March, 1995

The Air Travel Information Service (ATIS) domain was selected as a common research domain to facilitate the development and common evaluation of speech understanding systems within the Advanced Research Projects Agency Spoken Language Technology Program.

This disc contains a development test set drawn from the ATIS3 pool of multi-site data. This development test set is similar in scope and content to ATIS3 evaluation test sets. This disc serves as an addendum to the ATIS3 training and evaluation test data previously released on NIST speech discs 17-1, 17-2, and 17-3.

The test data on this disc is comparable in size, scope, and annotation conventions to the December 1994 Evaluation Test data on disc 17-5.1. This data can be used for system development testing before conducting evaluation tests using the data on disc 17-5.1.

A summary of the contents of this disc is as follows:

   17_4_2.dir   File containing directory of this disc.

   atis3/       Waveforms, transcriptions, annotations, and documentation for the 1994 ATIS3 Development Test set.

   comp/        NIST comparator for scoring CAS-formatted answers output from ATIS NL/SLS systems against reference answers.

   rdb4_0/      ATIS 46-city/52-airport relational database.

   score/       NIST speech recognition scoring software. Includes dynamic programming string-alignment scoring code and statistical significance tests.

   sphere/      NIST SPeech HEader REsources toolkit. Provides command-line and programmer interface to NIST-headered speech waveform files. Also provides for automatic decompression of the Shorten-compressed waveform files on these discs.

General information files named "readme.doc" are included in the high-level directories and throughout the documentation directory ("atis3/doc") on this disc (17-4.2). They describe the contents of those directories.

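For readers who want to inspect a NIST-headered waveform file without the tools in "sphere/", the following is a minimal Python sketch of reading the ASCII header. It assumes the commonly documented SPHERE layout: a first line "NIST_1A", a second line giving the header size in bytes, then "name type value" fields terminated by "end_head". The function names and the field names in the usage note are illustrative and are not taken from this disc. The sketch reads header fields only; it does not decompress Shorten-compressed sample data, for which the SPHERE toolkit should be used.

```python
def parse_sphere_header(raw: bytes) -> dict:
    """Parse the ASCII header found at the start of a NIST SPHERE file.

    Assumed layout (not verified against the files on this disc):
      line 1: "NIST_1A"
      line 2: header size in bytes (e.g. "   1024")
      then:   "<name> <type> <value>" lines, ending with "end_head"
    Types: -i integer, -r real, -sN string of N characters.
    """
    first = raw.split(b"\n", 2)
    if len(first) < 3 or first[0].strip() != b"NIST_1A":
        raise ValueError("not a NIST SPHERE header")
    header_size = int(first[1].strip())
    text = raw[:header_size].decode("ascii", errors="replace")

    fields = {}
    for line in text.split("\n")[2:]:
        line = line.strip()
        if line == "end_head":
            break
        if not line or line.startswith(";"):  # blank line or comment
            continue
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue
        name, ftype, value = parts
        if ftype == "-i":
            fields[name] = int(value)
        elif ftype == "-r":
            fields[name] = float(value)
        else:  # -sN string field
            fields[name] = value
    return fields


def read_sphere_header(path: str) -> dict:
    """Read only the header bytes of a file and parse them."""
    with open(path, "rb") as f:
        prefix = f.read(16)  # covers "NIST_1A\n" plus the size line
        size = int(prefix.split(b"\n")[1].strip())
        f.seek(0)
        return parse_sphere_header(f.read(size))
```

A call such as read_sphere_header("example.wav") (a hypothetical file name) returns a dictionary of header fields. Fields such as a sample rate or a sample coding entry, where present, indicate how the sample data that follows the header is stored.
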
Note that the waveforms on this disc have been compressed using SPHERE-embedded Shorten.

The following papers contain a more detailed description of the ATIS paradigm and corpora. PostScript copies of these papers have been included in the "atis3/doc" directory of this disc for your convenience.

   Hemphill, C.T., et al., "The ATIS Spoken Language Systems Pilot Corpus", Proc. DARPA Speech and Natural Language Workshop, Morgan Kaufmann Publishers, June 1990. (tiatis90.ps)

   Hirschman, L., et al., "Multi-Site Data Collection for a Spoken Language Corpus", Proc. DARPA Speech and Natural Language Workshop, Morgan Kaufmann Publishers, February 1992. (madcow92.ps)

   Hirschman, L., et al., "Multi-Site Data Collection and Evaluation in Spoken Language Understanding", Proc. ARPA Human Language Technology Workshop, Morgan Kaufmann Publishers, March 1993. (madcow93.ps)

   Dahl, D., et al., "Expanding the Scope of the ATIS Task: The ATIS-3 Corpus", Proc. ARPA Human Language Technology Workshop, Morgan Kaufmann Publishers, March 1994. (madcow94.ps)

Note that the annotations on this disc conform to a new Principles of Interpretation (PofI) document, which is a revision of the previous ATIS3 PofI. This PofI pertains only to the 1994 Development Test Data (this disc) and Evaluation Test Data (NIST Speech Disc 17-5.1). The revised PofI is located under the "atis3/doc/pofi" directory.

The collection of the ATIS3 corpus was sponsored by the Advanced Research Projects Agency Software and Intelligent Systems Technology Office (ARPA-SISTO). The corpus was annotated by SRI International and collated, documented, and produced on CD-ROM by the National Institute of Standards and Technology under the sponsorship of the Linguistic Data Consortium.