ARPA SLS Multi-site ATIS3 Data
1994 Development Test Material

* * * * * * * * * * * * * * * W A R N I N G * * * * * * * * * * * * * * *

If you intend to implement the protocols used in ARPA ATIS3 Benchmark
Tests, please read this document in its entirety before proceeding, and
do not examine the included transcriptions, annotations, session logs,
or documentation unless such examination is specifically permitted in
the guidelines for the test(s) being run.  Index files have been
included which specify the exact data to be used for each test.  To
avoid testing on erroneous data, please refer to these files when
running the tests.

* * * * * * * * * * * * * * * W A R N I N G * * * * * * * * * * * * * * *


Contents
--------

1.0 Overview
2.0 Subdirectories
3.0 Online Documentation
4.0 Test Set Indices
5.0 Test Scoring
    5.1 Scoring ATIS SPREC Tests
        5.1.1 Preparation of Hypothesized Transcripts
        5.1.2 Scoring SPREC Results
    5.2 Scoring ATIS NL and SLS Tests
        5.2.1 Preparation of CAS NL/SLS Output and Hypothesis Answers
        5.2.2 Scoring NL/SLS Results


1.0 Overview
-------------

This directory contains the data and documentation necessary to
implement ATIS3 development tests.  This data may be used to conduct
the following benchmark tests:

   Speech Recognition, Sennheiser mic. waveforms      (SPREC-S)
   Speech Recognition, Crown mic. waveforms           (SPREC-C)
   Natural Language                                   (NL)
   Spoken Language System, Sennheiser mic. waveforms  (SLS-S)
   Spoken Language System, Crown mic. waveforms       (SLS-C)

The test data consists of 975 utterances from 132 subject-scenarios
spoken by 25 subjects.  The data was collected at 5 sites (BBN, CMU,
MIT, SRI, NIST), and each site is evenly represented in terms of number
of utterances (approximately 200 utterances from each site).

The data is organized using conventional MADCOW ATIS directory and file
structures.  The following filetypes are included on the disc:

   .log - session log
   .sro - detailed transcription
   .lsn - lexical SNOR transcription
   .wav - utterance waveform (*s.wav - Sennheiser mic., *c.wav - Crown
          mic.; the .wav files are located on Disc 17-3.1)
   .squ - subject questionnaire file
   .com - comment file

If you intend to replicate the conditions in an ARPA ATIS3 test, use
only the files appropriate for the test being run.  THE .log, .sro AND
.squ FILES HAVE BEEN INCLUDED ON THE DISC FOR POST-TEST DIAGNOSTICS
ONLY AND SHOULD NOT BE CONSULTED UNTIL ALL OF THE TESTS ARE COMPLETED.
See Sections 4.0 and 5.0 for specifics on implementing the tests.


2.0 Subdirectories
-------------------

The "atis3/sp_tst/dev94" directory contains the following
subdirectories:

   initial/ - initial (pre-adjudicated) test transcriptions and
              waveform files.  Only the files in this directory are to
              be used in running a test.

   prelim/  - preliminary (pre-adjudicated) categorization and
              reference answer files, transcriptions, and log files
              used in scoring tests and for post-test diagnostics.

   rev1/    - revised categorization and reference answer files,
              transcriptions, and log files used in scoring tests and
              for post-test diagnostics.  Several MADCOW sites ran the
              test data in "initial" during Summer '94 and scored it
              using the annotations in "prelim".  The annotators then
              re-examined the annotations for queries on which no site
              got a correct answer; this directory contains the revised
              annotations based on that re-examination.

Since this data has not been adjudicated, there is no "final" directory
such as can be found on other discs containing evaluation test
material.


3.0 Online Documentation
-------------------------

The following files are included in the "atis3/sp_tst/dev94" directory:

   crown.ndx    - index file listing the ATIS3 Crown microphone
                  recordings for use in implementing SPREC and SLS
                  tests

   crt_dirs.sh  - UNIX shell script to create the ATIS directory
                  structure from an index file

   dates.txt    - list of scenario-sessions and the dates they were
                  recorded.  This information is to be used during the
                  test to establish the system date.

   nl.ndx       - index file listing the ATIS3 .lsn transcriptions for
                  use in implementing NL tests

   pre_clas.sum - pre-adjudication December 1993 ATIS test query
                  classification summary

   senn.ndx     - index file listing the ATIS3 Sennheiser microphone
                  recordings for use in implementing SPREC and SLS
                  tests

See the directory "atis3/doc" for speaker information and general ATIS3
documentation.  The file formats of the test data are also described
there.
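
For illustration only, the directory skeleton that "crt_dirs.sh" builds
can be approximated in a few lines of portable shell.  The sketch below
assumes the index file holds one relative path per line with no spaces
in the paths (check the actual index files); use the supplied script
when running a real test:

   # strip the filename from each index entry, remove duplicates, and
   # recreate each remaining directory path under hyp_dir/
   sed 's|/[^/]*$||' nl.ndx | sort -u | sed 's|^|hyp_dir/|' | xargs mkdir -p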

4.0 Test Set Indices
---------------------

If you intend to run an NL test, use only the .lsn transcription files
under the "initial" directory as input to your system.  To ensure that
the correct files are used, refer to the list of files in the index
file "nl.ndx".  Note that this list excludes 11 "empty" utterances.

To implement the SPREC-Sennheiser or SLS-Sennheiser tests, refer to the
index file "senn.ndx", which contains the path/file specifications for
the Sennheiser microphone waveform files.  To implement the SPREC-Crown
or SLS-Crown tests, refer to the index file "crown.ndx", which contains
the path/file specifications for the Crown microphone waveform files.
Note that only a subset (390 utterances) of the data in the test set
was collected with Crown microphones as well as Sennheiser microphones.
Also note that the Sennheiser microphone index contains 11 fewer
waveform files than there are .lsn files, due to the empty utterances.
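
As a sketch of how an index file drives a recognition test (assuming
each index line is one waveform path relative to the disc root;
"$CDROM" is your CD-ROM mount point and "my_recognizer" is a
hypothetical stand-in for your own system):

   # produce one hypothesis text file per Sennheiser waveform,
   # naming each file after its utterance ID
   mkdir -p hyps
   while read wav; do
       id=$(basename "${wav%.wav}")
       my_recognizer "$CDROM/$wav" > "hyps/$id.hyp"
   done < "$CDROM/atis3/sp_tst/dev94/senn.ndx"

Section 5.1.1 below describes the format into which the collected
hypotheses must be put before scoring.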

5.0 Test Scoring
-----------------

This section describes the process used by NIST in scoring the December
1993 ATIS Natural Language (NL), Spoken Language System (SLS), and
Speech Recognition (SPREC) tests.  The information in this section can
also be used by those who wish to duplicate that scoring methodology.

Sections 5.1.1 and 5.1.2 provide instructions on running a SPREC test.
The SPREC test is scored using the NIST speech recognition scoring
package supplied on this disc in the "/score" directory.  Please
install the scoring package by following the instructions in the file
"/score/readme.doc".  For a complete description of the NIST scoring
package and its use, see the file "/score/doc/score.rdm" on this disc.

Sections 5.2.1 and 5.2.2 provide instructions on running NL and SLS
tests.  The NL and SLS tests are scored using the NIST CAS answer
comparator ("comp4.exe") supplied in the "/comp" directory on this
disc.  Please install the comparator according to the instructions in
the documentation file "/comp/readme.doc".


5.1 Scoring ATIS SPREC Tests
-----------------------------

5.1.1 Preparation of Hypothesized Transcripts
----------------------------------------------

In order for the NIST scoring software to properly score the output
produced by SPeech ReCognition (SPREC) systems, the system-generated
hypothesized transcripts must be formatted according to the Lexical
Standard Normal Orthographic Representation (LSN) format.

The reference LSN transcriptions in ATIS are derived by filtering the
detailed Speech Recognizer Output (SRO) transcription format so that
only the lexical information required in scoring simple speech
recognition output is retained.  The LSN format can be understood by
looking at the SRO specification in "/atis3/doc/sro_spec.doc" and
performing the following simplifications (from the "sro2lsn" program in
"/score/bin"):

    1) remove edit cues and leave the remaining words
    2) add spaces before and after alternation markers
    3) delete the helpful interpretation marks
    4) delete non-lexical acoustic events in square brackets
    5) remove angle brackets from verbally deleted words
    6) remove the stars from mispronounced words
    7) delete false-start words ending (or beginning) with a hyphen
    8) replace any empty alternations with @
    9) collapse runs of spaces; delete initial and final spaces
   10) convert everything to uppercase

The recognized transcriptions must be put into the LSN format to be
scored properly against the reference transcriptions.  Prior to
scoring, the SPREC transcriptions must be concatenated into a single
file with one utterance transcription per line and the utterance ID in
parentheses at the end of the line.  For example:

   SHOW ME THE FLIGHTS FROM BOSTON TO DENVER (ZZZ011SS)
   WHAT IS THE FARE FOR FLIGHT ONE TWO THREE (ZZZ021SS)
   WHAT MEALS ARE SERVED ON THAT FLIGHT (ZZZ031SS)
   .
   .
   .
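
Per-utterance outputs such as those produced by the sketch in Section
4.0 can be combined into this form with a short shell loop.  This is a
sketch only; it assumes each hypothesis file's basename is its
utterance ID, and the output filename "dev94_senn.hyp" is an arbitrary
choice:

   # one transcription per line, utterance ID in parentheses at the
   # end, everything uppercased as the LSN format requires
   for f in hyps/*.hyp; do
       id=$(basename "$f" .hyp)
       printf '%s (%s)\n' "$(cat "$f")" "$id"
   done | tr 'a-z' 'A-Z' > dev94_senn.hyp

The resulting file is the input to the "wgscore" script described in
Section 5.1.2 below.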

5.1.2 Scoring SPREC Results
----------------------------

To simplify scoring SPREC output, a UNIX shell script has been created
which performs all of the necessary scoring-package housekeeping tasks.
The script, "wgscore", is located in the directory "/score/bin" on this
disc.  The script takes as input the concatenated SPREC hypothesis
transcription file and can be run with various options.  When wgscore
is run, a directory named after the concatenated SPREC hypothesis
transcription file is created which contains the hypothesis-reference
alignments and various summaries.  See the manual page for "wgscore" in
"/score/doc/man/man1/wgscore.1" for instructions on its use.


5.2 Scoring ATIS NL and SLS Tests
----------------------------------

5.2.1 Preparation of CAS NL/SLS Output and Hypothesis Answers
--------------------------------------------------------------

To run a Natural Language (NL) or full Spoken Language System (SLS)
test, first create an index file containing the full path and file
specifications of the files to be processed (.lsn files for NL, .wav
files for SLS).  Index files have already been created for the tests
("nl.ndx" for NL; "senn.ndx" and "crown.ndx" for SLS) in this
directory.

Next, create an output directory on magnetic disk for your system
output (HYP_DIR) and duplicate the <site>/<session> paths in the index
file under this directory.  A UNIX shell script, "crt_dirs.sh", has
been provided in this directory to aid this step.  The syntax for the
script is:

   crt_dirs.sh <index file>

Next, process the files specified in the index through your NL or SLS
system and create a file for each answer under the appropriate
<site>/<session> directory.  Be sure to use the system dates for each
scenario as specified in the file "dates.txt" in this directory.  For
your output files, use the same basenames as the input files, but
assign a unique extension to the answer files you generate, such as
".nl", ".sls-senn", ".sls-crown", etc.

In order to be scored using the NIST comparator, your output must be
formatted according to the ARPA Common Answer Specification (CAS).  The
CAS document is located in the file "cas_spec.doc" in the "atis3/doc"
directory on this disc.  An example output directory from a
hypothetical NL system has been created under "example/", which
contains sample ".nl" system output files in the proper directory and
file structure for scoring.


5.2.2 Scoring NL/SLS Results
-----------------------------

You can score your results in one step using the UNIX shell script,
"scor_cas.sh", located in the "/comp" directory on this disc.  Note
that the NIST comparator, "comp4.exe", which is also located in the
"/comp" directory, must be installed before running "scor_cas.sh".
See the file "readme.doc" under "/comp" for installation instructions.

The syntax for "scor_cas.sh" is:

   scor_cas.sh <REF_DIR> <HYP_DIR> <HYP_EXT> <COMP_DIR>

where:

   REF_DIR  - the path of the directory containing the
              <site>/<session>/CAS-reference-answers hierarchy

   HYP_DIR  - the path of the directory containing the
              <site>/<session>/CAS-hypothesis-answers hierarchy

   HYP_EXT  - the extension you have given to your CAS hypothesis
              answer files (.nl, .sls-senn, etc.).  Warning: make sure
              that this extension is unique, since all files with this
              extension will be scored.

   COMP_DIR - the path of the NIST comparator, "comp4.exe"

Example:

   scor_cas.sh <CDROM>/atis3/sp_tst/dev94/rev1 <YOUR_DISK>/hyp_dir \
               nl <CDROM>/comp/comp4.exe

(the above should all be on one line)

where:

   CDROM     - the path of your CD-ROM drive where this disc is located

   YOUR_DISK - the path of the local magnetic disk drive where your
               output and the comp4.exe executable are stored (your
               output and comp4.exe may be stored on different disks)

Upon completion, the script will generate two files in the current
directory:

   class-a.<HYP_EXT>.score - scores and summary for Class-A queries
   class-d.<HYP_EXT>.score - scores and summary for Class-D queries

scor_cas.sh performs several steps, creates several intermediate files
in the current directory, and employs the NIST comparator to actually
score the results.  If you would like to experiment with the comparator
directly, see the file "/comp/readme.doc" for a detailed description of
the comparator and its use.  An example execution is included below;
the options used, "-ncfs" and "-pd3", are required to duplicate the
settings used in the December 1993 scoring.

   comp4.exe -ncfs -pd3
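
Putting Sections 5.2.1 and 5.2.2 together, a complete NL run might look
like the sketch below.  The command "my_nl_system" is a hypothetical
stand-in for your own system, "$CDROM" and "$YOUR_DISK" are the paths
described above, and the index-file format (one path per line, relative
to the disc root) is an assumption to be checked against the actual
files:

   # 1) generate one CAS answer file per .lsn query, mirroring the
   #    index paths under the output directory (strip any leading
   #    directory components your scoring hierarchy does not expect)
   while read lsn; do
       out="$YOUR_DISK/hyp_dir/${lsn%.lsn}.nl"
       mkdir -p "${out%/*}"
       my_nl_system "$CDROM/$lsn" > "$out"
   done < "$CDROM/atis3/sp_tst/dev94/nl.ndx"

   # 2) score the answers against the rev1 reference annotations
   scor_cas.sh "$CDROM/atis3/sp_tst/dev94/rev1" "$YOUR_DISK/hyp_dir" \
               nl "$CDROM/comp/comp4.exe"

Remember also to consult "dates.txt" so that each scenario is processed
with the correct system date, as described in Section 5.2.1.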