ARPA Continuous Speech Recognition North American Business News Corpus (CSRNAB1)
November 1994 Evaluation Test Hub and Spoke Data 02/08/95


* * * * * * * * * * * * * * * W A R N I N G * * * * * * * * * * * * * * * *
*                                                                         *
* If you intend to implement the protocols for the November '94 ARPA CSR  *
* Hub and Spoke Benchmark Tests, please read the files, "et94spec.doc"    *
* and "et94scor.doc", in the top-level directory of this disc in their    *
* entirety before proceeding and do not examine the included              *
* transcriptions, calibration recordings, adaptation recordings, or       *
* documentation unless such examination is specifically permitted in the  *
* guidelines for the test(s) being run.  Index files have been included   *
* under "csrnab1/doc/indices" which specify the exact data to be used for *
* each test.  To avoid testing on erroneous data, please refer to these   *
* files when running the tests.                                           *
*                                                                         *
* * * * * * * * * * * * * * * W A R N I N G * * * * * * * * * * * * * * * *
 

This CD-ROM contains the test data and documentation necessary to implement the November 1994 ARPA CSR Hub and Spoke Benchmark Tests. This disc also contains the software and documentation to score the results of those tests. The top-level directory of this disc contains the following files and subdirectories:

cdt8_1_1.dir Directory of this disc.

csrnab1/ November 1994 ARPA CSR Benchmark Test Material. Includes Shorten-compressed, SPHERE-formatted waveform data, adjudicated .dot and .lsn transcriptions, .ptx prompting texts, data collection and subject documentation, and language model training texts for Spoke 2. Indices that specify the exact data to be used in each test are also included.

Note: The .lsn transcription files were generated from the .dot files using Doug Paul's "dot2lsn" PERL script, version 1.4, with NO flags. (The -nvp flag is no longer used since all lexeme mapping is now performed in the tranfilt pre-scoring filter.)

dot2lsn/ Doug Paul's PERL script (version 1.4) to convert .dot format transcription files to .lsn format transcription files.

et94scor.doc Instructions for preparing and scoring recognition system output.

et94spec.doc Hub and Spoke test specifications developed by the ARPA CCCC.

readme.doc This file.

score/ NIST scoring package (version 3.5.6), including the phone-mediated alignment algorithm.

sphere/ NIST SPHERE speech file manipulation package (version 2.5).

tranfilt/ The tranfilt lexeme mapping filter and its map files, which map multiple representations of certain lexemes/lexeme strings to canonical representations. Apply this filter to both the hypothesis and reference .lsn files prior to scoring.
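
The idea behind a lexeme mapping filter can be illustrated with a minimal Python sketch. This is not the tranfilt implementation and does not read the map files on this disc. The map entries, the function names, and the assumed line layout (words followed by an utterance ID in parentheses) are hypothetical and chosen only for illustration. Refer to "et94scor.doc" and the tranfilt directory for the actual procedure.

```python
import re

# Hypothetical map entries: alternate token sequence -> canonical token sequence.
# These are illustrative only and are NOT taken from the map files on this disc.
EXAMPLE_MAP = {
    ("PER", "CENT"): ("PERCENT",),
    ("GREY",): ("GRAY",),
}

# Assumed line layout: words followed by an utterance ID in parentheses.
LINE_RE = re.compile(r"^(?P<words>.*?)\s*(?P<utt_id>\([^()]*\))\s*$")


def map_lexemes(tokens, mapping):
    """Replace each alternate token sequence with its canonical form.

    At each position the longest matching sequence is tried first.
    """
    max_len = max((len(key) for key in mapping), default=0)
    out = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in mapping:
                out.extend(mapping[key])
                i += n
                break
        else:
            # No map entry starts at this token; keep it unchanged.
            out.append(tokens[i])
            i += 1
    return out


def filter_line(line, mapping=EXAMPLE_MAP):
    """Map the words of one transcription line, leaving the utterance ID untouched."""
    match = LINE_RE.match(line)
    if match is None:
        words, utt_id = line.strip(), ""
    else:
        words, utt_id = match.group("words"), match.group("utt_id")
    mapped = " ".join(map_lexemes(words.split(), mapping))
    return f"{mapped} {utt_id}".strip()
```

Running the same mapping over both the hypothesis and the reference transcriptions keeps a system from being penalized for choosing a different but equivalent spelling of the same lexeme.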