NIST March 2000 Hub-5 Benchmark Test Results
for Recognition
of Conversational Speech over the Telephone, in English and Mandarin.
Release 1.4, May 11, 2000
Release history:
- 1.0, April 5, 2000:
- Initial release of the English results (Mandarin results were not yet complete).
- 1.1, April 6, 2000:
- Included the Mandarin results.
- A bug was found in BBN's 'bbn1' results. The 'bbn1' results were renamed 'bbn1-bug' and an
additional submission, 'bbn1-pass1', was included. BBN will be submitting their final
debugged system run at a later time.
- 1.2, April 11, 2000:
- BBN's final debugged system run was added.
- ATT submitted a debugged system run; in the original run, the adaptation for one Switchboard conversation was inappropriate.
- The Rover1 results were re-computed using the debugged systems from ATT and BBN, and a contrastive rover run was added.
- 1.3, May 8, 2000:
- Added BBN's DET plots, which were missing in the previous version.
- Deleted ATT's DET-related plots, since all of the word confidence scores were 1.
- 1.4, May 11, 2000:
- Added late AT&T results.
- Added results submitted by Dragon using their 1998 evaluation system.
Introduction
This document provides a brief overview of the March 2000 Hub-5 benchmark
test results and a map of the contents of this release.
This year's test participants included five research sites: ATT, BBN,
CU-HTK, Miss. State Univ (MSSTATE) and SRI. As part of the submission instructions,
participants differentiated their submissions via directory names. NIST
has relabelled those names according to the following table.
    Report ID   ->  Submission Name
    ----------------------------------
    att1-bug    ->  primary
    bbn1-bug    ->  byb_eng
    bbn2        ->  byb_man
    cu-htk1     ->  cu-htk1
    cu-htk2     ->  cu-htk2
    cu-htk3     ->  cu-htk3
    cu-htk4     ->  cu-htk4
    msstate1    ->  isip_asr
    sri1        ->  sri1
    sri2        ->  sri2
This document has separate sections for each language:
- English Word Error
- Mandarin Character Error
ENGLISH WORD ERROR RATE
,--------------------------------------------------------------------------------------.
| |
| Executive Scoring Summary by Percentages |
| Hub 5 Eval 1999 CallHome + Swb : Primary Systems Test |
| |
| System | # Snt # Ref | Corr Sub Del Ins Err S.Err | NCE |
|-------------------+---------------+-----------------------------------------+--------|
| att1-debugged.ctm | 4459 42993 | 73.4 18.3 8.2 2.5 29.0 66.9 | -5.741 |
| bbn1-debugged.ctm | 4459 42993 | 74.3 17.8 8.0 3.4 29.1 69.1 | 0.094 |
| cu-htk1.ctm | 4459 42993 | 77.1 15.2 7.7 2.5 25.4 64.9 | 0.271 |
| dragon98.ctm | 4459 42993 | 73.5 19.2 7.3 3.6 30.0 67.5 | --- |
| msstate1.ctm | 4459 42993 | 54.9 32.7 12.5 4.0 49.1 76.3 | -0.815 |
| nist-rover1.ctm | 4459 42993 | 76.6 14.7 8.7 1.7 25.1 64.1 | --- |
| sri1.ctm | 4459 42993 | 73.5 19.5 6.9 3.7 30.2 67.7 | 0.233 |
`--------------------------------------------------------------------------------------'
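The summary columns are related by simple identities: Corr + Sub + Del accounts for every reference word, and Err = Sub + Del + Ins. The NCE column is the normalized cross entropy of the submitted word confidence scores. The sketch below illustrates both computations; it is not SCLITE itself, and the function and variable names are assumptions made for this example.

```python
import math

def summary_percentages(n_ref, sub, dele, ins):
    # Illustrative reconstruction of the summary columns: every reference
    # word is either correct, substituted, or deleted, and the word error
    # rate counts substitutions, deletions, and insertions.
    corr = n_ref - sub - dele
    pct = lambda n: round(100.0 * n / n_ref, 1)
    return {"Corr": pct(corr), "Sub": pct(sub), "Del": pct(dele),
            "Ins": pct(ins), "Err": pct(sub + dele + ins)}

def nce(conf_correct, conf_incorrect):
    # Normalized cross entropy of word confidence scores (standard NIST
    # definition): 1.0 is perfect confidence estimation, 0.0 is no better
    # than always predicting the system's overall word-correct rate, and
    # negative values (as in several rows above) are worse than that baseline.
    n_c, n_i = len(conf_correct), len(conf_incorrect)
    p_c = n_c / (n_c + n_i)
    h_max = -(n_c * math.log2(p_c) + n_i * math.log2(1.0 - p_c))
    h_sys = -(sum(math.log2(c) for c in conf_correct) +
              sum(math.log2(1.0 - c) for c in conf_incorrect))
    return (h_max - h_sys) / h_max
```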
Comparative Summaries
- Word Error Rate Summary: eng_wer/Primary.summary, (as PS)
- Primary System Statistical Comparisons:
scoring/eng_wer/Primary.stats, (as PS)
- Contrastive System Statistical Comparisons:
  - ATT: Statistical Comparisons, (as PS) | WER Summary, (as PS)
  - CU-HTK: Statistical Comparisons, (as PS) | WER Summary, (as PS)
  - BBN: Statistical Comparisons, (as PS) | WER Summary, (as PS)
  - NIST-ROVER: Statistical Comparisons, (as PS) | WER Summary, (as PS)
  - SRI: Statistical Comparisons, (as PS) | WER Summary, (as PS)
English System Results
The following table contains links to each scoring report generated by
SCLITE (for word error rate) for this benchmark test. The scoring/eng_wer subdirectory contains subdirectories
for each test system which in turn contain the scoring reports for
that system.
WORD ERROR RATE

For each system, the scoring reports are: System Description (.desc), By Speaker(#) Summary (.raw), By Speaker(%) Summary (.sys), By Corpus/By Gender (.lur, also as PS), and, where confidence scores were submitted, a Confidence DET Curve (.det.ps), Confidence Histogram (.hist.ps), and Confidence Binned Histogram (.sbhist.ps).

    System ID     | Scoring Reports                       | Confidence Plots
    --------------+---------------------------------------+------------------------------------
    ATT1-BUG      | att1-bug.{desc,raw,sys,lur} (PS)      | No conf. scores
    ATT1-DEBUGGED | att1-debugged.{desc,raw,sys,lur} (PS) | No conf. scores
    ATT1-LATE     | att1-late.{desc,raw,sys,lur} (PS)     | No conf. scores
    BBN1-BUG      | bbn1-bug.{desc,raw,sys,lur} (PS)      | bbn1-bug.{det,hist,sbhist}.ps
    BBN1-PASS1    | bbn1-pass1.{desc,raw,sys,lur} (PS)    | No conf. scores
    BBN1-DEBUGGED | bbn1-debugged.{desc,raw,sys,lur} (PS) | bbn1-debugged.{det,hist,sbhist}.ps
    CU-HTK1       | cu-htk1.{desc,raw,sys,lur} (PS)       | cu-htk1.{det,hist,sbhist}.ps
    CU-HTK2       | cu-htk2.{desc,raw,sys,lur} (PS)       | cu-htk2.{det,hist,sbhist}.ps
    CU-HTK3       | cu-htk3.{desc,raw,sys,lur} (PS)       | cu-htk3.{det,hist,sbhist}.ps
    CU-HTK4       | cu-htk4.{desc,raw,sys,lur} (PS)       | cu-htk4.{det,hist,sbhist}.ps
    DRAGON98      | dragon98.{desc,raw,sys,lur} (PS)      | No conf. scores
    MSSTATE1      | msstate1.{desc,raw,sys,lur} (PS)      | msstate1.{det,hist,sbhist}.ps
    NIST-ROVER1   | nist-rover1.{desc,raw,sys,lur} (PS)   | No conf. scores
    NIST-ROVER2   | nist-rover2.{desc,raw,sys,lur} (PS)   | No conf. scores
    SRI1          | sri1.{desc,raw,sys,lur} (PS)          | sri1.{det,hist,sbhist}.ps
    SRI2          | sri2.{desc,raw,sys,lur} (PS)          | sri2.{det,hist,sbhist}.ps
Table Key
- System ID -
Directory containing all results for the named system.
- .desc -
The system description provided by the submitting site.
- .raw -
Summary of speaker performance in terms of Number: Correct,
Substitutions, Deletions, Insertions, Word Errors and Sentence (or
Utterance) errors. The mean, median and standard deviation are
computed for each count.
- .sys -
Summary of speaker performance in terms of Percent: Correct,
Substitutions, Deletions, Insertions, Word Errors and Sentence (or
Utterance) errors. The mean, median and standard deviation are
computed for each percentage.
- .lur -
Summary of system performance broken down by corpus and then by speaker.
- .det.ps * -
PostScript version of DET curve produced from the confidence scores.
- .hist.ps * -
PostScript version of histogram from the confidence scores. There are
three traces in each plot:
- histogram of all confidence scores
- histogram of the confidence score for each incorrect word
- histogram of the confidence score for each correct word
- .sbhist.ps * -
PostScript version of Binned Histogram from the confidence scores.
Theoretical and actual error rates are plotted for 10 bins with equal-width
confidence score ranges.
* You will need to install a helper application to view these PostScript
files. Otherwise, you can print them on a PostScript printer.
To view them using Netscape on a Sun Solaris system, define
the Netscape helper application for "application/postscript" to
be, "pageview -right %s", under the "Options/General Preferences/Helpers"
menu.
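The binned-histogram computation described in the key can be sketched as follows. This is an assumed reconstruction for illustration; the actual .sbhist.ps plots are produced by the NIST scoring tools.

```python
def binned_error_rates(words, n_bins=10):
    # words: list of (confidence, is_correct) pairs, confidence in [0, 1].
    # For each of n_bins equal-width confidence ranges, compare the
    # "theoretical" error rate implied by the scores (1 - mean confidence)
    # with the actual error rate observed in that bin.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in words:
        i = min(int(conf * n_bins), n_bins - 1)
        bins[i].append((conf, ok))
    rates = []
    for b in bins:
        if not b:
            rates.append(None)  # empty bin: nothing to plot
            continue
        theoretical = 1.0 - sum(c for c, _ in b) / len(b)
        actual = sum(1 for _, ok in b if not ok) / len(b)
        rates.append((theoretical, actual))
    return rates
```

For a well-calibrated system the two traces coincide; a system that reports confidence 1.0 for every word (as in the deleted ATT plots) collapses into a single uninformative bin.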
MANDARIN CHARACTER ERROR RATE
,------------------------------------------------------------------------------------.
| |
| Executive Scoring Summary by Percentages |
| Hub 5 Eval 1999 CallHome Mandarin Test |
| |
| System | # Snt # Ref | Corr Sub Del Ins Err S.Err | NCE |
|-----------------+---------------+-----------------------------------------+--------|
| bbn2.ctm | 3029 29938 | 45.8 41.9 12.4 2.9 57.1 82.9 | -1.312 |
`------------------------------------------------------------------------------------'
Mandarin System Results
The following table contains links to each scoring report generated by
SCLITE (for character error rate) for this benchmark test. The scoring/man_cer subdirectory contains subdirectories
for each test system which in turn contain the scoring reports for
that system.
Reference Transcripts and Data Files
The reference directory contains the following
reference transcripts and other information necessary to score a benchmark test
submission.
- English Materials
- subdirectory english containing the
original switchboard and callhome transcript files (.txt).
- file hub5e00.english.000405.stm
containing the un-filtered reference STM transcript generated
by executing
make_reference. This script uses the callhome transcript filter
'chfilt.pl' available in the "Tranfilt" package. (See the Scoring Software section below.)
- file en20000405_hub5.glm containing
the global lexical mapping rules used by "csrfilt.sh" to map
variant spellings of words like 'uh-huh' and convert hesitation
words to the '%HESITATION' token.
- Mandarin Materials
- subdirectory mandarin containing the
original callhome transcript files (.txt).
- file hub5m00.mandarin.000405.stm
containing the un-filtered reference STM transcript generated
by executing
make_reference. This script uses the callhome transcript filter
'chfilt.pl' available in the "Tranfilt" package. (See the Scoring Software section below.)
- file ma970904.glm containing
the global lexical mapping rules.
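The effect of the .glm mapping rules described above can be sketched in miniature. The word lists below are invented examples for illustration; the real rules live in the .glm files and are applied by csrfilt.sh.

```python
# Invented example rule sets; the real mappings come from the .glm files.
VARIANT_SPELLINGS = {"UHHUH": "UH-HUH", "MHM": "UH-HUH"}
HESITATION_WORDS = {"UH", "UM", "ER", "AH"}

def apply_glm(words):
    # Normalize variant spellings, then collapse hesitation words
    # into the single '%HESITATION' token, as described above.
    out = []
    for w in words:
        w = VARIANT_SPELLINGS.get(w, w)
        out.append("%HESITATION" if w in HESITATION_WORDS else w)
    return out
```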
Scoring Software
The scripts directory contains the perl script, hubscr04.pl , used by
NIST to score submissions for this benchmark test. It is provided as
a template for how one should use the NIST scoring tools when
attempting to duplicate the scoring methodology used by NIST to
produce the published scores. The script requires two NIST software
packages:
- SCLITE scoring software available via ftp in
compressed UNIX tar form in the file, sclite-1.4.tar.Z
- Transcription Filtering software available via ftp in compressed
UNIX tar form in the file, tranfilt-1.10.tar.Z
Example:
- Word Error Rate:
hubscr04.pl -g en20000405_hub5.glm -v -h hub5 -l english -r hub5e00.english.000405.stm att1.ctm att2.ctm