NIST March 2000 Hub-5 Benchmark Test Results
for Recognition of Conversational Speech over the Telephone, in English and Mandarin.

Release 1.4 May 11 2000

Release history:



Introduction

This document provides a brief overview of the March 2000 Hub-5 benchmark test results and a map of the contents of this release.

This year's test participants included five research sites: ATT, BBN, CU-HTK, Mississippi State University (MSSTATE), and SRI. As part of the submission instructions, participants differentiated their submissions via directory names. NIST has relabeled those names according to the following table.

Report IDs -> Submission Names
att1-bug -> primary
bbn1-bug -> byb_eng
bbn2 -> byb_man
cu-htk1 -> cu-htk1
cu-htk2 -> cu-htk2
cu-htk3 -> cu-htk3
cu-htk4 -> cu-htk4
msstate1 -> isip_asr
sri1 -> sri1
sri2 -> sri2

This document has separate sections for each language.

  • English Word Error
  • Mandarin Character Error

    ENGLISH WORD ERROR RATE

    ,--------------------------------------------------------------------------------------.
    |                                                                                      |
    |                      Executive Scoring Summary by Percentages                        |
    |                Hub 5 Eval 1999 CallHome + Swb : Primary Systems Test                 |
    |                                                                                      |
    |      System       | # Snt   # Ref | Corr   Sub    Del    Ins   Err    S.Err |  NCE   |
    |-------------------+---------------+-----------------------------------------+--------|
    | att1-debugged.ctm | 4459    42993 | 73.4   18.3   8.2    2.5   29.0   66.9  | -5.741 |
    | bbn1-debugged.ctm | 4459    42993 | 74.3   17.8   8.0    3.4   29.1   69.1  | 0.094  |
    |    cu-htk1.ctm    | 4459    42993 | 77.1   15.2   7.7    2.5   25.4   64.9  | 0.271  |
    |   dragon98.ctm    | 4459    42993 | 73.5   19.2   7.3    3.6   30.0   67.5  |  ---   |
    |   msstate1.ctm    | 4459    42993 | 54.9   32.7   12.5   4.0   49.1   76.3  | -0.815 |
    |  nist-rover1.ctm  | 4459    42993 | 76.6   14.7   8.7    1.7   25.1   64.1  |  ---   |
    |     sri1.ctm      | 4459    42993 | 73.5   19.5   6.9    3.7   30.2   67.7  | 0.233  |
    `--------------------------------------------------------------------------------------'
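In the summary above, Corr, Sub, Del, and Ins come from a minimum-edit-distance alignment of each system hypothesis against the reference transcript, and Err = Sub + Del + Ins, all expressed as percentages of the # Ref word count. A minimal sketch of that decomposition (an illustration only, not the actual sclite implementation):

```python
def score(ref, hyp):
    """Return (sub, del, ins) counts from a minimum-edit alignment
    of a hypothesis word list against a reference word list."""
    R, H = len(ref), len(hyp)
    # dp[i][j] = (total_errors, subs, dels, inss) for ref[:i] vs hyp[:j]
    dp = [[(0, 0, 0, 0)] * (H + 1) for _ in range(R + 1)]
    for i in range(1, R + 1):
        dp[i][0] = (i, 0, i, 0)          # only deletions possible
    for j in range(1, H + 1):
        dp[0][j] = (j, 0, 0, j)          # only insertions possible
    for i in range(1, R + 1):
        for j in range(1, H + 1):
            hit = 0 if ref[i - 1] == hyp[j - 1] else 1
            diag = dp[i - 1][j - 1]
            dp[i][j] = min(
                (diag[0] + hit, diag[1] + hit, diag[2], diag[3]),          # correct/sub
                (dp[i - 1][j][0] + 1, dp[i - 1][j][1],
                 dp[i - 1][j][2] + 1, dp[i - 1][j][3]),                    # deletion
                (dp[i][j - 1][0] + 1, dp[i][j - 1][1],
                 dp[i][j - 1][2], dp[i][j - 1][3] + 1),                    # insertion
            )
    _, s, d, ins = dp[R][H]
    return s, d, ins

ref = "the cat sat on the mat".split()
hyp = "the cat sit on mat".split()
s, d, i = score(ref, hyp)              # one substitution, one deletion
wer = 100.0 * (s + d + i) / len(ref)   # word error rate as a percentage
```

Note that, as in the table, the error rate can exceed 100% when a system inserts many words, and Corr + Sub + Del sums to 100% of the reference words.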
    

    Comparative Summaries

    1. Word Error Rate Summary: eng_wer/Primary.summary, (as PS)
    2. Primary System Statistical Comparisons: scoring/eng_wer/Primary.stats, (as PS)

    3. Contrastive System Statistical Comparisons:
      ATT Statistical Comparisons (as PS), WER Summary (as PS)
      CU-HTK Statistical Comparisons (as PS), WER Summary (as PS)
      BBN Statistical Comparisons (as PS), WER Summary (as PS)
      NIST-ROVER Statistical Comparisons (as PS), WER Summary (as PS)
      SRI Statistical Comparisons (as PS), WER Summary (as PS)
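The nist-rover entries above are system combinations produced with NIST's ROVER (Recognizer Output Voting Error Reduction), which aligns the word outputs of several recognizers into a single word transition network and picks each output word by voting. A toy illustration of the voting step only, assuming the hypotheses are already aligned with "@" marking null arcs (real ROVER computes the alignment itself by dynamic programming and can also weight votes by confidence):

```python
from collections import Counter

def rover_vote(aligned_hyps):
    """Majority vote over pre-aligned hypotheses.

    aligned_hyps: list of equal-length word lists, one per system,
    with "@" standing in for a null (no-word) arc.
    """
    output = []
    for slot in zip(*aligned_hyps):
        word, _ = Counter(slot).most_common(1)[0]  # most frequent word wins
        if word != "@":                            # drop winning null arcs
            output.append(word)
    return output

hyps = [["the", "cat", "@"],
        ["the", "bat", "sat"],
        ["the", "cat", "sat"]]
combined = rover_vote(hyps)   # each slot decided by majority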

    English System Results

    The following table contains links to each scoring report generated by SCLITE (for word error rate) for this benchmark test. The scoring/eng_wer subdirectory contains subdirectories for each test system, which in turn contain the scoring reports for that system.

    WORD ERROR RATE
    Columns: System ID | System Description | By Speaker (#) Summary | By Speaker (%) Summary | By Corpus / By Gender | Confidence DET Curve | Confidence Histogram | Confidence Binned Histogram
    ATT1-BUG att1-bug.desc att1-bug.raw att1-bug.sys att1-bug.lur , PS No conf. scores No conf. scores No conf. scores
    ATT1-DEBUGGED att1-debugged.desc att1-debugged.raw att1-debugged.sys att1-debugged.lur , PS No conf. scores No conf. scores No conf. scores
    ATT1-LATE att1-late.desc att1-late.raw att1-late.sys att1-late.lur , PS No conf. scores No conf. scores No conf. scores
    BBN1-BUG bbn1-bug.desc bbn1-bug.raw bbn1-bug.sys bbn1-bug.lur , PS bbn1-bug.det.ps bbn1-bug.hist.ps bbn1-bug.sbhist.ps
    BBN1-PASS1 bbn1-pass1.desc bbn1-pass1.raw bbn1-pass1.sys bbn1-pass1.lur , PS No conf. scores No conf. scores No conf. scores
    BBN1-DEBUGGED bbn1-debugged.desc bbn1-debugged.raw bbn1-debugged.sys bbn1-debugged.lur , PS bbn1-debugged.det.ps bbn1-debugged.hist.ps bbn1-debugged.sbhist.ps
    CU-HTK1 cu-htk1.desc cu-htk1.raw cu-htk1.sys cu-htk1.lur , PS cu-htk1.det.ps cu-htk1.hist.ps cu-htk1.sbhist.ps
    CU-HTK2 cu-htk2.desc cu-htk2.raw cu-htk2.sys cu-htk2.lur , PS cu-htk2.det.ps cu-htk2.hist.ps cu-htk2.sbhist.ps
    CU-HTK3 cu-htk3.desc cu-htk3.raw cu-htk3.sys cu-htk3.lur , PS cu-htk3.det.ps cu-htk3.hist.ps cu-htk3.sbhist.ps
    CU-HTK4 cu-htk4.desc cu-htk4.raw cu-htk4.sys cu-htk4.lur , PS cu-htk4.det.ps cu-htk4.hist.ps cu-htk4.sbhist.ps
    DRAGON98 dragon98.desc dragon98.raw dragon98.sys dragon98.lur , PS No conf. scores No conf. scores No conf. scores
    MSSTATE1 msstate1.desc msstate1.raw msstate1.sys msstate1.lur , PS msstate1.det.ps msstate1.hist.ps msstate1.sbhist.ps
    NIST-ROVER1 nist-rover1.desc nist-rover1.raw nist-rover1.sys nist-rover1.lur , PS No conf. scores No conf. scores No conf. scores
    NIST-ROVER2 nist-rover2.desc nist-rover2.raw nist-rover2.sys nist-rover2.lur , PS No conf. scores No conf. scores No conf. scores
    SRI1 sri1.desc sri1.raw sri1.sys sri1.lur , PS sri1.det.ps sri1.hist.ps sri1.sbhist.ps
    SRI2 sri2.desc sri2.raw sri2.sys sri2.lur , PS sri2.det.ps sri2.hist.ps sri2.sbhist.ps

    Table Key

    * You will need to install a helper application to view these PostScript files. Otherwise, you can print them on a PostScript printer. To view them using Netscape on a Sun Solaris system, define the Netscape helper application for "application/postscript" to be, "pageview -right %s", under the "Options/General Preferences/Helpers" menu.
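The NCE column in the executive summaries above measures the quality of a system's word confidence scores as normalized cross entropy: 0 means the confidences are no more informative than always predicting the system's overall fraction of correct words, higher is better, and negative values (as for att1 and msstate1) mean the confidences are actively misleading. A sketch, assuming the standard NIST definition:

```python
import math

def nce(correct_confs, incorrect_confs):
    """Normalized cross entropy of word confidence scores.

    correct_confs:   confidences assigned to correctly recognized words
    incorrect_confs: confidences assigned to erroneous words
    Confidences must lie strictly between 0 and 1.
    """
    n_c, n_i = len(correct_confs), len(incorrect_confs)
    p = n_c / (n_c + n_i)  # baseline: overall probability a word is correct
    # Entropy of the constant baseline predictor
    h_max = -(n_c * math.log2(p) + n_i * math.log2(1 - p))
    # Cross entropy of the system's actual confidences
    h = -(sum(math.log2(c) for c in correct_confs)
          + sum(math.log2(1 - c) for c in incorrect_confs))
    return (h_max - h) / h_max
```

For example, a system whose every confidence equals the baseline rate scores exactly 0, while confidences that are high on correct words and low on errors score above 0.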


    MANDARIN CHARACTER ERROR RATE

    ,------------------------------------------------------------------------------------.
    |                                                                                    |
    |                     Executive Scoring Summary by Percentages                       |
    |                      Hub 5 Eval 1999 CallHome Mandarin Test                        |
    |                                                                                    |
    |     System      | # Snt   # Ref | Corr   Sub    Del    Ins   Err    S.Err |  NCE   |
    |-----------------+---------------+-----------------------------------------+--------|
    |    bbn2.ctm     | 3029    29938 | 45.8   41.9   12.4   2.9   57.1   82.9  | -1.312 |
    `------------------------------------------------------------------------------------'
    

    Mandarin System Results

    The following table contains links to each scoring report generated by SCLITE (for character error rate) for this benchmark test. The scoring/man_cer subdirectory contains subdirectories for each test system, which in turn contain the scoring reports for that system.

    CHARACTER ERROR RATE
    Columns: System ID | System Description | By Speaker (#) Summary | By Speaker (%) Summary | By Corpus / By Gender | Confidence DET Curve | Confidence Histogram | Confidence Binned Histogram
    BBN2 bbn2.desc bbn2.raw bbn2.sys bbn2.lur , PS bbn2.det.ps bbn2.hist.ps bbn2.sbhist.ps
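Character error rate is computed the same way as word error rate, but the alignment units are individual Chinese characters rather than words, which sidesteps the ambiguity of Mandarin word segmentation. A minimal sketch using a plain Levenshtein distance (an illustration only, not the actual sclite implementation):

```python
def cer(ref, hyp):
    """Character error rate: minimum edit distance over characters,
    divided by the number of reference characters."""
    r, h = list(ref), list(hyp)
    prev = list(range(len(h) + 1))       # DP row for the empty reference prefix
    for i, rc in enumerate(r, 1):
        cur = [i]
        for j, hc in enumerate(h, 1):
            cur.append(min(prev[j - 1] + (rc != hc),  # substitution / match
                           prev[j] + 1,               # deletion
                           cur[j - 1] + 1))           # insertion
        prev = cur
    return prev[-1] / len(r)

rate = cer("你好吗", "你好")   # one deleted character out of three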

    Reference Transcripts and Data Files