2002 Rich Transcription Broadcast News and Conversational Telephone Speech

Introduction

This corpus contains the test material used in the 2002 Rich Transcription (RT-02) Evaluation of Broadcast News and Conversational Telephone Speech, administered by the NIST Speech Group in the Spring of 2002. The RT-02 Meeting Recognition Evaluation material is available in a separate distribution.

For complete, up-to-date information, see the RT-02 Evaluation Website.

Instructions

To replicate the RT-02 evaluation tests, consult the RT-02 Evaluation Plan, which contains the rules and conditions for implementing them. Self-scoring instructions are given in the Self-Scoring section below.

Evaluation Tasks

The RT-02 Evaluation supported two main evaluation tasks:

  1. Speech-To-Text (STT) Tasks -- included three processing speeds (1x real time, 10x real time, and unlimited time) for both the Broadcast News (BN) and Conversational Telephone Speech (CTS) domains.

  2. Metadata Extraction (MDE) Task -- consisted of a speaker diarization task for the BN and CTS domains.

    The speaker diarization task as defined for the RT-02 Evaluation was significantly different from the diarization tasks evaluated in RT-03 and RT-04. The original RT-02 speaker diarization reference data has been updated to reflect current practices so that comparisons can be made with the more recent evaluations. Note that the speaker diarization task in the CTS domain is no longer supported because it was deemed degenerate; consequently, only the BN speaker diarization reference is included in this distribution. For additional information on the speaker diarization task as defined in recent evaluations, refer to the RT-04 Fall Evaluation Plan, available at the RT-04 Fall Evaluation Website.

Data Description

This distribution of the RT-02 Evaluation Data contains only Broadcast News and Conversational Telephone Speech data. Meeting data used in the RT-02 Evaluation is not included in this distribution and is packaged in a separate distribution. All recordings are in English.

Broadcast News (BN) Data

The BN data is composed of six excerpts, each approximately 10 minutes long, taken from six different broadcasts. The broadcasts were selected from sources collected December 14-19, 1998. The evaluation excerpts were transcribed to the nearest story boundary. The table below lists the audio files making up the BN portion of this distribution.

Broadcast News Audio
bn02en_1.sph
bn02en_2.sph
bn02en_3.sph
bn02en_4.sph
bn02en_5.sph
bn02en_6.sph

For the RT-02 STT BN evaluation task, no manual (human-annotated) segmentations were provided. Sites were required either to generate their own segmentations automatically or to use the automatically generated segmentations provided in the PEM files located in the 'indices' directory. The provided segmentations were generated with CMUseg Version 0.5, the automatic segmentation and classification utility that Carnegie Mellon University graciously supplied to the DARPA community for use as a common acoustic segmentation utility. Participants were not required to use this segmentation or the CMUseg utility; sites were free to use any segmentation scheme of their choice. The segmentations were provided for the convenience of sites that did not have access to segmentation algorithms. Note: the PEM files were not intended for use in the Metadata Extraction (MDE) task, since that task itself required systems to extend their speaker segmentation algorithms.

The audio files of the complete broadcasts are located in the 'audio' directory. Each waveform is a SPHERE-headered, single-channel, 16-bit PCM file. Although each complete broadcast is provided and may be used for within-broadcast adaptation, only the excerpts listed in the system input UEM files located in the 'indices' directory are included in the evaluation.
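The SPHERE header is plain ASCII text at the start of each file, so its fields (channel count, bytes per sample, sample coding, and so on) can be inspected before any samples are read. The following is a minimal Python sketch, assuming the common NIST SPHERE layout: a `NIST_1A` line, a line giving the header length in bytes, then one `name -type value` field per line up to `end_head`. The function names are illustrative and not part of any NIST tool.

```python
import io
from typing import BinaryIO, Dict, Tuple, Union

FieldValue = Union[int, float, str]


def parse_sphere_header(stream: BinaryIO) -> Tuple[int, Dict[str, FieldValue]]:
    """Read a SPHERE header, leaving `stream` positioned at the first sample byte.

    Assumed layout: "NIST_1A", header size in bytes, "name -type value" lines, "end_head".
    Returns (header_size_in_bytes, fields).
    """
    magic = stream.readline()
    if magic.rstrip(b"\r\n") != b"NIST_1A":
        raise ValueError("not a SPHERE file (missing NIST_1A line)")
    size_line = stream.readline()
    header_size = int(size_line.strip())
    # The header size counts the two lines already consumed.
    body = stream.read(header_size - len(magic) - len(size_line))
    fields: Dict[str, FieldValue] = {}
    for raw in body.decode("ascii", errors="replace").splitlines():
        line = raw.strip()
        if line == "end_head":
            break
        if not line or line.startswith(";"):
            continue  # skip blank lines and comments
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue  # ignore malformed lines rather than failing
        name, type_tag, value = parts
        if type_tag == "-i":
            fields[name] = int(value)
        elif type_tag == "-r":
            fields[name] = float(value)
        elif type_tag.startswith("-s"):
            # "-sN" declares a string of N characters.
            length = type_tag[2:]
            fields[name] = value[: int(length)] if length.isdigit() else value
        else:
            fields[name] = value
    return header_size, fields


def parse_sphere_bytes(data: bytes) -> Tuple[int, Dict[str, FieldValue]]:
    """Convenience wrapper for a header already held in memory."""
    return parse_sphere_header(io.BytesIO(data))
```

With a real file, open it in binary mode, e.g. `with open("audio/bn02en_1.sph", "rb") as f: size, fields = parse_sphere_header(f)`; afterwards `f` is positioned at the first sample.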

Conversational Telephone Speech (CTS) Data

The CTS data is composed of 60 excerpts, each approximately 5 minutes long, taken from 60 different conversations: 20 from Switchboard I, 20 from Switchboard Phase II, and 20 from Switchboard Cellular Phase II. The evaluation excerpts were transcribed to the nearest turn. The table below lists the audio files making up the CTS portion of this distribution, grouped by source.

Conversational Telephone Speech Audio
Switchboard I    Switchboard Phase II    Switchboard Cellular Phase II
sw4386.sph       sw_30016.sph            sw_45063.sph
sw4394.sph       sw_30223.sph            sw_45147.sph
sw4398.sph       sw_30352.sph            sw_45187.sph
sw4409.sph       sw_30410.sph            sw_45255.sph
sw4477.sph       sw_30751.sph            sw_45284.sph
sw4490.sph       sw_30801.sph            sw_45458.sph
sw4528.sph       sw_30849.sph            sw_45501.sph
sw4535.sph       sw_30861.sph            sw_45734.sph
sw4536.sph       sw_30969.sph            sw_45939.sph
sw4578.sph       sw_30986.sph            sw_46098.sph
sw4627.sph       sw_31032.sph            sw_46312.sph
sw4634.sph       sw_31131.sph            sw_46387.sph
sw4639.sph       sw_31195.sph            sw_46516.sph
sw4653.sph       sw_31388.sph            sw_46667.sph
sw4705.sph       sw_31483.sph            sw_46715.sph
sw4730.sph       sw_31493.sph            sw_47205.sph
sw4755.sph       sw_31572.sph            sw_47435.sph
sw4806.sph       sw_31585.sph            sw_47566.sph
sw4851.sph       sw_32163.sph            sw_47610.sph
sw4866.sph       sw_39999.sph            sw_47620.sph
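In the CTS file list, each source corpus uses a distinct file-name pattern: four digits after `sw` for Switchboard I, and five digits after `sw_` beginning with 3 (Switchboard Phase II) or 4 (Switchboard Cellular Phase II). The small Python helper below recovers the source from a file name. It is a heuristic inferred only from this file list, not an official naming rule, and may not hold for other corpora.

```python
import re

# Heuristic patterns inferred from the RT-02 CTS file list (not an official convention).
_SOURCE_PATTERNS = (
    (re.compile(r"sw\d{4}\.sph"), "Switchboard I"),
    (re.compile(r"sw_3\d{4}\.sph"), "Switchboard Phase II"),
    (re.compile(r"sw_4\d{4}\.sph"), "Switchboard Cellular Phase II"),
)


def cts_source(filename: str) -> str:
    """Return the source corpus of an RT-02 CTS audio file name."""
    for pattern, source in _SOURCE_PATTERNS:
        if pattern.fullmatch(filename):
            return source
    raise ValueError(f"unrecognized CTS file name: {filename!r}")
```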

For the RT-02 STT CTS task, systems were given manually annotated segment times to decode. These segments are listed in the PEM files in the 'indices' directory.

Unlike the BN audio files, which contain the full broadcasts, the CTS audio files contain only the evaluation excerpts. Each audio excerpt is a SPHERE-headered, two-channel, interleaved, 8-bit mu-law file. Echo cancellation was performed on the CTS data using the echo cancellation software developed by Mississippi State University.
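Because the CTS files are interleaved, the sample bytes that follow the SPHERE header alternate between the two channels, one byte per sample. The Python sketch below splits them and expands each byte with the standard G.711 mu-law decoding rule. It assumes the samples are stored uncompressed; if the header's `sample_coding` field indicates shorten compression, decompress the file first (for example with LDC's `sph2pipe` utility). Which conversation side each channel carries is not specified here.

```python
from typing import List


def ulaw_to_linear(code: int) -> int:
    """Decode one G.711 mu-law byte to a signed 16-bit linear sample."""
    code = ~code & 0xFF  # mu-law bytes are stored bit-inverted
    sign = code & 0x80
    exponent = (code >> 4) & 0x07
    mantissa = code & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84  # 0x84 is the mu-law bias
    return -magnitude if sign else magnitude


def deinterleave_ulaw(data: bytes, channels: int = 2) -> List[List[int]]:
    """Split interleaved mu-law bytes (A, B, A, B, ...) into per-channel linear samples.

    Assumes `data` holds uncompressed sample bytes only (SPHERE header already skipped).
    """
    if len(data) % channels:
        raise ValueError("data length is not a multiple of the channel count")
    decoded = [ulaw_to_linear(b) for b in data]
    return [decoded[c::channels] for c in range(channels)]
```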

Self-Scoring

To self-score your system's STT output, you will need to download three software packages from NIST (the Speech Recognition Scoring Toolkit, the Transcription Filtering Package, and the HUB-Score Script) by following the steps below:

  1. Go to http://www.nist.gov/speech/tools
  2. Download and install the Speech Recognition Scoring Toolkit
  3. Download and install the Transcription Filtering Package
  4. Download the HUB-Score Script
  5. Run the hubscr script with the appropriate arguments, e.g.,
    hubscr05.pl -g trans_rules/en20020429.glm -h hub4 -l english -r ref_filename.stm <hyp_filename1.ctm> <hyp_filename2.ctm> ...
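When several systems or runs are scored, it can help to assemble the hubscr command programmatically. This Python sketch builds an argument list of the form given in step 5, passing the GLM file shipped with this distribution (trans_rules/en20020429.glm) to `-g`. The function names and defaults are illustrative; check the flags against the usage message of your hubscr version. The `hub` parameter defaults to `hub4` as in the example; consult hubscr's documentation for the value appropriate to CTS.

```python
import shlex
import subprocess
from typing import List, Sequence


def hubscr_command(ref_stm: str, hyp_ctms: Sequence[str],
                   glm: str = "trans_rules/en20020429.glm",
                   hub: str = "hub4", language: str = "english",
                   script: str = "hubscr05.pl") -> List[str]:
    """Return the hubscr argument list for scoring CTM hypotheses against an STM reference.

    Flags mirror the example command; verify them against your hubscr version.
    """
    if not hyp_ctms:
        raise ValueError("at least one hypothesis CTM file is required")
    return [script, "-g", glm, "-h", hub, "-l", language, "-r", ref_stm, *hyp_ctms]


def run_hubscr(cmd: List[str]) -> None:
    """Echo and run the command; raises CalledProcessError on a non-zero exit."""
    print(shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

For example, `run_hubscr(hubscr_command("reference/concatenated/expt_02_stt1x_eval02_eng_bnews_spch_expt_1.stm", ["my_system.ctm"]))`, where `my_system.ctm` is a placeholder for your own system output.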

To self-score your system MDE speaker diarization output, you will need to download the MDE scoring script by following the steps given below:

  1. Go to http://www.nist.gov/speech/tests/rt/rt2004/fall
  2. Download the MDE scoring script
  3. Run the md-eval script with the appropriate arguments, e.g.,
    md-eval-v16.pl -1 -c 0.25 -u metadata_scoring_uem_filename.scr.uem -r ref_filename.rttm -s hyp_filename.rttm
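The md-eval invocation in step 3 can also be assembled programmatically, for example when scoring several diarization runs. In this Python sketch, the `-c` value is the no-score collar in seconds (0.25 in the example), and the `-1` flag is passed through unchanged. The function name is illustrative, and the options should be checked against the usage text of the md-eval version you download.

```python
from typing import List


def md_eval_command(ref_rttm: str, hyp_rttm: str, scoring_uem: str,
                    collar: float = 0.25,
                    script: str = "md-eval-v16.pl") -> List[str]:
    """Return the md-eval argument list for scoring speaker diarization output.

    Flags mirror the example command; verify them against your md-eval version.
    """
    if collar < 0:
        raise ValueError("collar must be non-negative")
    return [script, "-1", "-c", f"{collar:g}", "-u", scoring_uem,
            "-r", ref_rttm, "-s", hyp_rttm]
```

The resulting list can be passed directly to `subprocess.run(cmd, check=True)`; with this distribution, the reference and scoring UEM would be the `expt_02_spkr_eval02_eng_bnews_spch_expt_1` files in 'reference/concatenated'.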

Experiment ID (EXP-ID)

Concatenated reference, system input UEM, Metadata scoring UEM, and PEM files are named using an Experiment ID (EXP-ID) code, where

EXP-ID = <SITE_ID>_<YEAR>_<TASK>_<DATA>_<LANG>_<TYPE>_<COND>_<SYS_ID>_<RUN_ID>

SITE_ID   ID of the participating site (reference data use 'expt')
YEAR      02
TASK      stt1x | stt10x | sttul | spkr
DATA      eval02
LANG      eng
TYPE      bnews | cts
COND      spch | ref
SYS_ID    ID of the system used (reference data use 'expt')
RUN_ID    1..n (reference data use '1')
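Because the EXP-ID fields are separated by underscores, a file name can be split back into its components and checked against the allowed values listed above. The Python sketch below assumes that SITE_ID and SYS_ID themselves contain no underscores (true for the 'expt' reference files, but an assumption for site submissions). Everything after the first '.' is treated as the extension, so names ending in '.PemMatched.stm' or '.scr.uem' are handled.

```python
import os
from typing import Dict

FIELDS = ("site_id", "year", "task", "data", "lang", "type", "cond", "sys_id", "run_id")

# Allowed values for the fixed fields, as listed in the EXP-ID definition.
ALLOWED = {
    "year": {"02"},
    "task": {"stt1x", "stt10x", "sttul", "spkr"},
    "data": {"eval02"},
    "lang": {"eng"},
    "type": {"bnews", "cts"},
    "cond": {"spch", "ref"},
}


def parse_exp_id(filename: str) -> Dict[str, str]:
    """Split an RT-02 file name into its EXP-ID fields.

    Assumes SITE_ID and SYS_ID contain no underscores.
    """
    stem = os.path.basename(filename).split(".", 1)[0]  # drop ".stm", ".scr.uem", ...
    parts = stem.split("_")
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}: {stem!r}")
    exp = dict(zip(FIELDS, parts))
    for key, allowed in ALLOWED.items():
        if exp[key] not in allowed:
            raise ValueError(f"invalid {key}: {exp[key]!r}")
    if not exp["run_id"].isdigit() or int(exp["run_id"]) < 1:
        raise ValueError(f"RUN_ID must be 1..n, got {exp['run_id']!r}")
    return exp
```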

Reference Transcripts

The reference transcripts are located in the 'reference' directory. The official format for STT reference data is STM (files with the extension 'stm'), while the official format for MDE reference data is RTTM (files with the extension 'rttm'). Files with the extension 'txt' or 'utf' are the original reference transcripts, before any format conversion or addition of annotations, and are included for completeness.
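Both reference formats are whitespace-separated text with one record per line. The Python sketch below reads the timing and speaker fields, assuming the usual NIST layouts: an STM line is `file channel speaker begin end [<label>] transcript`, and an RTTM SPEAKER line is `SPEAKER file channel begin duration <NA> <NA> speaker <NA>`, with `;;` introducing comments. These layouts are stated from general knowledge of the formats rather than from this distribution, so verify them against the files in 'reference'.

```python
from typing import Dict, Optional, Union

Record = Dict[str, Union[str, float, None]]


def parse_stm_line(line: str) -> Optional[Record]:
    """Parse one STM line; returns None for blank and ';;' comment lines.

    Assumed layout: file channel speaker begin end [<label>] transcript
    """
    line = line.strip()
    if not line or line.startswith(";;"):
        return None
    parts = line.split(None, 5)
    if len(parts) < 5:
        raise ValueError(f"too few STM fields: {line!r}")
    file_id, channel, speaker, begin, end = parts[:5]
    rest = parts[5] if len(parts) == 6 else ""
    label = None
    if rest.startswith("<") and ">" in rest:
        close = rest.index(">")
        label, rest = rest[1:close], rest[close + 1:].strip()
    return {"file": file_id, "channel": channel, "speaker": speaker,
            "begin": float(begin), "end": float(end), "label": label, "text": rest}


def parse_rttm_speaker_line(line: str) -> Optional[Record]:
    """Parse one RTTM SPEAKER record; returns None for any other line type.

    Assumed layout: SPEAKER file channel begin duration ortho stype name conf
    """
    parts = line.split()
    if not parts or parts[0] != "SPEAKER":
        return None
    if len(parts) < 9:
        raise ValueError(f"too few RTTM fields: {line!r}")
    begin, duration = float(parts[3]), float(parts[4])
    return {"file": parts[1], "channel": parts[2], "speaker": parts[7],
            "begin": begin, "end": begin + duration}
```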

A concatenated version of the reference files has been created for every experiment supported in this evaluation. The concatenated files are in the 'reference/concatenated' directory and are also listed below. Note that two versions of the CTS STM reference are provided: PemMatched and AllSegments. NIST used the PemMatched files to perform the official scoring of the RT-02 Evaluation. The PemMatched files contain the subset of STM segments in the AllSegments files that match the PEM file used for system input. The AllSegments files, which contain all segments in the evaluation excerpts, are included for completeness and should not be used to replicate the RT-02 STT scoring procedures.

Task     Broadcast News                                    Conversational Telephone Speech
stt1x    expt_02_stt1x_eval02_eng_bnews_spch_expt_1.stm    expt_02_stt1x_eval02_eng_cts_spch_expt_1.PemMatched.stm
                                                           expt_02_stt1x_eval02_eng_cts_spch_expt_1.AllSegments.stm
stt10x   expt_02_stt10x_eval02_eng_bnews_spch_expt_1.stm   expt_02_stt10x_eval02_eng_cts_spch_expt_1.PemMatched.stm
                                                           expt_02_stt10x_eval02_eng_cts_spch_expt_1.AllSegments.stm
sttul    expt_02_sttul_eval02_eng_bnews_spch_expt_1.stm    expt_02_sttul_eval02_eng_cts_spch_expt_1.PemMatched.stm
                                                           expt_02_sttul_eval02_eng_cts_spch_expt_1.AllSegments.stm
spkr     expt_02_spkr_eval02_eng_bnews_spch_expt_1.rttm    N/A

Please note that for this test set, the STM files are identical across the STT tasks (stt1x, stt10x, and sttul). They are replicated as three separate files so that future test sets can use a different file inventory for each task.

Please also note that although the speaker diarization task in recent evaluations includes a speech+reference condition (in addition to the speech-only condition), the RT-02 reference data does not contain the word timing information needed to generate input data for the speech+reference condition.

System Input Unpartitioned Evaluation Map (UEM) Files

The system input UEM files define regions of the audio that the system must process. They are located in the 'indices' directory.

Task     Broadcast News                                    Conversational Telephone Speech
stt1x    expt_02_stt1x_eval02_eng_bnews_spch_expt_1.uem    expt_02_stt1x_eval02_eng_cts_spch_expt_1.uem
stt10x   expt_02_stt10x_eval02_eng_bnews_spch_expt_1.uem   expt_02_stt10x_eval02_eng_cts_spch_expt_1.uem
sttul    expt_02_sttul_eval02_eng_bnews_spch_expt_1.uem    expt_02_sttul_eval02_eng_cts_spch_expt_1.uem
spkr     expt_02_spkr_eval02_eng_bnews_spch_expt_1.uem     N/A
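A system input UEM file can be read into a per-file list of regions to process. The Python sketch below assumes a four-column layout of file, channel, begin time and end time in seconds, with `;;` comments (check this against the files in 'indices'). It also tests whether a given time span lies inside one of the regions. Both function names are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Regions = Dict[Tuple[str, str], List[Tuple[float, float]]]


def parse_uem(lines: Iterable[str]) -> Regions:
    """Map (file, channel) to a sorted list of (begin, end) regions in seconds.

    Assumed UEM layout: file channel begin end
    """
    regions: Dict[Tuple[str, str], List[Tuple[float, float]]] = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";;"):
            continue
        file_id, channel, begin, end = line.split()[:4]
        regions[(file_id, channel)].append((float(begin), float(end)))
    for spans in regions.values():
        spans.sort()
    return dict(regions)


def is_covered(regions: Regions, file_id: str, channel: str,
               start: float, stop: float) -> bool:
    """True if [start, stop] lies entirely within a single UEM region."""
    return any(b <= start and stop <= e
               for b, e in regions.get((file_id, channel), []))
```

Usage: `with open("indices/expt_02_stt1x_eval02_eng_bnews_spch_expt_1.uem") as f: regions = parse_uem(f)`.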

Partitioned Evaluation Map (PEM) Files

The PEM files contain segmentation information for the evaluation excerpts. BN segments were created using the automatic segmentation utility CMUseg Version 0.5, whereas CTS segments were created by human annotators. The PEM files are located in the 'indices' directory.

Task     Broadcast News                                    Conversational Telephone Speech
stt1x    expt_02_stt1x_eval02_eng_bnews_spch_expt_1.pem    expt_02_stt1x_eval02_eng_cts_spch_expt_1.pem
stt10x   expt_02_stt10x_eval02_eng_bnews_spch_expt_1.pem   expt_02_stt10x_eval02_eng_cts_spch_expt_1.pem
sttul    expt_02_sttul_eval02_eng_bnews_spch_expt_1.pem    expt_02_sttul_eval02_eng_cts_spch_expt_1.pem
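Each PEM segment tells the recognizer which stretch of audio to decode. The Python sketch below converts PEM segments into sample-index ranges, assuming a five-column layout of file, channel, speaker, begin time and end time in seconds (as in STM, but without a transcript; verify against the files in 'indices'). The sample rate is assumed to be the same for every file listed and should be taken from the audio files' SPHERE headers. The function name is illustrative.

```python
from typing import Dict, Iterable, List, Union

Job = Dict[str, Union[str, int]]


def pem_to_sample_ranges(lines: Iterable[str], sample_rate: int) -> List[Job]:
    """Convert PEM segments (times in seconds) to [start_sample, stop_sample) ranges.

    Assumed PEM layout: file channel speaker begin end
    Assumes one sample rate applies to every file in the PEM.
    """
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    jobs: List[Job] = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";;"):
            continue
        file_id, channel, speaker, begin, end = line.split()[:5]
        start = round(float(begin) * sample_rate)
        stop = round(float(end) * sample_rate)
        if stop <= start:
            continue  # skip empty or reversed segments
        jobs.append({"file": file_id, "channel": channel, "speaker": speaker,
                     "start_sample": start, "stop_sample": stop})
    return jobs
```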

Metadata Scoring Unpartitioned Evaluation Map (UEM) Files

The Metadata scoring UEM files (files with extension 'scr.uem') define scorable regions of the audio file. They are located in the 'reference/concatenated' directory. Note that there is a functional difference between a system input UEM file and a Metadata scoring UEM file. A system input UEM file is used to indicate the regions of the audio that the system must process, whereas the Metadata scoring UEM file is used to indicate the regions that are scorable. Note also that the scoring UEM is used to score the MDE task only and that there is no scoring UEM for the STT tasks.
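Conceptually, restricting scoring to the regions of a Metadata scoring UEM file means intersecting each time segment with those regions and discarding whatever falls outside them. The Python sketch below illustrates only that intersection step on (begin, end) pairs in seconds. It is not md-eval's implementation, which also applies the scoring collar and computes the error rates, and it assumes the scorable regions do not overlap one another.

```python
from typing import List, Sequence, Tuple

Span = Tuple[float, float]


def clip_to_regions(segments: Sequence[Span], regions: Sequence[Span]) -> List[Span]:
    """Return the parts of `segments` that fall inside the scorable `regions`.

    Assumes the regions do not overlap one another.
    """
    clipped: List[Span] = []
    for seg_begin, seg_end in segments:
        for reg_begin, reg_end in regions:
            begin, end = max(seg_begin, reg_begin), min(seg_end, reg_end)
            if begin < end:  # keep only non-empty overlaps
                clipped.append((begin, end))
    return sorted(clipped)


def scorable_duration(segments: Sequence[Span], regions: Sequence[Span]) -> float:
    """Total segment time, in seconds, that lies inside the scorable regions."""
    return sum(end - begin for begin, end in clip_to_regions(segments, regions))
```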

Task     Broadcast News
spkr     expt_02_spkr_eval02_eng_bnews_spch_expt_1.scr.uem

Global Mapping (GLM)

The Global Mapping (GLM) file used by NIST in RT-02 to normalize the reference and system output transcripts prior to scoring is located in the 'trans_rules' directory as trans_rules/en20020429.glm.

Directory Structure

audio/         contains the audio data
indices/       contains the PEM and UEM files
reference/     contains the reference transcripts
trans_rules/   contains the GLM used in the evaluation