2002 Rich Transcription Broadcast News and Conversational Telephone Speech

Introduction

This corpus contains the test material used in the 2002 Rich Transcription (RT-02) Evaluation of Broadcast News and Conversational Telephone Speech, administered by the NIST Speech Group in the Spring of 2002. The RT-02 Meeting Recognition Evaluation material is available in a separate distribution.

For complete up-to-date information, see the RT-02 Evaluation Website.

Instructions

To replicate the RT-02 evaluation tests, see the RT-02 Evaluation Plan, which contains the rules and conditions for implementing them. Self-scoring instructions are given in the Self-Scoring section below.

Evaluation Tasks

The RT-02 Evaluation supported two main evaluation tasks:

Speech-To-Text (STT) Tasks -- included three processing speeds (1x real time, 10x real time, and unlimited time) for both the Broadcast News (BN) and Conversational Telephone Speech (CTS) domains.
Metadata Extraction (MDE) Task -- consisted of a speaker diarization task for the BN and CTS domains.
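As an illustration of the three STT processing speeds, the sketch below computes a real-time factor and maps it to the tightest speed label it would satisfy. It assumes the usual reading of "Nx real time" as processing time divided by audio duration; the exact timing rules are defined in the RT-02 Evaluation Plan, and the function names and the sample numbers are hypothetical.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by audio duration (assumed definition)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def speed_class(rtf: float) -> str:
    """Map a real-time factor to the tightest STT speed label it satisfies."""
    if rtf <= 1.0:
        return "stt1x"   # at most 1x real time
    if rtf <= 10.0:
        return "stt10x"  # at most 10x real time
    return "sttul"       # unlimited time


# Hypothetical run: 45 minutes of compute on 60 minutes of audio.
rtf = real_time_factor(45 * 60, 60 * 60)
print(rtf, speed_class(rtf))
```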

The speaker diarization task as defined for the RT-02 Evaluation was significantly different from the diarization tasks evaluated in RT-03 and RT-04. The original RT-02 speaker diarization reference data has been updated to reflect current practice so that comparisons can be made with the more recent evaluations. Note that the speaker diarization task in the CTS domain is no longer supported because it was deemed to be degenerate. Thus, only the speaker diarization reference in the BN domain is included in this distribution. For additional information on the speaker diarization task as defined in recent evaluations, please refer to the RT-04 Fall Evaluation Plan, which can be found at the RT-04 Fall Evaluation Website.

Data Description

This distribution of the RT-02 Evaluation Data contains only Broadcast News and Conversational Telephone Speech data. Meeting data used in the RT-02 Evaluation is not included in this distribution and is packaged in a separate distribution. All recordings are in English.

Broadcast News (BN) Data

The BN data is composed of six approximately 10-minute excerpts from six different broadcasts. The broadcasts were selected from sources collected on December 14-19, 1998. The evaluation excerpts were transcribed to the nearest story boundary. The table below lists the audio files making up the BN portion of this distribution.

Broadcast News Audio
bn02en_1.sph	bn02en_3.sph	bn02en_5.sph
bn02en_2.sph	bn02en_4.sph	bn02en_6.sph

For the RT-02 STT BN evaluation task, no manual (human-annotated) segmentations were provided. Sites were required to generate their own segmentations automatically or use the automatically generated segmentations provided in the PEM files located in the 'indices' directory. The provided segmentations were generated using the CMU automatic segmentation and classification utility CMUseg Version 0.5. The CMUseg utility has been graciously supplied to the DARPA community by Carnegie Mellon University for use as a common acoustic segmentation utility. Participants were not required to use this segmentation or the CMUseg utility. Sites were free to use any segmentation scheme of their choice. These segmentations were provided for the convenience of sites that did not have access to segmentation algorithms. Note: PEM files were not intended for use in the Metadata Extraction (MDE) tasks since the task itself required extension of speaker segmentation algorithms.

The audio files of the complete broadcasts are located in the 'audio' directory. Each waveform is a SPHERE-headered, single-channel, 16-bit PCM file. Although the entire audio file is provided and available for within-broadcast adaptation, only the excerpts listed in the system input UEM files located in the 'indices' directory are included in the evaluation.
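To check the audio format of a file before processing it, the ASCII SPHERE header can be read directly. The following is a minimal sketch, assuming the standard NIST SPHERE layout (a "NIST_1A" line, a header-size line, "name -type value" fields, and a closing "end_head"). The function name is hypothetical, and the demo header is synthetic: its sample_rate value is illustrative only and is not taken from this distribution.

```python
def parse_sphere_header(raw: bytes) -> dict:
    """Parse the ASCII header found at the start of a NIST SPHERE file."""
    lines = raw.decode("ascii", errors="replace").split("\n")
    if len(lines) < 2 or lines[0].strip() != "NIST_1A":
        raise ValueError("not a SPHERE file: missing NIST_1A line")
    # The second line gives the total header size in bytes.
    fields = {"header_bytes": int(lines[1].strip())}
    for line in lines[2:]:
        line = line.strip()
        if line == "end_head":
            break
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        name, type_code, value = parts
        if type_code == "-i":
            fields[name] = int(value)
        elif type_code == "-r":
            fields[name] = float(value)
        else:  # "-sN": string of N characters
            fields[name] = value
    return fields


# Synthetic header for demonstration; values are illustrative.
demo = (
    b"NIST_1A\n   1024\n"
    b"channel_count -i 1\n"
    b"sample_rate -i 16000\n"
    b"sample_n_bytes -i 2\n"
    b"sample_coding -s3 pcm\n"
    b"end_head\n"
)
info = parse_sphere_header(demo)
print(info)
```

For a real file, the same function could be applied to the first bytes of the file, for example `parse_sphere_header(open(path, "rb").read(1024))`, on the assumption that the header fits in 1024 bytes.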

Conversational Telephone Speech (CTS) Data

The CTS data is composed of 60, approximately 5-minute excerpts from 60 different conversations: 20 from Switchboard I data, 20 from Switchboard Phase II data, and 20 from Switchboard Cellular Phase II data. Evaluation excerpts were transcribed to the nearest turn. The table below lists the audio files and their sources making up the CTS portion of this distribution.

Conversational Telephone Speech Audio
Switchboard I	Switchboard Phase II	Switchboard Cellular Phase II
sw4386.sph	sw_30016.sph	sw_45063.sph
sw4394.sph	sw_30223.sph	sw_45147.sph
sw4398.sph	sw_30352.sph	sw_45187.sph
sw4409.sph	sw_30410.sph	sw_45255.sph
sw4477.sph	sw_30751.sph	sw_45284.sph
sw4490.sph	sw_30801.sph	sw_45458.sph
sw4528.sph	sw_30849.sph	sw_45501.sph
sw4535.sph	sw_30861.sph	sw_45734.sph
sw4536.sph	sw_30969.sph	sw_45939.sph
sw4578.sph	sw_30986.sph	sw_46098.sph
sw4627.sph	sw_31032.sph	sw_46312.sph
sw4634.sph	sw_31131.sph	sw_46387.sph
sw4639.sph	sw_31195.sph	sw_46516.sph
sw4653.sph	sw_31388.sph	sw_46667.sph
sw4705.sph	sw_31483.sph	sw_46715.sph
sw4730.sph	sw_31493.sph	sw_47205.sph
sw4755.sph	sw_31572.sph	sw_47435.sph
sw4806.sph	sw_31585.sph	sw_47566.sph
sw4851.sph	sw_32163.sph	sw_47610.sph
sw4866.sph	sw_39999.sph	sw_47620.sph

For the RT-02 STT CTS evaluation task, systems were given manual segment times to decode. These segments are identified in the PEM files in the 'indices' directory.

Unlike the BN audio files, for which the full broadcasts were provided, the CTS audio files contain only the evaluation excerpts. Each audio excerpt is a SPHERE-headered, two-channel, interleaved, 8-bit mu-law file. Echo cancellation was performed on the CTS data using the echo cancellation software developed by Mississippi State.

Self-Scoring

To self-score your system STT output, you will need to download three software packages from NIST: Speech Recognition Scoring Toolkit, Transcription Filtering Package, and HUB-Score Script by following the steps given below:

Go to http://www.nist.gov/speech/tools
Download and install the Speech Recognition Scoring Toolkit
- Choose the sctk-XX package, where XX stands for the current version
Download and install the Transcription Filtering Package
- Choose the tranfilt-XX package, where XX stands for the current version
Download the HUB-Score Script
- Choose the hubscrXX script, where XX stands for the current version
- Make sure that the hubscr script has the correct paths to 'sctk' and 'tranfilt'
Run the hubscr script with the appropriate arguments, e.g.,
hubscr05.pl -g -h hub4 -l english -r ref_filename.stm <hyp_filename1.ctm> <hyp_filename2.ctm> ...

To self-score your system MDE speaker diarization output, you will need to download the MDE scoring script by following the steps given below:

Go to http://www.nist.gov/speech/tests/rt/rt2004/fall
Download the MDE scoring script
- Choose the md-eval-XX script, where XX stands for the current version
Run the md-eval script with the appropriate arguments, e.g.,
md-eval-v16.pl -1 -c 0.25 -u metadata_scoring_uem_filename.scr.uem -r ref_filename.rttm -s hyp_filename.rttm
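When scoring several hypothesis files, it can help to assemble the md-eval call programmatically. The sketch below only builds the argument list, using exactly the flags shown in the example command above. The function name is hypothetical, the `collar` parameter name reflects the assumption that -c is the scoring collar in seconds, and running the script through `perl` is also an assumption about the local setup.

```python
from typing import List


def md_eval_command(script: str, scoring_uem: str, ref_rttm: str,
                    hyp_rttm: str, collar: float = 0.25) -> List[str]:
    """Assemble the argument list for the md-eval call shown above."""
    return [
        "perl", script,
        "-1",
        "-c", str(collar),   # value passed to -c (0.25 in the example)
        "-u", scoring_uem,   # Metadata scoring UEM (.scr.uem)
        "-r", ref_rttm,      # reference RTTM
        "-s", hyp_rttm,      # system output RTTM
    ]


cmd = md_eval_command(
    "md-eval-v16.pl",
    "reference/concatenated/expt_02_spkr_eval02_eng_bnews_spch_expt_1.scr.uem",
    "reference/concatenated/expt_02_spkr_eval02_eng_bnews_spch_expt_1.rttm",
    "hyp_filename.rttm",
)
print(" ".join(cmd))
# To run the scorer (requires Perl and the downloaded script):
#   import subprocess; subprocess.run(cmd, check=True)
```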

Experiment ID (EXP-ID)

Concatenated reference, system input UEM, Metadata scoring UEM, and PEM files are named using an Experiment ID (EXP-ID) code, where

EXP-ID = <SITE_ID>_<YEAR>_<TASK>_<DATA>_<LANG>_<TYPE>_<COND>_<SYS_ID>_<RUN_ID>

SITE_ID	id of the participating site (reference data use 'expt')
YEAR	02
TASK	stt1x | stt10x | sttul | spkr
DATA	eval02
LANG	eng
TYPE	bnews | cts
COND	spch | ref
SYS_ID	id of the system used (reference data use 'expt')
RUN_ID	1..n (reference data use '1')
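The EXP-ID scheme above can be split mechanically into its nine fields. The following is a minimal sketch, assuming that SITE_ID and SYS_ID never contain underscores and that everything after the first '.' in a file name is an extension; the `ExpId` type and `parse_exp_id` function are hypothetical names, not part of the distribution.

```python
from typing import NamedTuple

TASKS = {"stt1x", "stt10x", "sttul", "spkr"}
TYPES = {"bnews", "cts"}
CONDS = {"spch", "ref"}


class ExpId(NamedTuple):
    site_id: str
    year: str
    task: str
    data: str
    lang: str
    data_type: str  # the TYPE field
    cond: str
    sys_id: str
    run_id: int


def parse_exp_id(name: str) -> ExpId:
    """Split an EXP-ID (optionally followed by extensions) into its fields."""
    stem = name.split(".", 1)[0]  # drop e.g. ".stm" or ".PemMatched.stm"
    parts = stem.split("_")
    if len(parts) != 9:
        raise ValueError(f"expected 9 fields, got {len(parts)}: {name!r}")
    *head, run = parts
    exp = ExpId(*head, int(run))
    if exp.task not in TASKS or exp.data_type not in TYPES or exp.cond not in CONDS:
        raise ValueError(f"unexpected field value in {name!r}")
    return exp


exp = parse_exp_id("expt_02_stt1x_eval02_eng_cts_spch_expt_1.PemMatched.stm")
print(exp)
```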

Reference Transcripts

The reference transcripts are located in the 'reference' directory. The official format for STT reference data is STM (files with the extension 'stm'), while the official format for MDE reference data is RTTM (files with the extension 'rttm'). Files with the extensions 'txt' or 'utf' are the original reference transcripts before any format conversions, additions of annotations, etc., and are included for completeness.

A concatenated version of the reference files has been created for every experiment supported in this evaluation. The concatenated files are in the 'reference/concatenated' directory and are also listed below. Note that for CTS STM reference data, two versions of the STM reference are provided, PemMatched and AllSegments. NIST used the PemMatched files to perform the official scoring of the RT-02 Evaluation. The PemMatched files contain the subset of STM segments in the AllSegments files that match the PEM file used for system input. The AllSegments files, which contain all segments in the evaluation excerpts, are included for completeness and are not to be used to replicate the RT-02 STT scoring procedures.

Task	Broadcast News	Conversational Telephone Speech
stt1x	expt_02_stt1x_eval02_eng_bnews_spch_expt_1.stm	expt_02_stt1x_eval02_eng_cts_spch_expt_1.PemMatched.stm
stt1x	expt_02_stt1x_eval02_eng_bnews_spch_expt_1.stm	expt_02_stt1x_eval02_eng_cts_spch_expt_1.AllSegments.stm
stt10x	expt_02_stt10x_eval02_eng_bnews_spch_expt_1.stm	expt_02_stt10x_eval02_eng_cts_spch_expt_1.PemMatched.stm
stt10x	expt_02_stt10x_eval02_eng_bnews_spch_expt_1.stm	expt_02_stt10x_eval02_eng_cts_spch_expt_1.AllSegments.stm
sttul	expt_02_sttul_eval02_eng_bnews_spch_expt_1.stm	expt_02_sttul_eval02_eng_cts_spch_expt_1.PemMatched.stm
sttul	expt_02_sttul_eval02_eng_bnews_spch_expt_1.stm	expt_02_sttul_eval02_eng_cts_spch_expt_1.AllSegments.stm
spkr	expt_02_spkr_eval02_eng_bnews_spch_expt_1.rttm	N/A
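The relationship between the PemMatched and AllSegments files can be pictured as a filter over STM segments. The sketch below is illustrative only: it assumes PEM lines of the form "file channel speaker start end", STM lines of the form "file channel speaker start end [<labels>] transcript", comment lines beginning with ";;", and a match on file, channel, and times rounded to 0.01 s. The criterion NIST actually used is not stated here, and the sample lines are made up.

```python
def parse_pem(lines):
    """Collect (file, channel, start, end) keys from PEM lines."""
    keys = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";;"):
            continue  # skip blanks and comments
        fname, channel, _speaker, start, end = line.split()[:5]
        keys.add((fname, channel, round(float(start), 2), round(float(end), 2)))
    return keys


def pem_matched(stm_lines, pem_keys):
    """Keep only the STM segments whose file, channel, and times are in the PEM."""
    kept = []
    for line in stm_lines:
        stripped = line.strip()
        if not stripped or stripped.startswith(";;"):
            continue
        f = stripped.split(None, 5)
        key = (f[0], f[1], round(float(f[3]), 2), round(float(f[4]), 2))
        if key in pem_keys:
            kept.append(stripped)
    return kept


# Made-up segments for demonstration.
pem = ["sw4386 A sw4386_A 0.50 3.25"]
stm = [
    "sw4386 A sw4386_A 0.50 3.25 <o,f0,male> hello there",
    "sw4386 A sw4386_A 3.25 4.00 <o,f0,male> uh-huh",
]
kept = pem_matched(stm, parse_pem(pem))
print(kept)
```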

Please note that for this test set, the STM files are identical across the STT tasks (stt1x, stt10x, and sttul). They are replicated in the three files so that, in the future, we have the flexibility to build test sets for tasks containing different file inventories.

Please also note that although the speaker diarization task in recent evaluations includes the speech+reference condition (in addition to the speech-only condition), the RT-02 reference data does not have the word timing information needed to generate the input data for the speech+reference condition.

System Input Unpartitioned Evaluation Map (UEM) Files

The system input UEM files define regions of the audio that the system must process. They are located in the 'indices' directory.

Task	Broadcast News	Conversational Telephone Speech
stt1x	expt_02_stt1x_eval02_eng_bnews_spch_expt_1.uem	expt_02_stt1x_eval02_eng_cts_spch_expt_1.uem
stt10x	expt_02_stt10x_eval02_eng_bnews_spch_expt_1.uem	expt_02_stt10x_eval02_eng_cts_spch_expt_1.uem
sttul	expt_02_sttul_eval02_eng_bnews_spch_expt_1.uem	expt_02_sttul_eval02_eng_cts_spch_expt_1.uem
spkr	expt_02_spkr_eval02_eng_bnews_spch_expt_1.uem	N/A
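A UEM file can be loaded into a per-file list of time regions and then used to measure how much of a given segment falls inside the regions to be processed. This is a minimal sketch, assuming UEM lines of the form "file channel start end" and comment lines beginning with ";;"; the function names and the sample times are hypothetical.

```python
from collections import defaultdict


def parse_uem(lines):
    """Map (file, channel) to a list of (start, end) regions in seconds."""
    regions = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";;"):
            continue  # skip blanks and comments
        fname, channel, start, end = line.split()[:4]
        regions[(fname, channel)].append((float(start), float(end)))
    return regions


def overlap_seconds(regions, fname, channel, start, end):
    """Return how many seconds of [start, end] lie inside the UEM regions."""
    total = 0.0
    for r_start, r_end in regions.get((fname, channel), []):
        total += max(0.0, min(end, r_end) - max(start, r_start))
    return total


# Made-up region for demonstration: seconds 120 to 720 of one broadcast.
uem = ["bn02en_1 1 120.0 720.0"]
regions = parse_uem(uem)
print(overlap_seconds(regions, "bn02en_1", "1", 100.0, 130.0))
```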

Partitioned Evaluation Map (PEM) Files

The PEM files contain segmentation information for the evaluation excerpts. BN segments were created using the automatic segmentation utility CMUseg Version 0.5, whereas the CTS segments were created by human annotators. The PEM files are located in the 'indices' directory.

Task	Broadcast News	Conversational Telephone Speech
stt1x	expt_02_stt1x_eval02_eng_bnews_spch_expt_1.pem	expt_02_stt1x_eval02_eng_cts_spch_expt_1.pem
stt10x	expt_02_stt10x_eval02_eng_bnews_spch_expt_1.pem	expt_02_stt10x_eval02_eng_cts_spch_expt_1.pem
sttul	expt_02_sttul_eval02_eng_bnews_spch_expt_1.pem	expt_02_sttul_eval02_eng_cts_spch_expt_1.pem

Metadata Scoring Unpartitioned Evaluation Map (UEM) Files

The Metadata scoring UEM files (files with extension 'scr.uem') define scorable regions of the audio file. They are located in the 'reference/concatenated' directory. Note that there is a functional difference between a system input UEM file and a Metadata scoring UEM file. A system input UEM file is used to indicate the regions of the audio that the system must process, whereas the Metadata scoring UEM file is used to indicate the regions that are scorable. Note also that the scoring UEM is used to score the MDE task only and that there is no scoring UEM for the STT tasks.

Task	Broadcast News
spkr	expt_02_spkr_eval02_eng_bnews_spch_expt_1.scr.uem

Global Mapping (GLM)

The Global Mapping (GLM) file used by NIST in RT-02 to normalize the reference and system output transcripts prior to scoring is trans_rules/en20020429.glm, located in the 'trans_rules' directory.

Directory Structure

audio/	contains the audio data
indices/	contains the PEM and UEM files
reference/	contains the reference transcripts
trans_rules/	contains the GLM used in the evaluation