This corpus contains the test material used in the 2002 Rich Transcription (RT-02) Evaluation of Broadcast News and Conversational Telephone Speech, administered by the NIST Speech Group in the Spring of 2002. The RT-02 Meeting Recognition Evaluation material is available in a separate distribution.
For complete up-to-date information, see the RT-02 Evaluation Website.
To replicate the RT-02 evaluation tests, see the RT-02 Evaluation Plan as it contains the rules and conditions for implementing the RT-02 evaluation tests. Self-scoring instructions are given in the Self-Scoring section below.
The RT-02 Evaluation supported two main evaluation tasks:
The speaker diarization task as defined for the RT-02 Evaluation was significantly different from the diarization tasks evaluated in RT-03 and RT-04. The original RT-02 speaker diarization reference data has been updated to reflect current practices so that comparisons can be made with the more recent evaluations. Note that the speaker diarization task in the CTS domain is no longer supported because it was deemed to be degenerate. Thus, only the speaker diarization reference in the BN domain is included in this distribution. For additional information on the speaker diarization task as defined in recent evaluations, please refer to the RT-04 Fall Evaluation Plan, which can be found at the RT-04 Fall Evaluation Website.
This distribution of the RT-02 Evaluation Data contains only Broadcast News and Conversational Telephone Speech data. Meeting data used in the RT-02 Evaluation is not included in this distribution and is packaged in a separate distribution. All recordings are in English.
Broadcast News (BN) Data
The BN data is composed of six approximately 10-minute excerpts, one from each of six different broadcasts. The broadcasts were selected from sources collected during December 14-19, 1998. The evaluation excerpts were transcribed to the nearest story boundary. The table below lists the audio files making up the BN portion of this distribution.
Broadcast News Audio | | |
---|---|---|
bn02en_1.sph | bn02en_3.sph | bn02en_5.sph |
bn02en_2.sph | bn02en_4.sph | bn02en_6.sph |
For the RT-02 STT BN evaluation task, no manual (human-annotated) segmentations were provided. Sites were required to generate their own segmentations automatically or use the automatically generated segmentations provided in the PEM files located in the 'indices' directory. The provided segmentations were generated using the CMU automatic segmentation and classification utility CMUseg Version 0.5. The CMUseg utility has been graciously supplied to the DARPA community by Carnegie Mellon University for use as a common acoustic segmentation utility. Participants were not required to use this segmentation or the CMUseg utility. Sites were free to use any segmentation scheme of their choice. These segmentations were provided for the convenience of sites that did not have access to segmentation algorithms. Note: PEM files were not intended for use in the Metadata Extraction (MDE) tasks since the task itself required extension of speaker segmentation algorithms.
The audio files of the complete broadcasts are located in the 'audio' directory. Each waveform is a SPHERE-headered, single-channel, 16-bit PCM file. Although the entire audio file is provided and available for within broadcast adaptation, only the excerpts listed in the system input UEM files located in the 'indices' directory are included in the evaluation.
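A SPHERE header is a block of plain ASCII text at the start of the file, so the stated audio format can be checked before decoding. The following is a minimal sketch, not a NIST tool: the function names are hypothetical, and the field values in the example (including the sample rate) are made up for illustration rather than taken from this distribution.

```python
def parse_sphere_header(raw: bytes) -> dict:
    """Parse the ASCII header found at the start of a NIST SPHERE file."""
    lines = raw.decode("ascii", errors="replace").split("\n")
    if lines[0].strip() != "NIST_1A":
        raise ValueError("missing NIST_1A magic line")
    # The second line holds the total header size in bytes (commonly 1024).
    fields = {"header_bytes": int(lines[1].strip())}
    for line in lines[2:]:
        line = line.strip()
        if line == "end_head":
            break
        if not line:
            continue
        # Each entry is "<name> <type> <value>", where type is -i, -r or -sN.
        name, kind, value = line.split(None, 2)
        if kind == "-i":
            fields[name] = int(value)
        elif kind == "-r":
            fields[name] = float(value)
        else:
            fields[name] = value
    return fields


def read_sphere_header(path: str) -> dict:
    """Read and parse the header of the SPHERE file at `path`."""
    with open(path, "rb") as handle:
        first = handle.read(16)  # magic line (8 bytes) + size line (8 bytes)
        size = int(first.split(b"\n")[1].strip())
        return parse_sphere_header(first + handle.read(size - 16))


# Illustrative header with made-up values (not copied from a corpus file).
example = (
    b"NIST_1A\n   1024\n"
    b"channel_count -i 1\n"
    b"sample_rate -i 16000\n"
    b"sample_n_bytes -i 2\n"
    b"sample_coding -s3 pcm\n"
    b"end_head\n"
)
header = parse_sphere_header(example)
print(header["channel_count"], header["sample_n_bytes"], header["sample_coding"])
```

For a BN file, one would expect `channel_count` to be 1 and `sample_n_bytes` to be 2, matching the single-channel, 16-bit PCM description above.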
The Conversational Telephone Speech (CTS) Data
The CTS data is composed of 60 approximately 5-minute excerpts, one from each of 60 different conversations: 20 from Switchboard I data, 20 from Switchboard Phase II data, and 20 from Switchboard Cellular Phase II data. Evaluation excerpts were transcribed to the nearest turn. The table below lists the audio files and their sources making up the CTS portion of this distribution.
Conversational Telephone Speech Audio
Switchboard I | Switchboard Phase II | Switchboard Cellular Phase II |
---|---|---|
sw4386.sph | sw_30016.sph | sw_45063.sph |
sw4394.sph | sw_30223.sph | sw_45147.sph |
sw4398.sph | sw_30352.sph | sw_45187.sph |
sw4409.sph | sw_30410.sph | sw_45255.sph |
sw4477.sph | sw_30751.sph | sw_45284.sph |
sw4490.sph | sw_30801.sph | sw_45458.sph |
sw4528.sph | sw_30849.sph | sw_45501.sph |
sw4535.sph | sw_30861.sph | sw_45734.sph |
sw4536.sph | sw_30969.sph | sw_45939.sph |
sw4578.sph | sw_30986.sph | sw_46098.sph |
sw4627.sph | sw_31032.sph | sw_46312.sph |
sw4634.sph | sw_31131.sph | sw_46387.sph |
sw4639.sph | sw_31195.sph | sw_46516.sph |
sw4653.sph | sw_31388.sph | sw_46667.sph |
sw4705.sph | sw_31483.sph | sw_46715.sph |
sw4730.sph | sw_31493.sph | sw_47205.sph |
sw4755.sph | sw_31572.sph | sw_47435.sph |
sw4806.sph | sw_31585.sph | sw_47566.sph |
sw4851.sph | sw_32163.sph | sw_47610.sph |
sw4866.sph | sw_39999.sph | sw_47620.sph |
For the RT-02 STT CTS, systems were given manual segment times to decode. These segments are identified in the PEM files in the 'indices' directory.
Unlike the BN audio files, for which the full broadcasts were provided, the CTS audio files contain only the evaluation excerpts. Each audio excerpt is a SPHERE-headered, two-channel, interleaved, 8-bit mu-law file. Echo cancellation was performed on the CTS data using the echo cancellation software developed by Mississippi State.
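Because the two sides of each conversation are interleaved sample by sample, they usually have to be separated before single-channel processing. The sketch below shows one way to do this in pure Python using the standard G.711 mu-law expansion. It assumes the sample bytes are raw, uncompressed mu-law with the SPHERE header already stripped; if the files are stored in a compressed form, they would need to be decompressed first. The function names are hypothetical.

```python
def mulaw_to_linear(byte: int) -> int:
    """Expand one 8-bit mu-law code into a signed linear sample (G.711)."""
    byte = ~byte & 0xFF            # mu-law codes are stored bit-inverted
    sign = byte & 0x80
    exponent = (byte >> 4) & 0x07
    mantissa = byte & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def split_channels(samples: bytes) -> tuple[list[int], list[int]]:
    """De-interleave two-channel mu-law bytes into two linear PCM lists."""
    first = [mulaw_to_linear(b) for b in samples[0::2]]
    second = [mulaw_to_linear(b) for b in samples[1::2]]
    return first, second


# Four interleaved bytes: two samples for each channel.
channel_a, channel_b = split_channels(bytes([0xFF, 0x00, 0xFF, 0x80]))
print(channel_a)  # [0, 0]
print(channel_b)  # [-32124, 32124]
```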
To self-score your system STT output, you will need to download three software packages from NIST: Speech Recognition Scoring Toolkit, Transcription Filtering Package, and HUB-Score Script by following the steps given below:
To self-score your system MDE speaker diarization output, you will need to download the MDE scoring script by following the steps given below:
Concatenated reference, system input UEM, Metadata scoring UEM, and PEM files are named using an Experiment ID (EXP-ID) code, where
EXP-ID = <SITE_ID>_<YEAR>_<TASK>_<DATA>_<LANG>_<TYPE>_<COND>_<SYS_ID>_<RUN_ID>
Field | Value |
---|---|
SITE_ID | id of the participating site (reference data use 'expt') |
YEAR | 02 |
TASK | stt1x, stt10x, sttul, or spkr |
DATA | eval02 |
LANG | eng |
TYPE | bnews or cts |
COND | spch or ref |
SYS_ID | id of the system used (reference data use 'expt') |
RUN_ID | 1..n (reference data use '1') |
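The naming scheme can be split mechanically into its nine fields. The following is a minimal sketch, assuming that no individual field contains an underscore and that everything from the first '.' onward is a file extension; the `ExpId` class and `parse_exp_id` function are hypothetical helpers, not part of any NIST package.

```python
from typing import NamedTuple

TASKS = {"stt1x", "stt10x", "sttul", "spkr"}
TYPES = {"bnews", "cts"}
CONDS = {"spch", "ref"}


class ExpId(NamedTuple):
    site_id: str
    year: str
    task: str
    data: str
    lang: str
    data_type: str  # the TYPE field
    cond: str
    sys_id: str
    run_id: str

    def __str__(self) -> str:
        return "_".join(self)


def parse_exp_id(name: str) -> ExpId:
    """Split a file name such as '<EXP-ID>.uem' into its EXP-ID fields."""
    stem = name.split(".", 1)[0]  # drop '.uem', '.scr.uem', '.pem', ...
    parts = stem.split("_")
    if len(parts) != 9:
        raise ValueError(f"expected 9 fields, got {len(parts)}: {name!r}")
    exp = ExpId(*parts)
    if exp.task not in TASKS or exp.data_type not in TYPES or exp.cond not in CONDS:
        raise ValueError(f"unexpected TASK, TYPE or COND value in {name!r}")
    return exp


exp = parse_exp_id("expt_02_spkr_eval02_eng_bnews_spch_expt_1.scr.uem")
print(exp.task, exp.data_type, exp.run_id)  # spkr bnews 1
print(exp)  # expt_02_spkr_eval02_eng_bnews_spch_expt_1
```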
The reference transcripts are located in the 'reference'
directory. The official format for STT reference data is STM
(files with the extension 'stm') while the official format for MDE reference
data is RTTM (files with the extension 'rttm'). Files with the extensions 'txt'
or 'utf' are the original reference transcripts before any format conversions,
additions of annotations, etc. and are included for completeness.
A concatenated version of the reference files has been created for every experiment
supported in this evaluation. The concatenated files are in the 'reference/concatenated'
directory and are also listed below. Note that for CTS STM reference data, two
versions of the STM reference are provided, PemMatched and AllSegments.
NIST used the PemMatched files to perform the official scoring of the
RT-02 Evaluation. The PemMatched files contain the subset of STM
segments in the AllSegments files that match the PEM file used
for system input. The AllSegments files, which contain all segments
in the evaluation excerpts, are included for completeness and are not to be
used to replicate the RT-02 STT scoring procedures.
Please note that for this test set, the STM files are identical across the STT tasks (stt1x, stt10x, and sttul). They are replicated in three files so that in the future we have the flexibility to build test sets for tasks containing different file inventories.
Please also note that although the speaker diarization task in recent
evaluations includes the speech+reference condition (in addition to the speech-only
condition), the RT-02 reference data does not have the word timing information
needed to generate the input data for the speech+reference condition.
The system input UEM files define regions of the audio that the system must process. They are located in the 'indices' directory.
Task | Broadcast News | Conversational Telephone Speech |
---|---|---|
stt1x | expt_02_stt1x_eval02_eng_bnews_spch_expt_1.uem | expt_02_stt1x_eval02_eng_cts_spch_expt_1.uem |
stt10x | expt_02_stt10x_eval02_eng_bnews_spch_expt_1.uem | expt_02_stt10x_eval02_eng_cts_spch_expt_1.uem |
sttul | expt_02_sttul_eval02_eng_bnews_spch_expt_1.uem | expt_02_sttul_eval02_eng_cts_spch_expt_1.uem |
spkr | expt_02_spkr_eval02_eng_bnews_spch_expt_1.uem | N/A |
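The UEM layout itself is not described in this document. The sketch below assumes the four-column plain-text layout commonly used with NIST scoring tools (file name, channel, begin time, end time, with times in seconds) and treats lines starting with ';;' as comments; check these assumptions against the actual files before relying on them. The function names and the example times are hypothetical.

```python
def parse_uem(text: str) -> list[tuple[str, str, float, float]]:
    """Parse UEM text into (file, channel, begin, end) tuples."""
    regions = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";;"):
            continue  # skip blank lines and assumed comment lines
        name, channel, begin, end = line.split()
        regions.append((name, channel, float(begin), float(end)))
    return regions


def total_duration(regions) -> float:
    """Sum the length, in seconds, of all regions to be processed."""
    return sum(end - begin for _, _, begin, end in regions)


# Made-up example content; the times are not taken from the distribution.
example = """\
;; hypothetical UEM content
bn02en_1 1 100.0 700.0
bn02en_2 1 50.5 650.5
"""
regions = parse_uem(example)
print(len(regions), total_duration(regions))  # 2 1200.0
```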
The PEM files contain segmentation information for the evaluation excerpts. BN segments were created using the automatic segmentation utility CMUseg Version 0.5, whereas the CTS segments were created by human annotators. The PEM files are located in the 'indices' directory.
Task | Broadcast News | Conversational Telephone Speech |
---|---|---|
stt1x | expt_02_stt1x_eval02_eng_bnews_spch_expt_1.pem | expt_02_stt1x_eval02_eng_cts_spch_expt_1.pem |
stt10x | expt_02_stt10x_eval02_eng_bnews_spch_expt_1.pem | expt_02_stt10x_eval02_eng_cts_spch_expt_1.pem |
sttul | expt_02_sttul_eval02_eng_bnews_spch_expt_1.pem | expt_02_sttul_eval02_eng_cts_spch_expt_1.pem |
The Metadata scoring UEM files (files with extension 'scr.uem') define scorable regions of the audio file. They are located in the 'reference/concatenated' directory. Note that there is a functional difference between a system input UEM file and a Metadata scoring UEM file. A system input UEM file is used to indicate the regions of the audio that the system must process, whereas the Metadata scoring UEM file is used to indicate the regions that are scorable. Note also that the scoring UEM is used to score the MDE task only and that there is no scoring UEM for the STT tasks.
Task | Broadcast News |
---|---|
spkr | expt_02_spkr_eval02_eng_bnews_spch_expt_1.scr.uem |
The Global Mapping File (GLM) used by NIST in RT-02 to normalize the reference and system output transcripts prior to scoring is located at trans_rules/en20020429.glm.
audio/ | contains the audio data |
indices/ | contains the PEM and UEM files |
reference/ | contains the reference transcripts |
trans_rules/ | contains the GLM used in the evaluation |