Language Recognition Evaluation 2009
(LRE-09) Evaluation Data

Updated: April 28, 2009

(see UPDATE HISTORY section for additional details)

Introduction

This document specifies the content and use of the multi-DVD/disk drive set (NIST Speech Disc R124) to be used in the 2009 Language Recognition Evaluation (LRE09), administered by the NIST Multimodal Information Group.

The LRE09 Evaluation Plan document contains the rules and conditions for implementing the LRE09 tests. Read the evaluation plan and this readme carefully before beginning a test. Sections 6.2.1 and 6.3 of the evaluation plan give the instructions for submitting your system output to NIST for scoring. To ensure that your submission is properly logged and scored, please follow those instructions carefully.

Evaluation Test Data

The LRE09 Evaluation Data is composed of Voice of America and Mixer 3 data licensed through the Linguistic Data Consortium. Participants must have signed copies of the LRE09 Evaluation Participation Agreement and completed the LDC licensing agreement prior to using this evaluation test set.

The evaluation data is distributed on two DVDs. Each disc has a top-level directory, named lre09e1 for consistency with past practice, which serves as a unique label for the disc set. The directory structure of each disc is as follows:

/lre09e1/seg.ndx - This file lists the test segments to be used in all of the tests. It is an ASCII record-format file. Each record contains a single field: the relative path and file name of a test segment.
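
The following is a minimal Python sketch of how the segment list might be read. It assumes one record per line and that the relative paths resolve against the lre09e1 directory; neither detail is stated above, so check both against the actual disc. The function names read_segment_index and resolve_segments are hypothetical.

```python
from pathlib import Path


def read_segment_index(index_path):
    """Return the test-segment paths listed in a seg.ndx-style file.

    Assumption: one record per line, each holding a single field.
    """
    segments = []
    with open(index_path, "r", encoding="ascii") as handle:
        for line in handle:
            record = line.strip()
            if record:  # ignore blank lines, if any are present
                segments.append(record)
    return segments


def resolve_segments(root_dir, segments):
    """Join each relative segment path onto root_dir.

    Assumption: root_dir is the directory the relative paths are based on
    (for example, the lre09e1 directory of a mounted disc).
    """
    root = Path(root_dir)
    return [root / segment for segment in segments]
```

For example, resolve_segments("/mnt/dvd/lre09e1", read_segment_index("/mnt/dvd/lre09e1/seg.ndx")) would yield one path per test segment, where /mnt/dvd is a placeholder mount point.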

/lre09e1/data/ - The data directory contains all of the speech data test segments, divided among 5 subdirectories to limit the number of files in any single directory. Each test segment is an 8-bit, 8-kHz, mu-law, SPHERE format speech data file. The file names are pseudo-random alphanumeric strings followed by .sph.
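
As an illustration of the file format, the sketch below reads a SPHERE header and expands 8-bit mu-law samples to 16-bit linear values using the standard G.711 expansion. It assumes the usual SPHERE layout (a "NIST_1A" line, a header-size line, then "name -type value" fields ending with "end_head") and uncompressed, single-channel sample data. Whether the files on these discs meet the last two assumptions is not stated here, so inspect the header's sample_coding field first. All function names are hypothetical; established SPHERE tools are likely a better choice for real use.

```python
def parse_sphere_header(raw):
    """Parse a SPHERE header from the leading bytes of a file.

    Returns (header_size, fields), where fields maps names to values.
    """
    if raw[:8] != b"NIST_1A\n":
        raise ValueError("missing NIST_1A marker; not a SPHERE file")
    header_size = int(raw[8:16].decode("ascii"))
    fields = {}
    for line in raw[16:header_size].split(b"\n"):
        text = line.decode("ascii", errors="replace").strip()
        if text == "end_head":
            break
        parts = text.split(None, 2)
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        key, kind, value = parts
        if kind == "-i":
            fields[key] = int(value)
        elif kind == "-r":
            fields[key] = float(value)
        else:  # "-sN": string of length N
            fields[key] = value
    return header_size, fields


def ulaw_to_linear(byte):
    """Expand one 8-bit mu-law code to a signed 16-bit linear sample."""
    u = ~byte & 0xFF  # mu-law codes are stored bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def read_sphere_ulaw(path):
    """Return (fields, samples) for an uncompressed mu-law SPHERE file."""
    with open(path, "rb") as handle:
        raw = handle.read()
    header_size, fields = parse_sphere_header(raw)
    coding = fields.get("sample_coding", "")
    if coding != "ulaw":
        # Anything else (including compressed variants) is out of scope here.
        raise ValueError("unsupported sample_coding: %r" % coding)
    samples = [ulaw_to_linear(b) for b in raw[header_size:]]
    return fields, samples
```

At the stated 8-kHz sampling rate, the duration of a segment in seconds is len(samples) / 8000, assuming a single channel.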

CONTACT INFORMATION

If you have questions regarding this evaluation or this document, or if you are interested in participating in future NIST speech recognition tests, send an email to our staff.

UPDATE HISTORY


CAVEAT

Certain commercial equipment, instruments, software, and materials may be identified on this CD-ROM in order to adequately specify experimental procedures used. Such identification does not imply recommendation or endorsement by the National Institute of Standards and Technology (NIST), nor does it imply that the equipment, instruments, software, or materials identified are necessarily the best available for the purpose.