This document specifies the content and use of the multi-DVD/disk drive set (NIST Speech Disc R124) to be used in the 2009 Language Recognition Evaluation (LRE09) administered by the NIST Multimodal Information Group.
The LRE09 Evaluation Plan [pdf] document contains the rules and conditions for implementing the LRE09 tests. Read the evaluation plan and this readme carefully before beginning a test.
Sections 6.2.1 and 6.3 of the evaluation plan give the instructions for submitting your system output to NIST for scoring. To ensure that your submission is properly logged and scored, please follow those instructions carefully.
The LRE09 Evaluation Data is composed of Voice of America and Mixer 3 data licensed through the Linguistic Data Consortium. Participants must have signed copies of the LRE09 Evaluation Participation Agreement and must have completed the LDC licensing agreement prior to using this evaluation test set.
The evaluation data is distributed on two DVDs. The top-level directory is named lre09e1, for consistency with past practice, and serves as a unique label for the disc set. The data structure for each disc is as follows:
/lre09e1/seg.ndx - This file lists the test segments to be used in all of the tests. It is an ASCII record format file. Each record contains a single field: the relative path and file name of a test segment.
/lre09e1/data/ - The data directory contains all the speech data test segments, split into 5 subdirectories to limit the number of files in any single directory. Each test segment is an 8-bit, 8-kHz, mu-law speech data file in SPHERE format. The file names are pseudo-random alphanumeric strings followed by .sph.
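As a sanity check before running a test, the layout described above can be walked with a short script. The following is a minimal Python sketch, not an official NIST tool. It assumes that seg.ndx holds one record per line and that each record is a path relative to the top-level lre09e1 directory; neither detail is stated in this readme. It also relies on the general NIST SPHERE header layout: an ASCII header that begins with NIST_1A, followed by the header size in bytes, then lines of the form "name -type value", terminated by end_head. The function names and the sample header values are hypothetical and illustrative only; the headers on the discs may use different field values.

```python
import os

SPHERE_MAGIC = b"NIST_1A\n"

# Illustrative header only; real files on the discs may differ in detail.
SAMPLE_HEADER = (
    b"NIST_1A\n   1024\n"
    b"channel_count -i 1\n"
    b"sample_rate -i 8000\n"
    b"sample_n_bytes -i 1\n"
    b"sample_coding -s4 ulaw\n"
    b"end_head\n"
).ljust(1024, b" ")


def parse_segment_index(lines):
    """Return the non-empty records of seg.ndx, one per line (assumed)."""
    return [line.strip() for line in lines if line.strip()]


def parse_sphere_header(raw):
    """Parse the ASCII header at the start of a SPHERE file.

    Returns (header_size_in_bytes, dict of header fields).
    """
    if raw[:8] != SPHERE_MAGIC:
        raise ValueError("missing NIST_1A marker; not a SPHERE file")
    header_size = int(raw[8:16].decode("ascii").strip())
    text = raw[16:header_size].decode("ascii", errors="replace")
    fields = {}
    for line in text.split("\n"):
        line = line.strip()
        if line == "end_head":
            break
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        name, type_code, value = parts
        if type_code == "-i":
            fields[name] = int(value)
        elif type_code == "-r":
            fields[name] = float(value)
        else:  # "-sN" string fields
            fields[name] = value
    return header_size, fields


def read_sphere_header(path):
    """Read and parse only the header bytes of the file at `path`."""
    with open(path, "rb") as handle:
        prefix = handle.read(16)
        if prefix[:8] != SPHERE_MAGIC:
            raise ValueError("missing NIST_1A marker: " + path)
        header_size = int(prefix[8:16].decode("ascii").strip())
        return parse_sphere_header(prefix + handle.read(header_size - 16))


def check_disc(root):
    """Yield (record, fields) for every segment listed in seg.ndx.

    `root` is the path to the lre09e1 directory on the mounted disc.
    """
    with open(os.path.join(root, "seg.ndx"), encoding="ascii") as handle:
        records = parse_segment_index(handle)
    for record in records:
        _, fields = read_sphere_header(os.path.join(root, record))
        yield record, fields
```

Calling check_disc with the mounted path of the lre09e1 directory yields each listed segment together with its header fields, which can then be compared against the 8-kHz, single-byte mu-law description above. The sketch reads headers only; it does not decode the audio samples.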