DARPA Resource Management Continuous Speech Database
				(RM1)
		   Isolated- and Spelled-Word Data
			NIST Speech Disc 2-5.1

			     June, 1996
 

This corpus is a collection of recordings of read discrete words and
read spelled words pertaining to a naval resource management task.  It
provides an isolated-word and spell-mode extension to the (D)ARPA
Resource Management (RM1) corpus.  The speech data was collected as
part of the RM1 Continuous Speech Corpus (NIST Speech Discs 2-1 - 2-4)
using the same subjects as were used in RM1, but has not been released
until now.

The Isolated- and Spelled-Word component of RM1 employs a 600-word
subset of the original RM1 991-word vocabulary.  The lexicon is
consistent with a very limited language model that is concerned with
ships, ports, etc., along with display adjustments, but little else.
As with RM1, it contains speaker-independent and speaker-dependent
components.

The speaker-dependent component consists of the 12 RM1
speaker-dependent subjects each reading 100 training prompts, 50
development-test prompts, and 50 evaluation-test prompts.  The
speaker-independent component consists of the 80 RM1 training subjects
each reading 15 prompts, the 40 development-test subjects each reading
15 prompts, and 38 of the 40 evaluation-test subjects each reading 15
evaluation-test prompts (the data for 2 of the eval-test subjects,
"efg0" and "meb0", was unrecoverable).  The speaker subsets are
identical to those in the original RM1 continuous speech corpus (NIST
Speech Discs 2-1 - 2-4) and are encoded with the same speaker IDs.

As in the previously released corpus, the subjects read from prompts
in very low background noise.  The material was recorded using a
Sennheiser SN 414 headset microphone and simultaneously digitized at
20 kHz into 16-bit samples and then downsampled to 16 kHz.  Each of
the original recordings consisted of a spoken word followed by the
spelled word.  To facilitate the use of the corpus, NIST has segmented
each of the recordings into separate files for the spoken and spelled
versions of each of the words.

The HTK Toolkit from Entropic Research Laboratory, Inc. was used to
produce time-aligned word transcription files.  Because these files
were produced using machine-generated forced recognition, the time
alignments are not as precise as those which could have been generated
by a human expert; however, the alignments should be helpful in the
analysis and sorting of the speech data, especially in isolating sets
of phones or words.

There are 9,536* spoken and spelled words on this disc.  Each word
has a spoken and a spelled set of files.  The file types included are:
	- NIST-headered speech waveform (*.wav)
	- word-level time-aligned transcription (*.wrd)
	- phone-level time-aligned transcription (*.phn) 

The disc also contains several documentation files.

*Note: Some waveform files were either improperly recorded or
unrecoverable.  See below for specifics.


Speaker-Dependent Material
--------------------------
This disc contains the following speaker-dependent material:

   2,400 training utterances (12 speakers X 100 words X 2 forms)
   1,200 development-test utterances (12 speakers X 50 words X 2 forms)
   1,199 evaluation-test utterances* (12 speakers X 50 words X 2 forms)
   -----
   4,799 total speaker-dependent utterances

*(file /jws0/sp414spl.wav was deleted due to speaker error)


Speaker-Independent Material
----------------------------
This disc contains the following speaker-independent material:

   2,399 training utterances* (80 speakers X 15 words X 2 forms)
   1,198 development-test utterances** (40 speakers X 15 words X 2 forms)
   1,140 evaluation-test utterances*** (38 speakers X 15 words X 2 forms)
   -----
   4,737 total speaker-independent utterances

*training-speaker file /jrk0/sp586spl.wav was deleted due to speaker error
**dev-test speaker files /sah0/sp123*.wav were deleted due to speaker error
***the data for eval-test speakers EFG0 and MEB0 was unrecoverable
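
The totals in the two tables above can be cross-checked with a few lines
of Python.  This is only an arithmetic sketch: the variable names are
illustrative, and the number of deleted dev-test files (2) is inferred
from the wildcard "sp123*.wav" and the stated total of 1,198.

```python
# Expected utterance counts: speakers x words x 2 forms (spoken + spelled),
# minus the files listed in the footnotes as deleted.
FORMS = 2

dep = {
    "trn": 12 * 100 * FORMS,       # 2,400
    "dev": 12 * 50 * FORMS,        # 1,200
    "eval": 12 * 50 * FORMS - 1,   # 1,199 (one spelled-word file deleted)
}
indep = {
    "trn": 80 * 15 * FORMS - 1,    # 2,399 (one spelled-word file deleted)
    "dev": 40 * 15 * FORMS - 2,    # 1,198 (assumed: both forms of sp123 deleted)
    "eval": 38 * 15 * FORMS,       # 1,140 (2 of the 40 speakers unrecoverable)
}

dep_total = sum(dep.values())
indep_total = sum(indep.values())
grand_total = dep_total + indep_total

print(dep_total, indep_total, grand_total)
```

The grand total agrees with the 9,536 spoken and spelled words quoted
earlier in this file.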


CD-ROM Directory Structure
--------------------------
The CD-ROM's directory hierarchy is structured so that a full path/filename
uniquely identifies an utterance (the corpus, data usage, mode, speaker, and
utterance).  The directories are structured as follows:


<CORPUS-FILE-SPEC> ::= /<CORPUS>/<USAGE>/<MODE>/<SPEAKER_DIR>/<DATAFILE>

where,

    <CORPUS> ::= RM1

    <USAGE> ::= dep | indep

    <MODE> ::= trn | dev | eval

    <SPEAKER_DIR> ::= <SPEAKER_ID>_<DIALECT>

    where,

      <SPEAKER_ID> ::= abc0
      <DIALECT> ::= 1 | 2 | ... | 8

    <DATAFILE> ::= <WORD_ID><FORM>.<FILETYPE>
    
    where,

      <WORD_ID> ::= sp001, sp002 ... sp600 (word identifier)
      <FORM> ::= spk |  (spoken word)
                 spl    (spelled word)

      <FILETYPE> ::= wav |  (digitized waveform file)
                     wrd |  (time-aligned word-level transcription)
                     phn    (time-aligned phone-level transcription)

<DOC-FILE-SPEC> ::= /<CORPUS>/doc/<DOCFILE>

where,
    <DOCFILE> ::= documentation files and directories (see below)


Example directories and files:

/                             (CD-ROM root directory)
/rm1/                         (database identification)
/rm1/dep/trn/                 (speaker-dependent training data usage)
/rm1/dep/trn/abc0_3           (speaker "abc0", dialect region "3")
     .
     .
/rm1/indep/trn/cdf0_7         (speaker "cdf0", dialect region "7")
/rm1/indep/trn/cdf0_7/sp001spl.wav  (speech waveform file containing the
                                     spelled form of word "sp001")

/rm1/doc/                     (online documentation and tables - see below
                               for file descriptions)
/rm1/readme.txt               (this file)
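
As an illustration of the file specification above, the following Python
sketch splits a data-file path into its components.  It is not part of
the disc's software.  The function name parse_corpus_path and the
UtteranceId type are hypothetical, the lowercase "/rm1/" prefix follows
the example directories, and the speaker-ID pattern (three letters plus
one digit) is an assumption based on the IDs quoted in this file.

```python
import re
from typing import NamedTuple


class UtteranceId(NamedTuple):
    usage: str     # "dep" or "indep"
    mode: str      # "trn", "dev" or "eval"
    speaker: str   # e.g. "cdf0"
    dialect: int   # dialect region, 1..8
    word_id: str   # e.g. "sp001"
    form: str      # "spk" (spoken) or "spl" (spelled)
    filetype: str  # "wav", "wrd" or "phn"


# The pattern is deliberately lenient about the word number: it accepts any
# three digits rather than enforcing the 001..600 range.
_PATH_RE = re.compile(
    r"^/rm1/(?P<usage>dep|indep)/(?P<mode>trn|dev|eval)/"
    r"(?P<speaker>[a-z]{3}[0-9])_(?P<dialect>[1-8])/"
    r"(?P<word_id>sp[0-9]{3})(?P<form>spk|spl)\.(?P<filetype>wav|wrd|phn)$"
)


def parse_corpus_path(path: str) -> UtteranceId:
    """Split a data-file path into the fields named in the specification."""
    match = _PATH_RE.match(path.lower())
    if match is None:
        raise ValueError(f"not a recognized data-file path: {path!r}")
    fields = match.groupdict()
    return UtteranceId(
        usage=fields["usage"],
        mode=fields["mode"],
        speaker=fields["speaker"],
        dialect=int(fields["dialect"]),
        word_id=fields["word_id"],
        form=fields["form"],
        filetype=fields["filetype"],
    )


print(parse_corpus_path("/rm1/indep/trn/cdf0_7/sp001spl.wav"))
```

Paths that do not follow the specification raise ValueError, which makes
the function usable as a quick consistency check over a file listing.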


Waveform Files
--------------

The waveform (*.wav) files are formatted using the NIST SPHERE format
and may be manipulated with the NIST UNIX-based SPHERE software (see
below).  The waveform files contain a 1024-byte header describing the
waveform data followed by the data itself.  The waveform data
consists of 16-bit linear PCM samples at a 16 kHz sampling rate.  Note
that, as with the other RM1 data, this data was originally recorded at
20 kHz and later downsampled to 16 kHz.  The sample byte order is most
significant byte followed by least significant byte (big-endian).
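
For users without the SPHERE tools at hand, the layout described above
can be read directly.  The Python sketch below is a minimal illustration
and not a replacement for the SPHERE software.  It assumes that the
samples are stored as plain, uncompressed PCM exactly as described in
this section, and that the header is always 1024 bytes.  The function
name read_samples is hypothetical.

```python
import array
import sys

HEADER_BYTES = 1024  # header size as stated in this section


def read_samples(path):
    """Return (header_text, samples) for one waveform file.

    Assumes uncompressed 16-bit samples stored most significant byte first.
    """
    with open(path, "rb") as f:
        header = f.read(HEADER_BYTES)
        payload = f.read()
    if len(header) < HEADER_BYTES:
        raise ValueError("file is shorter than the 1024-byte header")
    if len(payload) % 2 != 0:
        raise ValueError("sample data is not a whole number of 16-bit samples")

    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(payload)
    if sys.byteorder == "little":
        # The file is big-endian, so swap bytes on little-endian machines.
        samples.byteswap()
    return header.decode("ascii", errors="replace"), samples
```

With the 16 kHz sampling rate given above, the duration of a file in
seconds is len(samples) / 16000.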


Online Documentation
--------------------
The following documentation files can be found in the /rm1/doc directory:

     al_spkrs.txt - description of speakers     
     rm_splex.txt - list of words spoken/spelled in data with IDs


SPHERE
------
The waveform files on this disc are formatted according to the NIST
SPeech HEader REsources (SPHERE) specification and may be manipulated
using the SPHERE UNIX software tools under the "sphere/" directory.
See the file "sphere/readme.doc" for more information.