This corpus is a collection of recordings of read discrete words and read spelled words pertaining to a naval resource management task. It provides an isolated-word and spell-mode extension to the (D)ARPA Resource Management (RM1) corpus. The speech data was collected as part of the RM1 Continuous Speech Corpus (NIST Speech Discs 2-1 - 2-4) using the same subjects as RM1, but has not been released until now.
The Isolated- and Spelled-Word component of RM1 employs a 600-word subset of the original 991-word RM1 vocabulary. The lexicon is consistent with a very limited language model that is concerned with ships, ports, etc., along with display adjustments, but little else. As with RM1, it contains both speaker-independent and speaker-dependent components.
The speaker-dependent component consists of the 12 RM1 speaker-dependent subjects, each reading 100 training prompts, 50 development-test prompts, and 50 evaluation-test prompts. The speaker-independent component consists of the 80 RM1 training subjects each reading 15 prompts, the 40 development-test subjects each reading 15 prompts, and 38 of the 40 evaluation-test subjects each reading 15 evaluation-test prompts (the data for 2 of the eval-test subjects, "efg0" and "meb0", was unrecoverable). The speaker subsets are identical to those in the original RM1 continuous speech corpus (NIST Speech Discs 2-1 - 2-4) and are encoded with the same speaker IDs.
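As a quick cross-check, the speaker and prompt figures above can be tallied in a few lines. This is only a sketch: it assumes that every prompt yields exactly one spoken-word and one spelled-word recording, and the variable names are illustrative. The nominal total it produces is slightly higher than the 9,536 words quoted for the disc, which would be consistent with the note that some waveform files were improperly recorded or unrecoverable.

```python
# Nominal prompt counts, taken from the figures quoted in the text above.
dep_speakers = 12
dep_prompts_per_speaker = 100 + 50 + 50  # training + dev-test + eval-test

# (number of speakers, prompts per speaker) for each speaker-independent subset
indep_subsets = {
    "training": (80, 15),
    "development-test": (40, 15),
    "evaluation-test": (38, 15),  # 2 of the 40 eval-test speakers were unrecoverable
}

dep_total = dep_speakers * dep_prompts_per_speaker
indep_total = sum(speakers * prompts for speakers, prompts in indep_subsets.values())
prompts_total = dep_total + indep_total

# Assumption: one spoken and one spelled recording per prompt.
nominal_recordings = 2 * prompts_total

print("speaker-dependent prompts:  ", dep_total)           # 2400
print("speaker-independent prompts:", indep_total)         # 2370
print("all prompts:                ", prompts_total)       # 4770
print("nominal recordings:         ", nominal_recordings)  # 9540
```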
As in the previously released corpus, the subjects read from prompts in very low background noise. The material was recorded using a Sennheiser SN 414 headset microphone and simultaneously digitized at 20 kHz into 16-bit samples, then downsampled to 16 kHz. Each of the original recordings consisted of a spoken word followed by the spelled word. To facilitate the use of the corpus, NIST has segmented each of the recordings into separate files for the spoken and spelled versions of each word.
The HTK Toolkit from Entropic Research Laboratory, Inc. was used to produce time-aligned word transcription files. Because these files were produced using machine-generated forced recognition, the time alignments are not as precise as those a human expert could have produced; however, the alignments should be helpful in analyzing and sorting the speech data, especially in isolating sets of phones or words.
There are 9,536* spoken and spelled words on this disc. Each word has a spoken and spelled set of files. The file types included are:
*Note: Some waveform files were either improperly recorded or unrecoverable. See below for specifics.
This disc contains the following speaker-dependent material:
This disc contains the following speaker-independent material:
The CD-ROM's directory hierarchy is structured so that a full path/filename uniquely identifies an utterance (the corpus, data usage, mode, speaker, and utterance). The directories are structured as follows:
::= / / / / /
where,
::= RM1
::= /dep/ | /indep/
::= /trn/ | /dev/ | /eval/
::= _
where,
::= abc0
::= 1 | 2 | ... | 8
::=
The waveform (*.wav) files are formatted using the NIST SPHERE format and may be manipulated with the NIST UNIX-based SPHERE software (see below). Each waveform file contains a 1024-byte header describing the waveform data, followed by the data itself. The waveform data consists of 16-bit, 16 kHz linear PCM samples. Note that, as with the other RM1 data, this data was originally recorded at 20 kHz and later downsampled to 16 kHz. The sample byte order is most significant byte followed by least significant byte.
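For readers without the SPHERE tools at hand, the layout described above can be read with a short script. The following is a minimal sketch, not a substitute for the NIST software. It assumes the usual SPHERE header conventions (a "NIST_1A" first line, the header size on the second line, "name -type value" fields, and an "end_head" terminator) and that the sample data is uncompressed; it would not handle any files stored with compressed sample coding. The function name parse_sphere and the header field names used here are illustrative assumptions rather than something this documentation specifies.

```python
import struct


def parse_sphere(raw: bytes):
    """Split an uncompressed 16-bit PCM SPHERE file into (header dict, sample list)."""
    # Assumed layout: line 1 is the magic string, line 2 the header size in bytes.
    first, second = raw[:16].split(b"\n")[:2]
    if first != b"NIST_1A":
        raise ValueError("missing NIST_1A magic string")
    header_size = int(second.decode("ascii").strip())  # 1024 for this corpus

    header = {}
    text = raw[16:header_size].decode("ascii", errors="replace")
    for line in text.split("\n"):
        fields = line.strip().split(None, 2)
        if fields and fields[0] == "end_head":
            break
        if len(fields) != 3:
            continue  # skip blank or padding lines
        key, kind, value = fields
        if kind == "-i":
            header[key] = int(value)
        elif kind == "-r":
            header[key] = float(value)
        else:
            header[key] = value  # "-sN" string fields are kept as text

    body = raw[header_size:]
    count = len(body) // 2  # two bytes per 16-bit sample
    # Most significant byte first, as stated above, unless the header says "01".
    order = "<" if header.get("sample_byte_format") == "01" else ">"
    samples = list(struct.unpack(f"{order}{count}h", body[: 2 * count]))
    return header, samples
```

Given the bytes of one waveform file, parse_sphere returns the header fields and the samples; if the header carries a sample_rate field, dividing len(samples) by it gives the duration of the recording in seconds.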
The following documentation files can be found in the /rm1/doc directory:
The waveform files on this disc are formatted according to the NIST SPeech HEader REsources (SPHERE) specification and may be manipulated using the SPHERE UNIX software tools under the "sphere/" directory. See the file "sphere/readme.doc" for more information.