DARPA Resource Management Continuous Speech Database (RM1)
Isolated- and Spelled-Word Data
NIST Speech Disc 2-5.1
June, 1996

This corpus is a collection of recordings of read discrete words and read spelled words pertaining to a naval resource management task. It provides an isolated-word and spell-mode extension to the (D)ARPA Resource Management (RM1) corpus. The speech data was collected as part of the RM1 Continuous Speech Corpus (NIST Speech Discs 2-1 through 2-4) using the same subjects as RM1, but has until now been unreleased.

The Isolated- and Spelled-Word component of RM1 uses a 600-word subset of the original 991-word RM1 vocabulary. The lexicon is consistent with a very limited language model concerned with ships, ports, etc., along with display adjustments, but little else. As with RM1, the corpus contains speaker-independent and speaker-dependent components:

- The speaker-dependent component consists of the 12 RM1 speaker-dependent subjects, each reading 100 training prompts, 50 development-test prompts, and 50 evaluation-test prompts.
- The speaker-independent component consists of the 80 RM1 training subjects, each reading 15 prompts; the 40 development-test subjects, each reading 15 prompts; and 38 of the 40 evaluation-test subjects, each reading 15 evaluation-test prompts (the data for 2 of the evaluation-test subjects, "efg0" and "meb0", was unrecoverable).

The speaker subsets are identical to those in the original RM1 continuous speech corpus (NIST Speech Discs 2-1 through 2-4) and are encoded with the same speaker IDs.

As in the previously released corpus, the subjects read from prompts in very low background noise. The material was recorded using a Sennheiser SN 414 headset microphone, simultaneously digitized at 20 kHz into 16-bit samples, and then downsampled to 16 kHz. Each of the original recordings consisted of a spoken word followed by the spelled word.
To facilitate use of the corpus, NIST has segmented each recording into separate files for the spoken and spelled versions of each word. The HTK Toolkit from Entropic Research Laboratory, Inc. was used to produce the time-aligned transcription files. Because these files were produced by machine-generated forced alignment, the time alignments are not as precise as those a human expert could produce; however, the alignments should be helpful in analyzing and sorting the speech data, especially in isolating sets of phones or words.

There are 9,536* utterances (spoken and spelled word forms, counted separately) on this disc. Each word is represented by a spoken and a spelled set of files. The file types included are:

- NIST-headered speech waveform (*.wav)
- word-level time-aligned transcription (*.wrd)
- phone-level time-aligned transcription (*.phn)

The disc also contains several documentation files.

*Note: Some waveform files were either improperly recorded or unrecoverable. See below for specifics.
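The 9,536 total can be checked against the per-partition tables below: each partition contributes speakers X words X 2 forms, minus any files deleted for speaker error. A minimal Python sketch of that arithmetic, with every count copied from this readme (the function and variable names are illustrative only):

```python
# Per-partition parameters from this readme:
# (speakers, words per speaker, forms per word, files deleted).
# Each word appears in 2 forms: spoken (spk) and spelled (spl).
PARTITIONS = {
    ("dep", "trn"): (12, 100, 2, 0),
    ("dep", "dev"): (12, 50, 2, 0),
    ("dep", "eval"): (12, 50, 2, 1),    # /jws0/sp414spl.wav deleted
    ("indep", "trn"): (80, 15, 2, 1),   # /jrk0/sp586spl.wav deleted
    ("indep", "dev"): (40, 15, 2, 2),   # /sah0/sp123*.wav (spk and spl) deleted
    ("indep", "eval"): (38, 15, 2, 0),  # efg0 and meb0 already excluded
}


def utterance_count(speakers, words, forms, deleted):
    """Number of utterance files remaining in one partition."""
    return speakers * words * forms - deleted


def totals(partitions):
    """Return per-mode totals (dep/indep) and the grand total."""
    by_mode = {}
    for (mode, _usage), params in partitions.items():
        by_mode[mode] = by_mode.get(mode, 0) + utterance_count(*params)
    return by_mode, sum(by_mode.values())


if __name__ == "__main__":
    by_mode, grand_total = totals(PARTITIONS)
    print(by_mode)      # {'dep': 4799, 'indep': 4737}
    print(grand_total)  # 9536
```

The per-mode totals match the 4,799 speaker-dependent and 4,737 speaker-independent figures in the tables.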
Speaker-Dependent Material
--------------------------

This disc contains the following speaker-dependent material:

   2,400 training utterances           (12 speakers X 100 words X 2 forms)
   1,200 development-test utterances   (12 speakers X  50 words X 2 forms)
   1,199 evaluation-test utterances*   (12 speakers X  50 words X 2 forms)
   -----
   4,799 total speaker-dependent utterances

   *file /jws0/sp414spl.wav was deleted due to speaker error

Speaker-Independent Material
----------------------------

This disc contains the following speaker-independent material:

   2,399 training utterances*            (80 speakers X 15 words X 2 forms)
   1,198 development-test utterances**   (40 speakers X 15 words X 2 forms)
   1,140 evaluation-test utterances***   (38 speakers X 15 words X 2 forms)
   -----
   4,737 total speaker-independent utterances

   *training-speaker file /jrk0/sp586spl.wav was deleted due to speaker error
   **development-test speaker files /sah0/sp123*.wav (both the spoken and the spelled form) were deleted due to speaker error
   ***only 38 of the 40 evaluation-test speakers are included; the data for speakers efg0 and meb0 was unrecoverable

CD-ROM Directory Structure
--------------------------

The CD-ROM's directory hierarchy is structured so that a full path/filename uniquely identifies an utterance (the corpus, speaker mode, data usage, speaker, and utterance). The directories are structured as follows:

   <UTTERANCE-PATH> ::= /<CORPUS>/<MODE>/<USAGE>/<SPEAKER>/<FILE>

   where,

      <CORPUS>  ::= rm1
      <MODE>    ::= dep | indep
      <USAGE>   ::= trn | dev | eval
      <SPEAKER> ::= <SPEAKER-ID>_<DIALECT>

      where,

         <SPEAKER-ID> ::= abc0
         <DIALECT>    ::= 1 | 2 | ... | 8

      <FILE> ::= <WORD-ID><FORM>.<TYPE>

      where,

         <WORD-ID> ::= sp001 | sp002 | ... | sp600   (word identifier)
         <FORM>    ::= spk |   (spoken word)
                       spl     (spelled word)
         <TYPE>    ::= wav |   (digitized waveform file)
                       wrd |   (time-aligned word-level transcription)
                       phn     (time-aligned phone-level transcription)

   <DOC-PATH> ::= /<CORPUS>/doc/<DOC-FILE>

   where,

      <DOC-FILE> ::= documentation files and directories (see below)

Example directories and files:

   /                                    (CD-ROM root directory)
   /rm1/                                (database identification)
   /rm1/dep/trn/                        (speaker-dependent training data)
   /rm1/dep/trn/abc0_3/                 (speaker "abc0", dialect region "3")
      .
      .
   /rm1/indep/trn/cdf0_7/               (speaker "cdf0", dialect region "7")
   /rm1/indep/trn/cdf0_7/sp001spl.wav   (speech waveform file containing the spelled form of word "sp001")
   /rm1/doc/                            (online documentation and tables; see below for file descriptions)
   /rm1/readme.txt                      (this file)

Waveform Files
--------------

The waveform (*.wav) files are formatted using the NIST SPHERE format and may be manipulated with the NIST UNIX-based SPHERE software (see below). Each waveform file contains a 1024-byte header describing the waveform data, followed by the data itself. The waveform data consists of 16-bit, 16 kHz linear PCM samples. Note that, as with the other RM1 data, this data was originally recorded at 20 kHz and later downsampled to 16 kHz. The sample byte order is most significant byte first, followed by the least significant byte (big-endian).

Online Documentation
--------------------

The following documentation files can be found in the /rm1/doc directory:

   al_spkrs.txt - description of speakers
   rm_splex.txt - list of words spoken/spelled in the data, with their IDs

SPHERE
------

The waveform files on this disc are formatted according to the NIST SPeech HEader REsources (SPHERE) specification and may be manipulated using the SPHERE UNIX software tools under the "sphere/" directory. See the file "sphere/readme.doc" for more information.
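Because a full path uniquely identifies an utterance, the directory grammar can be turned into a path parser. The Python sketch below is one possible reading of that grammar; the regular expression, the `Utterance` record, and `parse_utterance_path` are hypothetical names, and it assumes speaker IDs are four lower-case letters or digits, as in the examples "abc0", "cdf0", and "jws0".

```python
import re
from typing import NamedTuple, Optional

# Pattern derived from the directory grammar in this readme:
#   /<CORPUS>/<MODE>/<USAGE>/<SPEAKER-ID>_<DIALECT>/<WORD-ID><FORM>.<TYPE>
# Assumption: speaker IDs are four lower-case letters/digits (e.g. "abc0").
UTTERANCE_RE = re.compile(
    r"^/(?P<corpus>rm1)"
    r"/(?P<mode>dep|indep)"
    r"/(?P<usage>trn|dev|eval)"
    r"/(?P<speaker>[a-z0-9]{4})_(?P<dialect>[1-8])"
    r"/(?P<word>sp\d{3})(?P<form>spk|spl)\.(?P<filetype>wav|wrd|phn)$"
)


class Utterance(NamedTuple):
    corpus: str    # always "rm1" on this disc
    mode: str      # "dep" (speaker-dependent) or "indep" (speaker-independent)
    usage: str     # "trn", "dev", or "eval"
    speaker: str   # speaker ID, e.g. "cdf0"
    dialect: int   # dialect region, 1..8
    word: str      # word identifier, "sp001".."sp600"
    form: str      # "spk" (spoken) or "spl" (spelled)
    filetype: str  # "wav", "wrd", or "phn"


def parse_utterance_path(path: str) -> Optional[Utterance]:
    """Split a full path into its grammar components; None if it does not match."""
    m = UTTERANCE_RE.match(path)
    if m is None:
        return None
    # Word IDs run from sp001 to sp600 (the 600-word subset).
    if not 1 <= int(m["word"][2:]) <= 600:
        return None
    return Utterance(
        corpus=m["corpus"],
        mode=m["mode"],
        usage=m["usage"],
        speaker=m["speaker"],
        dialect=int(m["dialect"]),
        word=m["word"],
        form=m["form"],
        filetype=m["filetype"],
    )


if __name__ == "__main__":
    print(parse_utterance_path("/rm1/indep/trn/cdf0_7/sp001spl.wav"))
```

Returning `None` rather than raising makes it easy to filter a directory walk down to well-formed utterance files, for example to collect all spelled forms (`form == "spl"`) for one speaker.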
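The waveform layout described above (a 1024-byte header followed by big-endian 16-bit linear PCM) is simple enough to read with the Python standard library alone. The sketch below assumes an uncompressed file with the usual SPHERE header layout: "NIST_1A" on line 1, the header size on line 2, then one "name -type value" field per line up to "end_head". The function names are hypothetical, and field names such as `sample_rate` are typical SPHERE fields rather than something this readme lists; the SPHERE tools under "sphere/" remain the reference implementation.

```python
import struct

HEADER_SIZE = 1024  # header length stated in this readme


def parse_sphere_header(raw: bytes) -> dict:
    """Parse the 'name -type value' fields of a NIST SPHERE header.

    Assumes the usual layout: line 1 is "NIST_1A", line 2 is the header
    size, then one field per line up to "end_head".
    """
    lines = raw.decode("ascii", errors="replace").split("\n")
    if lines[0].strip() != "NIST_1A":
        raise ValueError("not a NIST SPHERE file")
    fields = {}
    for line in lines[2:]:
        line = line.strip()
        if line == "end_head":
            break
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        name, ftype, value = parts
        if ftype == "-i":
            fields[name] = int(value)
        elif ftype == "-r":
            fields[name] = float(value)
        else:  # "-sN": a string of N characters
            fields[name] = value
    return fields


def read_sphere(data: bytes):
    """Return (header fields, samples) for an uncompressed 16-bit SPHERE file."""
    header = parse_sphere_header(data[:HEADER_SIZE])
    body = data[HEADER_SIZE:]
    count = len(body) // 2
    # This readme states MSB-first byte order, i.e. big-endian: ">" in struct.
    samples = list(struct.unpack(f">{count}h", body[: 2 * count]))
    return header, samples


def duration_seconds(header: dict, samples: list) -> float:
    """Utterance length in seconds; falls back to 16 kHz if no sample_rate field."""
    return len(samples) / header.get("sample_rate", 16000)
```

Typical use is `header, samples = read_sphere(open(path, "rb").read())`. Reading the whole file at once is reasonable here because each file holds a single isolated or spelled word.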