DARPA Resource Management Continuous Speech Database (RM1)
Speaker-Independent Training Data
NIST Disc 2-3.1

This speech database is a collection of recordings (corpus) of spoken sentences pertaining to a naval resource management task. The corpus consists of two primary components. One component, designed for speaker-independent speech recognition research, system development and evaluation, consists of relatively few utterances from each of 160 subjects. The second component, designed for speaker-dependent speech recognition studies, consists of many utterances from each of 12 speakers. The corpus has been further divided into portions for training, development testing, and final evaluative testing.

Subjects read the sentences from written prompts in low background noise. The material was recorded using a Sennheiser HMD 414 headset microphone and simultaneously digitized at a 20 kHz sampling frequency with 16-bit quantization. The digitized speech data was downsampled to 16 kHz and segmented into files corresponding to individual sentence-utterances. Most of these sentence-utterances are approximately 3 to 5 seconds in duration.

All of the sentences are consistent with a limited language model that allows queries about ships, ports, etc., along with commands to control a graphics display system, but little else. There is no "official" language model, but a crude (non-probabilistic) word-pair grammar that provides complete coverage of the sentences in this corpus is included.

This CD-ROM (NIST disc 2-3.1) contains all of the speaker-independent training data for this corpus: 3360 NIST-headered speech sphere files as well as several documentation files. Sentence text prompts are included, but "official" transcriptions (orthographic, phonetic, etc.) do not exist and have therefore not been included.
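A word-pair grammar is non-probabilistic: for each word it lists the words that may legally follow it, and a sentence is accepted only if every adjacent pair is listed. The Python sketch below illustrates that acceptance test. It does not parse rm1/doc/wp_gram.txt, whose layout is not described in this file; the toy grammar, the SENT_START/SENT_END boundary markers, and the function name accepts are all hypothetical.

```python
from typing import Dict, Iterable, Set

# Hypothetical boundary markers; wp_gram.txt may encode sentence
# starts and ends differently (assumption for illustration only).
START = "SENT_START"
END = "SENT_END"

# Toy grammar: each word maps to the set of words allowed to follow it.
# These entries are invented for the example, not taken from the corpus.
TOY_GRAMMAR: Dict[str, Set[str]] = {
    START: {"LIST", "SHOW"},
    "LIST": {"THE"},
    "SHOW": {"THE"},
    "THE": {"SHIPS", "PORTS"},
    "SHIPS": {END},
    "PORTS": {END},
}


def accepts(grammar: Dict[str, Set[str]], words: Iterable[str]) -> bool:
    """Return True if every adjacent pair (including the boundaries) is allowed.

    No probabilities are involved: a pair is either legal or it is not.
    """
    tokens = [START, *words, END]
    for left, right in zip(tokens, tokens[1:]):
        if right not in grammar.get(left, set()):
            return False
    return True


if __name__ == "__main__":
    print(accepts(TOY_GRAMMAR, "LIST THE SHIPS".split()))  # True
    print(accepts(TOY_GRAMMAR, "LIST SHIPS THE".split()))  # False
```

Since only pair legality is recorded, such a grammar can also accept word sequences that never occur in the corpus.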
For the purpose of system testing, it has been assumed that the prompts represent an accurate orthographic transcription of the utterances.

This disc is the second in a series of CD-ROM-based speech and language corpora being prepared by the National Institute of Standards and Technology (NIST) and distributed through the National Technical Information Service (NTIS). Development and preparation of this database was made possible by support from the Defense Advanced Research Projects Agency (DARPA) Information Science and Technology Office.


Speaker-Independent Training Material
-------------------------------------

This disc contains the following speech data for each of 80 speakers:

   2  dialect calibration sentences (files sa[1-2].sph)
  40  speaker-independent training sentences (files from sr[001-600].sph
      and st[0001-2235].sph)
  --
  42  total sentence utterances per speaker


CD-ROM Directory Structure
--------------------------

The CD-ROMs' directory hierarchy is structured so that a full path/filename uniquely identifies an utterance (the database, data usage, speaker, and sentence).

Note: Identical filenames DO EXIST and, therefore, care must be taken when copying files from the CD-ROM to other media. However, if the original path/filename is lost, a file may be disambiguated by the utterance identifier in the header.

CAUTION: Eight of the 80 speakers designated for speaker-independent training material have also been used as test speakers in previous DARPA/NIST benchmark tests. These speakers' files have been placed in a separate directory, "excluded", and should NOT be used in speaker-independent system training. The remaining 72 speakers are the "official" DARPA speaker-independent training speakers.

The directories are structured as follows:

  ::= | |
  ::= ////
  ::= ind_trn | excluded
  ::= _
  ::= adg0 | ... | wem0
  ::= 1 | 2 | ... | 8
  ::= sa1.sph | ...
        | st2235.sph
  ::= //doc/
  ::= documentation files and directories (see below)
  ::= //readme.txt
  ::= RM1

Example directories and files:

  /                              (CD-ROM root directory)
  rm1/                           (database identification)
  rm1/ind_trn/                   (speaker-independent training data usage)
  rm1/ind_trn/adg0_4             (speaker "adg0", dialect region "4")
     .
     .
  rm1/ind_trn/wem0_5             (speaker "wem0", dialect region "5")
  rm1/ind_trn/adg0_4/sa1.sph     (speech sphere file containing an utterance
     .                            of sentence "sa1")
     .
  rm1/ind_trn/adg0_4/st2165.sph  (speech sphere file containing an utterance
                                  of sentence "st2165")
  rm1/excluded/                  (excluded speakers who have also been used
                                  in benchmark tests)
  rm1/doc/                       (online documentation and tables - see below
                                  for file descriptions)
  rm1/readme.txt                 (this file)


Online Documentation
--------------------

The following documentation files can be found in the rm1/doc directory:

al_sents.snr  - Complete listing of all RM Database sentences in SNOR form.
                SNOR, an acronym for Standard Normalized Orthographic
                Representation, is a uniform way of writing the words and
                sentences in this corpus. SNOR-format sentence texts are
                required as reference material for the DARPA/NIST standard
                scoring software.
al_sents.txt  - Complete listing of all RM Database sentences in prompt form.
it_scrpt.txt  - Table listing the speakers on this disc and the scripts of
                sentences they read. The scripts are located in the
                rm1/doc/scripts directory.
it_sents.snr  - Listing of sentences on this disc in SNOR form.
it_sents.txt  - Listing of sentences on this disc in prompt form.
it_spkrs.txt  - Table describing the speakers on this disc.
header.def    - NIST header object definitions.
lexicon.snr   - Complete Resource Management lexicon in SNOR form.
not_used.txt  - List of valid RM sentences not used in the corpus. (These
                sentence texts were generated, but never recorded.)
scripts/      - Directory containing lists of sentence identifiers. Each list
                (script) indexes the order of sentences spoken in one
                recording session by a particular speaker.
wp_gram.txt   - Resource Management Word-Pair Grammar (developed at BBN).


NIST Header Structure
---------------------

This series of CD-ROMs employs a new NIST speech file header structure. The header is an object-oriented, 1024-byte, fixed-length, entirely ASCII structure composed of a fixed portion followed by an object-oriented variable portion. The fixed portion is as follows:

NIST_1A
   1024

The first line specifies the header type and the second line specifies the header length. Each of these lines is 8 bytes long (including the newline). This layout identifies the header and also allows those who do not wish to read the subsequent header information to skip over it programmatically.

The remaining object-oriented variable portion is composed of object-type-value "triple" lines which have the following format:

  ::= | | |
  ::=
  ::= |
  ::= |
  ::= _ | _
  ::= - | - | -
  ::= i
  ::= r
  ::= s
  ::= | |   (depending on object type)
  ::=
  ::= .
  ::= | NULL
  ::= ;   (excluding embedded new-lines)
  ::= |
  ::= |
  ::= a | ... | z | A | ... | Z
  ::= |
  ::= 0 | ... | 9
  ::= + | - | NULL
  ::= |
  ::= |
  ::= char(0) | char(1) | ... | char(255)

The currently defined objects (used in this database) are listed in the file rm1/doc/header.def. (Note: The list of objects in header.def may be expanded for future databases, since no order or number of objects is imposed on this header structure. The file header.def is simply a repository for "legal" object definitions.)

The single object "end_head" marks the end of the active header; the remaining unused header space is undefined.
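The fixed-portion check and the object-type-value triples described above can be sketched in Python as follows. This is a minimal illustration written from the description in this file, not the forthcoming NIST software; the function name parse_nist_header is hypothetical, and it assumes single spaces between fields, -sN values that are exactly N characters long, and (from the grammar's ";" rule) that lines beginning with ";" are comments.

```python
from typing import Dict, Union

Value = Union[int, float, str]


def parse_nist_header(data: bytes) -> Dict[str, Value]:
    """Parse a NIST_1A header from the start of a speech file's bytes.

    Assumptions (not a reference implementation): the fixed portion is two
    8-byte lines, object lines are "name -type value", and parsing stops at
    the "end_head" object.
    """
    if data[:8] != b"NIST_1A\n":
        raise ValueError("not a NIST_1A header")
    # Second 8-byte line holds the header length, e.g. b"   1024\n" -> 1024.
    header_len = int(data[8:16].decode("ascii"))
    text = data[16:header_len].decode("ascii")

    fields: Dict[str, Value] = {}
    for line in text.split("\n"):
        if not line.strip() or line.startswith(";"):
            continue  # blank line or (assumed) comment line
        if line.strip() == "end_head":
            return fields  # space after end_head is undefined; ignore it
        name, obj_type, rest = line.split(" ", 2)
        if obj_type == "-i":
            fields[name] = int(rest.split()[0])
        elif obj_type == "-r":
            fields[name] = float(rest.split()[0])
        elif obj_type.startswith("-s"):
            length = int(obj_type[2:])
            fields[name] = rest[:length]  # string length is given by the type
        else:
            raise ValueError(f"unknown object type {obj_type!r} in {line!r}")
    raise ValueError("end_head not found within header")
```

To inspect a file from the disc, one would read its first 1024 bytes in binary mode and pass them to the function; because the header length is read from the fixed portion, the parser never needs to know the object list in advance.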
The following is an example header from the Resource Management database:

NIST_1A
   1024
database_id -s3 RM1
database_version -s3 1.0
utterance_id -s8 adg0_sa1
channel_count -i 1
sample_count -i 50074
sample_rate -i 16000
sample_min -i -2032
sample_max -i 2708
sample_n_bytes -i 2
sample_byte_format -s2 01
sample_sig_bits -i 16
end_head

A document will be forthcoming which describes the header structure in greater detail, and basic "C" software modules will be available in the future for header interpretation, generation, and manipulation.
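With the values in the example header, the utterance length follows from sample_count / sample_rate: 50074 / 16000 is about 3.13 seconds, within the 3 to 5 second range mentioned earlier. The Python sketch below computes that duration and decodes the 16-bit samples that follow the 1024-byte header, given a dict of header objects (for example, hand-built from the values shown above). It assumes that sample_byte_format "01" means little-endian and "10" big-endian (the usual SPHERE convention, not stated in this file), uncompressed single-channel data, and a 1024-byte header. The helper flat_name is hypothetical; it builds a collision-free filename from utterance_id, following the note that identical filenames exist on this disc.

```python
import struct
from typing import Dict, List, Union

Value = Union[int, float, str]

HEADER_BYTES = 1024  # fixed header length used on this disc


def duration_seconds(header: Dict[str, Value]) -> float:
    """Utterance length in seconds: sample_count / sample_rate."""
    return int(header["sample_count"]) / int(header["sample_rate"])


def decode_samples(data: bytes, header: Dict[str, Value]) -> List[int]:
    """Decode 16-bit mono samples following the header.

    Assumes "01" = little-endian and "10" = big-endian (SPHERE convention,
    not spelled out in this readme) and uncompressed single-channel data.
    """
    if header["sample_n_bytes"] != 2 or header["channel_count"] != 1:
        raise ValueError("sketch only handles 16-bit mono data")
    order = {"01": "<", "10": ">"}[str(header["sample_byte_format"])]
    count = int(header["sample_count"])
    body = data[HEADER_BYTES:HEADER_BYTES + 2 * count]
    # "h" is a signed 16-bit integer; struct raises if the body is too short.
    return list(struct.unpack(f"{order}{count}h", body))


def flat_name(header: Dict[str, Value]) -> str:
    """Hypothetical helper: a unique filename built from utterance_id."""
    return f"{header['utterance_id']}.sph"
```

Copying files under flat_name rather than the bare sentence filename avoids the collisions between speakers who read the same sentence (every speaker reads sa1 and sa2, for instance).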