         DARPA Resource Management Continuous Speech Database (RM1)
                      Speaker-Dependent Training Data
                       NIST Corpus 2-1.1 and 2-2.1

This corpus is a collection of recordings of spoken sentences pertaining to
a naval resource management task. It consists of two primary sections. The
first, designed for speaker-independent recognition evaluation, contains a
few utterances from each of 160 subjects. The second, designed for
speaker-dependent recognition studies, contains many utterances from each
of only 12 speakers.

Subjects read the sentences from written prompts in very low background
noise. The material was recorded using a Sennheiser HMD-414 headset
microphone and simultaneously digitized at 20 kHz into 16-bit samples. Each
recording session was then downsampled to 16 kHz and segmented into files
corresponding to individual sentence-utterances. Most of these
sentence-utterances are approximately 3 to 5 seconds in duration.

All of the sentences are consistent with a very limited language model that
allows queries about ships, ports, etc., along with display adjustments,
but little else. There is no "official" language model, but a crude grammar
that generates all of the recorded sentences is included in these
materials. The corpus has been further divided into portions for training,
development testing, and final evaluative testing.

This two-CD-ROM set (NIST discs 2-1.1 and 2-2.1) comprises all of the
speaker-dependent training data for the database. These discs contain 7344
NIST-headered speech sphere files (12 speakers x 612 sentences each) as
well as several documentation files. Because the speaker-dependent training
files would not all fit on one CD-ROM, they have been divided so that all
of the material for 6 of the 12 speakers appears on each disc. The online
documentation is duplicated on each disc for your convenience.

This set is the first in a series of CD-ROM-based speech and language
corpora being prepared by the National Institute of Standards and
Technology (NIST) and distributed through the National Technical
Information Service (NTIS). Development and preparation of this database
were made possible by support from the Defense Advanced Research Projects
Agency (DARPA) Information Science and Technology Office.


Speaker-Dependent Training Material
-----------------------------------

These discs contain the following speech data for each of 12 speakers:

      2 dialect calibration sentences         (files sa[1-2].sph)
     10 rapid adaptation sentences            (files sb[01-10].sph)
    600 speaker-dependent training sentences  (files sr[001-600].sph)
    ---
    612 total sentences per speaker


CD-ROM Directory Structure
--------------------------

The CD-ROMs' directory hierarchy is structured so that a full path/filename
uniquely identifies an utterance (the database, data usage, speaker, and
sentence). The directories are structured as follows:

    ::= | |
    ::= ////
    ::= dep_trn
    ::= _
    ::= bef0 | ... | tab0
    ::= 1 | 2 | ... | 8
    ::= sa1.sph | ... | sr600.sph
    ::= //doc/
    ::= documentation files and directories (see below)
    ::= //readme.txt
    ::= RM1

Example directories and files:

    /                             (CD-ROM root directory)
    rm1/                          (database identification)
    rm1/dep_trn/                  (speaker-dependent training data usage)
    rm1/dep_trn/bef0_3            (speaker "bef0", dialect region "3")
       .
       .
    rm1/dep_trn/tab0_7            (speaker "tab0", dialect region "7")
    rm1/dep_trn/bef0_3/sa1.sph    (speech sphere file containing an
       .                           utterance of sentence "sa1")
       .
    rm1/dep_trn/bef0_3/sr600.sph  (speech sphere file containing an
                                   utterance of sentence "sr600")
    rm1/doc/                      (online documentation and tables; see
                                   below for file descriptions)
    rm1/readme.txt                (this file)


Online Documentation
--------------------

The following documentation files can be found in the rm1/doc directory:

  al_sents.snr - Complete listing of all RM Database sentences in SNOR form.
                 *
  al_sents.txt - Complete listing of all RM Database sentences in prompt
                 form.
  dt_scrpt.txt - Table listing the speakers in this set and the scripts of
                 sentences they spoke. The scripts are located in the
                 rm1/doc/scripts directory.
  dt_sents.snr - Listing of sentences on this set in SNOR form. *
  dt_sents.txt - Listing of sentences on this set in prompt form.
  dt_spkrs.txt - Table describing the speakers on this set.
  header.def   - NIST header object definitions.
  lexicon.snr  - Complete Resource Management lexicon in SNOR form. *
  not_used.txt - List of valid RM sentences not used in the corpus.
  scripts/     - Directory containing lists of sentence identifiers. Each
                 list (script) indexes the order of sentences spoken in one
                 recording session.
  wp_gram.txt  - BBN's Resource Management Word-Pair Grammar.

  * SNOR, short for Standard Normalized Orthographic Representation, is a
    uniform way of writing English words and sentences.


NIST Header Structure
---------------------

These (and future) CD-ROMs employ the new NIST speech file header
structure. The header is an object-oriented, 1024-byte fixed-length,
entirely ASCII structure. It is composed of a fixed portion followed by an
object-oriented variable portion. The fixed portion is as follows:

    NIST_1A
       1024

The first line specifies the header type and the second line specifies the
header length. Each of these lines is 8 bytes long (including the new-line)
and is structured to identify the header as well as to allow those who do
not wish to read the subsequent header information to skip over it
programmatically.

The remaining object-oriented variable portion is composed of
object-type-value "triples" which have the following format:

    ::=
    ::= |
    ::= |
    ::= _ | _
    ::= - | - | -
    ::= i
    ::= r
    ::= s
    ::= | | (depending on object type)
    ::=
    ::= . = |
    ::= |
    ::= |
    ::= a | ... | z | A | ... | Z
    ::= |
    ::= 0 | ... | 9
    ::= + | - | NULL
    ::= char(0) | char(1) | ...
        | char(255)

The currently defined objects (used in this database) are defined in the
file rm1/doc/header.def. (Note: The list of objects in header.def may be
expanded for future databases, as no order or number of objects is imposed
on this header structure. The file header.def is simply a repository for
"legal" object definitions.) The single object "end_head" marks the end of
the active header; the remaining unused header space is undefined.

The following is an example header from the Resource Management database:

    NIST_1A
       1024
    database_id -s3 RM1
    database_version -s3 1.0
    utterance_id -s8 bef0_sa1
    channel_count -i 1
    sample_count -i 42292
    sample_rate -i 16000
    sample_min -i -2868
    sample_max -i 4015
    sample_n_bytes -i 2
    sample_byte_format -s2 01
    sample_sig_bits -i 16
    end_head

A document describing the header structure in greater detail will be
forthcoming, and basic "C" software modules for header generation and
manipulation will be made available in the future.
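
Until those modules are available, the header layout described above can
be read with a few lines of code. The following is a minimal sketch in
Python, not NIST software. The function name parse_nist_header is
hypothetical, and the sketch assumes that the number following "-s" is the
length of the string value (as the example header suggests) and that each
header line carries exactly one name, type, and value. Anything else the
grammar may allow on a line is not handled.

```python
def parse_nist_header(raw: bytes):
    """Return (header_length, fields) for a NIST_1A speech file header."""
    # Fixed portion: two 8-byte lines (header type, then header length),
    # each counted with its trailing new-line.
    if raw[:8] != b"NIST_1A\n":
        raise ValueError("missing NIST_1A header type line")
    header_length = int(raw[8:16].decode("ascii").strip())

    # Variable portion: one "name type value" triple per line, up to
    # end_head. Bytes after end_head are undefined, so decode leniently.
    text = raw[16:header_length].decode("ascii", errors="replace")
    fields = {}
    for line in text.split("\n"):
        parts = line.split(None, 2)
        if not parts:
            continue
        if parts[0] == "end_head":
            return header_length, fields
        if len(parts) != 3:
            raise ValueError(f"malformed header line: {line!r}")
        name, type_flag, value = parts
        if type_flag == "-i":
            fields[name] = int(value)
        elif type_flag == "-r":
            fields[name] = float(value)
        elif type_flag.startswith("-s"):
            # Assumption: the digits after "-s" give the string length.
            fields[name] = value[: int(type_flag[2:])]
        else:
            raise ValueError(f"unknown object type: {type_flag!r}")
    raise ValueError("end_head not found")


# The example header from this file, padded to the full 1024 bytes.
SAMPLE = (
    "NIST_1A\n"
    "   1024\n"
    "database_id -s3 RM1\n"
    "database_version -s3 1.0\n"
    "utterance_id -s8 bef0_sa1\n"
    "channel_count -i 1\n"
    "sample_count -i 42292\n"
    "sample_rate -i 16000\n"
    "sample_min -i -2868\n"
    "sample_max -i 4015\n"
    "sample_n_bytes -i 2\n"
    "sample_byte_format -s2 01\n"
    "sample_sig_bits -i 16\n"
    "end_head\n"
).encode("ascii").ljust(1024, b"\0")

header_length, fields = parse_nist_header(SAMPLE)
duration = fields["sample_count"] / fields["sample_rate"]
print(f"{fields['utterance_id']}: {duration:.2f} s")  # bef0_sa1: 2.64 s
```

When reading a real file, the first 1024 bytes would be passed to the
function, and the speech samples would begin at the returned header
length. The duration computed for the example header falls just short of
the 3 to 5 second range quoted for most utterances.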