Brief Description of the KING Speech Data Base

The KING corpus was created for research in the area of speaker identification. It was collected partly in New Jersey and partly in San Diego. There are twenty-six San Diego speakers (numbered 01 to 26) and twenty-five New Jersey speakers (numbered 27 to 60, with some gaps in the sequence). All speakers are male. There are ten sessions for each speaker (numbered 01 to 10), and each session was recorded in both a wide-band (wb) and a narrow-band (nb) channel. Sessions were recorded a week to a month apart.

The two channels are stored in separate, single-channel waveform files, under the "wb" and "nb" directories, respectively. The narrow-band channel represents speech that was passed through a standard telephone handset, transmitted through a local telephone exchange to a long distance service and back to the local exchange, then recorded from an analog telephone patch. The wide-band channel represents the same utterance, recorded using a high-quality microphone that was mounted on the telephone handset; recording was done in a quiet room. Both channels are digitized at 8 kHz with 16-bit linear samples. A more detailed description of the collection protocol is provided in the file "collectn.doc".

Within each channel directory, there are subdirectories for each recording session (named "s01" through "s10"), and these contain separate waveform files for each speaker (51 files per session directory). Unfortunately, there are some gaps in the wide-band set: there is no data for speakers 13 and 25 in sessions 06 through 10, so these 5 session directories contain only 49 files each. (There are no gaps in the narrow-band directories.)

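
The per-directory file counts given above can be checked with a small calculation. The following Python sketch is illustrative only: the names (expected_file_count, WB_MISSING) are hypothetical and not part of the corpus or the SPHERE package, and the speaker and session numbers are simply those stated in this description.

```python
# Counts taken from the description above; all names here are hypothetical.
SAN_DIEGO_SPEAKERS = 26
NEW_JERSEY_SPEAKERS = 25
SESSIONS = range(1, 11)  # sessions 01 through 10

# Wide-band gaps: speakers 13 and 25 have no data in sessions 06-10.
WB_MISSING = {(speaker, session)
              for speaker in (13, 25)
              for session in range(6, 11)}


def expected_file_count(channel: str, session: int) -> int:
    """Return how many waveform files one session directory should hold."""
    if channel not in ("wb", "nb"):
        raise ValueError(f"unknown channel: {channel!r}")
    if session not in SESSIONS:
        raise ValueError(f"unknown session: {session!r}")
    total = SAN_DIEGO_SPEAKERS + NEW_JERSEY_SPEAKERS
    if channel == "wb":
        # Subtract the speakers that are missing from this wide-band session.
        total -= sum(1 for (_, s) in WB_MISSING if s == session)
    return total


if __name__ == "__main__":
    for channel in ("wb", "nb"):
        for session in SESSIONS:
            print(f"{channel}/s{session:02d}", expected_file_count(channel, session))
```

Summed over all ten sessions, these figures give 510 narrow-band and 500 wide-band waveform files.
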
Each speaker/session waveform file contains about thirty seconds of actual speech, in which the speaker addresses one of the following assigned topics:

    1  Construction toy task
    2  Describing odd shapes
    3  Topic of speaker's choice
    4  Road rally task
    5  Describing photographs
    6  Describing cartoon strips

In addition to the waveform data, there are text files that contain the orthographic transcriptions of each session for each speaker. These are in the "transcrp" directory, stored in a directory structure that is parallel to the speech data.

The individual file names identify the channel, session, speaker and task, in the following pattern:

    CSS_MM_T.wav

where "C" is "w" for wide-band, "n" for narrow-band or "x" for transcription (one transcript file applies to both channels of speech); "SS" is the two-digit session id, "MM" is the two-digit speaker id, and "T" is the single-digit task number. Transcription files use the extension ".wrd" in place of ".wav". For example, the complete path names for session 02, speaker 30, speaking in response to the "road rally" task, are:

    wb/s02/w02_30_4.wav          (wide-band channel)
    nb/s02/n02_30_4.wav          (narrow-band channel)
    transcrp/s02/x02_30_4.wrd    (transcription)

A complete list of file names and paths is provided in the file "filename.lst".

Although two corresponding waveform files contain the same utterance, and use the same sampling frequency and bytes per sample, they will appear to have different sizes on the disc. This is because the sample data have been stored in compressed form, to allow the entire corpus to fit on one disc, and the narrow-band files tend to show greater compression ratios. The speech files can be uncompressed using the "w_decode" utility that is provided in the NIST SPHERE software package, which is found in the "sphere" directory on this disc.

A peculiar anomaly of the narrow-band San Diego data is the phenomenon known as "The Great Divide". There is an apparent change in the spectral characteristics of the narrow-band channel between sessions 1-5 and sessions 6-10.
This involves a difference in spectral slope for the composite transfer functions in the two sets. Speaker identification algorithms generally perform poorly across the divide as a result. (For the New Jersey data, the composite transfer functions resemble those for San Diego sessions 1-5.)

Speech-to-noise ratios average about 10 dB worse for the New Jersey narrow-band data than for the San Diego data. The ratio is less than 20 dB for over half the New Jersey narrow-band files.

Though phonetic and time-alignment markings were originally made for the corpus, they were found to have serious inconsistencies. No such markings are available at this time.

Each sample data file begins with a NIST SPHERE header (1024 bytes). The SPHERE headers are NOT compressed -- only the sample data are compressed -- so information about the contents of the files can be read without uncompressing the data. The headers contain lines of ASCII text that describe the contents of the files. Here is an example of the header contents, taken from file n06_19_6.wav (narrow-band, session 06, speaker 19, task 6):

    database_id         king
    sample_rate         8000
    channel_count       1
    sample_n_bytes      2
    sample_byte_format  10
    sample_coding       pcm,embedded-shorten-v1.09
    sample_count        397669
    sample_max          5035
    sample_min          -4415
    sample_checksum     10927
    recording_site      SanDiego
    channel_id          narrowband

For a complete description of the header format, see the documentation files in the "sphere" directory. (Note that the "sample_byte_format" can be changed automatically in the process of uncompressing the waveform data, to accommodate systems whose native format for 16-bit words is "low-byte-first".)

NIST maintains the SPHERE software package, and more recent versions will become available over time. The most current version of the SPHERE software package can be obtained for free via anonymous ftp from jaguar.ncsl.nist.gov, in the "pub" directory.
This software is not subject to guarantees of any sort by NIST or the Linguistic Data Consortium, and neither NIST nor LDC may be held liable for any damages resulting from its use.
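
The file naming pattern and the uncompressed header described earlier in this document can be illustrated with a short Python sketch. It is not part of the SPHERE package, and every function name in it (king_path, parse_king_name, parse_sphere_header) is hypothetical. The header parser assumes the usual SPHERE layout: a "NIST_1A" line, a header-size line, fields written as "name -type value", and a closing "end_head" line. Because the example listing in this document shows only names and values, lines without a type flag are also accepted and their values are kept as strings. The sketch reads headers only; it does not uncompress sample data, for which "w_decode" is still needed.

```python
import re

# Hypothetical helpers; only the naming pattern itself comes from the text.
NAME_RE = re.compile(r"^([wnx])(\d{2})_(\d{2})_(\d)\.(wav|wrd)$")
CHANNEL_DIRS = {"w": "wb", "n": "nb", "x": "transcrp"}
# Assumed SPHERE type flags: -sN (string of N bytes), -i (integer), -r (real).
TYPED_VALUE_RE = re.compile(r"^-(s\d+|i|r)\s+(.*)$")


def king_path(kind: str, session: int, speaker: int, task: int) -> str:
    """Build a relative path such as 'wb/s02/w02_30_4.wav'."""
    ext = "wrd" if kind == "x" else "wav"
    name = f"{kind}{session:02d}_{speaker:02d}_{task}.{ext}"
    return f"{CHANNEL_DIRS[kind]}/s{session:02d}/{name}"


def parse_king_name(filename: str) -> dict:
    """Split a file name like 'n06_19_6.wav' into its components."""
    match = NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"not a KING file name: {filename!r}")
    kind, session, speaker, task, _ext = match.groups()
    return {"channel": CHANNEL_DIRS[kind], "session": int(session),
            "speaker": int(speaker), "task": int(task)}


def parse_sphere_header(raw: bytes) -> dict:
    """Parse header fields from the first 1024 bytes of a waveform file."""
    fields = {}
    text = raw[:1024].decode("ascii", errors="replace")
    for line in text.splitlines():
        parts = line.split(None, 1)
        if not parts:
            continue  # blank or padding line
        if parts[0] == "end_head":
            break
        if len(parts) < 2:
            continue  # e.g. the "NIST_1A" and header-size lines
        name, rest = parts[0], parts[1].strip()
        typed = TYPED_VALUE_RE.match(rest)
        if typed is None:
            fields[name] = rest  # no type flag: keep the value as text
        elif typed.group(1) == "i":
            fields[name] = int(typed.group(2))
        elif typed.group(1) == "r":
            fields[name] = float(typed.group(2))
        else:
            fields[name] = typed.group(2)
    return fields


if __name__ == "__main__":
    print(king_path("w", 2, 30, 4))
    print(parse_king_name("n06_19_6.wav"))
```

To inspect a real file, the first 1024 bytes can be read in binary mode and passed to parse_sphere_header, for example parse_sphere_header(open(path, "rb").read(1024)), where "path" is whatever location the disc is mounted at.
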