Handset TIMIT Speech Corpus (HTIMIT)

Recorded at MIT Lincoln Laboratory
Speech Systems Technology Group

This corpus is delivered "as is" and no claims are made for specific
suitability. The data may be used for research purposes only and may not
be further distributed or transmitted without the written consent of MIT
Lincoln Laboratory. Use of this data implies agreement with the above
conditions.

Introduction
------------

The HTIMIT corpus is a re-recording of a subset of the TIMIT corpus
through different telephone handsets. The aim was to create a corpus for
the study of telephone transducer effects on speech that minimizes
confounding factors such as variable telephone channels and background
noise.

HTIMIT was created by playing 10 TIMIT sentences from each of 192 male
and 192 female speakers through a stereo loudspeaker into different
transducers positioned directly in front of the loudspeaker, and
digitizing the output of the transducers on a SunSparc A/D at an 8 kHz
sampling rate with 16-bit resolution. Ten transducers were used, as
described in the table below.

Most of the telephone handsets are not new (el2 is the exception) and
were obtained from the Lincoln Telecom office. Handsets with obvious
damage were not used, but to obtain some diversity from a limited number
of handsets, they were selected to have varied sound characteristics,
transducer designs or, in the case of electrets, grill designs. For
example, cb1-cb3 have the same handset manufacturer name (NT G-type) but
the carbon-button transducer is different in each. In addition, cb3 and
cb4 were selected because they had particularly poor (although not
pathological) sound characteristics.

Table 1: Transducers used in corpus.
----------------------------------------------------------------------------
Transducer Name | Description
----------------|-----------------------------------------------------------
senh            | Sennheiser head-mounted microphone
----------------|-----------------------------------------------------------
pt1             | Sony portable (cordless) telephone
----------------|-----------------------------------------------------------
el1             | Northern-Telecom Unity electret (3-line grill)
----------------|-----------------------------------------------------------
el2             | Northern-Telecom Unity Noisy-Environment electret
                | (2-line grill)
----------------|-----------------------------------------------------------
el3             | Unknown manufacturer electret (64-hole grill)
----------------|-----------------------------------------------------------
el4             | Radio Shack Chronophone-255 electret telephone
----------------|-----------------------------------------------------------
cb1             | Northern-Telecom G-type carbon-button
                | (center hole membrane transducer)
----------------|-----------------------------------------------------------
cb2             | Northern-Telecom G-type carbon-button
                | (6 hole metal transducer)
----------------|-----------------------------------------------------------
cb3             | Northern-Telecom G-type carbon-button
                | (6 hole membrane transducer)
----------------|-----------------------------------------------------------
cb4             | ITT carbon-button (6 hole membrane/attached transducer)
----------------------------------------------------------------------------

The collection procedure is obviously not ideal. First, the speech has
been played through a loudspeaker, which imposes some frequency response
on the signal (although this will be a common factor among all recordings
in this corpus). Second, the coupling of the transducer to the sound
source is not realistic.
However, this procedure allows for the collection of speech from a large
number of speakers repeating identical speech on each instance.
Furthermore, coupled with the phonetic markings from the original TIMIT
corpus, HTIMIT offers the ability to study handset transducer effects on
speech recognition systems.

To address the realism of the sound transduction in HTIMIT, a second
corpus using the same handsets but with live people speaking into the
handsets is also available. This corpus is called the Lincoln Laboratory
Handset Database (LLHDB) and may be obtained through the LDC.

Data Organization
-----------------

The files are organized in the following hierarchy:

                        ...
                ________|___________
               /        |           \
                        ...
                  ______|___________
                 /      |           \
            sa1.wav  sa2.wav  ...  sx1234.wav

The following TIMIT-style naming convention is used:

    <HANDSET>/<SEX><SPEAKER_ID>/<SENTENCE_ID>.<FILE_TYPE>

where,

    HANDSET :== cb1 | cb2 | cb3 | cb4 | el1 | el2 | el3 | el4 | pt1 | senh
                (see Table 1 for handset code description)

    SEX :== m | f

    SPEAKER_ID :== <INITIALS><DIGIT>

        where,
        INITIALS :== speaker initials, 3 letters
        DIGIT :== number 0-9 to differentiate speakers with identical
                  initials

    SENTENCE_ID :== <TEXT_TYPE><SENTENCE_NUMBER>

        where,
        TEXT_TYPE :== sa | si | sx
                      (see TIMIT documentation for text type description)
        SENTENCE_NUMBER :== 1 ... 2342

    FILE_TYPE :== wav (Speech waveform file with NIST Sphere header)

Example:

    cb1/mklw0/sa1.wav

    (carbon-button 1 handset, male speaker, speaker-ID "klw0",
     sentence text "sa1", speech waveform file)

Using prepended tones and a correlation detector, an effort was made to
align each speaker's speech files across handset recordings. The
alignment error is estimated to be at most 50 ms.
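
The naming convention above lends itself to mechanical parsing. The
following Python fragment is a minimal sketch, not part of the corpus
distribution: the names HtimitName and parse_htimit_path are hypothetical,
and the sketch assumes lowercase relative paths with forward slashes, as
in the example cb1/mklw0/sa1.wav.

```python
import re
from typing import NamedTuple

# Handset codes from Table 1.
HANDSETS = ("cb1", "cb2", "cb3", "cb4", "el1", "el2", "el3", "el4", "pt1", "senh")


class HtimitName(NamedTuple):
    """Fields of one speech file path (hypothetical helper type)."""
    handset: str
    sex: str
    speaker_id: str
    text_type: str
    sentence_number: int


# Assumption: lowercase names and "/" separators, as in the README example.
_PATTERN = re.compile(
    r"(?P<handset>" + "|".join(HANDSETS) + r")/"
    r"(?P<sex>[mf])(?P<speaker_id>[a-z]{3}[0-9])/"
    r"(?P<text_type>sa|si|sx)(?P<sentence_number>[0-9]+)\.wav"
)


def parse_htimit_path(path: str) -> HtimitName:
    """Split a path such as 'cb1/mklw0/sa1.wav' into its named fields."""
    match = _PATTERN.fullmatch(path)
    if match is None:
        raise ValueError(f"not an HTIMIT speech file path: {path!r}")
    number = int(match["sentence_number"])
    if not 1 <= number <= 2342:
        raise ValueError(f"sentence number out of range: {number}")
    return HtimitName(
        handset=match["handset"],
        sex=match["sex"],
        speaker_id=match["speaker_id"],
        text_type=match["text_type"],
        sentence_number=number,
    )
```

For the example path, parse_htimit_path("cb1/mklw0/sa1.wav") yields
handset "cb1", sex "m", speaker-ID "klw0", text type "sa" and sentence
number 1. Test signal files such as white_ns.wav do not follow the
convention and are rejected with a ValueError.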
In addition to the 384 speaker subdirectories, each handset directory
also contains two test signals recorded through the handset:

    - 1 white noise test signal (5 sec of zero mean, Gaussian noise)
    - 1 sweep tone test signal (4 sec @ 1kHz/sec)

The test signals were created with Entropic's testsd program as follows:

    testsd -p 80000 -T gauss -t short -r 16000 -l 1000 white_noise.sd
    testsd -p 64000 -T sine -t short -r 16000 -l 1000 -C 1000 -f 0 sweep_tone.sd

The original Entropic file header format on these test signal files was
replaced with the standard NIST Sphere header format for CD-ROM
publication; the names of the test signal files are:

    - white_ns.wav
    - sweep_tn.wav

While the names of individual signal files are identical across handset
directories, the content of each file differs as a function of the
respective handset characteristics. Users should be careful to preserve
directory path information when combining the contents of different
handset directories.

The doc directory contains the following files:

    - spkrs.lst   : A list of the speakers and their dialect regions from
                    the original TIMIT corpus.

    - icassp97.ps : A Postscript version of an ICASSP paper describing
                    the HTIMIT and LLHDB collection procedures.
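
Because file names repeat across handset directories, copying files from
several handsets into one flat directory would silently overwrite data.
One possible way to keep the path information is to fold it into the file
name. The sketch below is only an illustration: the function name
flat_name and the underscore-joined naming scheme are assumptions, not a
convention defined by the corpus.

```python
from pathlib import PurePosixPath


def flat_name(relative_path: str) -> str:
    """Fold the directory components of a corpus-relative path into the name.

    Hypothetical scheme: 'cb1/mklw0/sa1.wav' becomes 'cb1_mklw0_sa1.wav',
    so the handset code and speaker directory survive a flat copy.
    """
    parts = PurePosixPath(relative_path).parts
    if len(parts) < 2:
        raise ValueError("expected a path that starts with a handset directory")
    return "_".join(parts)


# The same file name under two handsets maps to two distinct flat names.
merged = {flat_name(p): p for p in ("cb1/white_ns.wav", "el2/white_ns.wav")}
```

Here merged holds two entries, cb1_white_ns.wav and el2_white_ns.wav,
whereas keying on the bare file name would keep only one of them.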