The LLHDB corpus consists of recordings of people speaking into ten different telephone handsets. The aim was to create a corpus for the study of telephone transducer effects on speech that minimized confounding factors, such as variable telephone channels and background noise. LLHDB was created by having volunteers speak prompted and extemporaneous speech into different transducers in a sound-proof room and directly digitizing the output from the transducers on a SunSparc A/D at an 8 kHz sampling rate and 16-bit resolution.
Three types of speech were recorded for each handset. First, the speaker read the "rainbow passage" [Nolan 83], a 97-word passage sometimes used in phonetic research. Second, the speaker read ten sentences extracted from the TIMIT corpus. Finally, the speaker was asked to describe a photograph for approximately 40 seconds (a different photograph was used for each handset). LLHDB contains speech from 53 speakers (24 males and 29 females) recruited from the laboratory.
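As a rough guide to data volume, the sketch below estimates the size of the raw samples for a recording at the stated 8 kHz sampling rate and 16-bit resolution. It assumes single-channel, uncompressed linear PCM and ignores any file header; neither assumption is stated in this description, and the function name `pcm_bytes` is purely illustrative.

```python
# Assumptions (not stated in the corpus description): one channel,
# uncompressed linear PCM, no file header counted.
SAMPLE_RATE_HZ = 8000   # 8 kHz sampling rate, from the description
BITS_PER_SAMPLE = 16    # 16-bit resolution, from the description


def pcm_bytes(duration_s, channels=1):
    """Return the number of bytes of raw PCM for a recording of duration_s seconds."""
    n_samples = int(duration_s * SAMPLE_RATE_HZ)
    return n_samples * (BITS_PER_SAMPLE // 8) * channels


print(pcm_bytes(1))   # 16000 bytes per second of speech
print(pcm_bytes(40))  # 640000 bytes for a 40-second photograph description
```

Under these assumptions, each photograph description of about 40 seconds occupies roughly 640 kB of sample data.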
Because the same handsets are used in both HTIMIT and LLHDB, it is possible to compare the effects of the two different recording methods.
Relative to the original CD-ROMs produced in 1998 by the Linguistic Data Consortium, the extension of the audio files was changed from ".wav" to ".sph".
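For readers who want to inspect the audio files, the following is a minimal sketch of reading a file header, assuming the ".sph" files follow the NIST SPHERE convention (an ASCII header that begins with "NIST_1A", gives its own size in bytes on the second line, and ends with "end_head"). The function name `read_sphere_header` is hypothetical, and the demo header is synthetic: its field values are illustrative and were not read from the corpus.

```python
import io


def read_sphere_header(stream):
    """Parse the ASCII header of a NIST SPHERE stream into a dict."""
    magic = stream.readline().strip()
    if magic != b"NIST_1A":
        raise ValueError("not a NIST SPHERE file")
    # The second line holds the total header size in bytes (commonly 1024).
    header_size = int(stream.readline().strip())
    stream.seek(0)
    raw = stream.read(header_size).decode("ascii", errors="replace")

    fields = {}
    for line in raw.splitlines()[2:]:      # skip the magic and size lines
        line = line.strip()
        if line == "end_head":
            break
        if not line or line.startswith(";"):
            continue                       # blank line or comment
        parts = line.split(None, 2)        # name, type flag, value
        if len(parts) != 3:
            continue
        name, ftype, value = parts
        if ftype == "-i":                  # integer field
            fields[name] = int(value)
        elif ftype == "-r":                # real-valued field
            fields[name] = float(value)
        else:                              # string field, e.g. -s5
            fields[name] = value
    return fields


# Synthetic header for demonstration only (values are illustrative).
demo = (b"NIST_1A\n   1024\n"
        b"sample_rate -i 8000\n"
        b"sample_n_bytes -i 2\n"
        b"channel_count -i 1\n"
        b"end_head\n").ljust(1024, b" ")

header = read_sphere_header(io.BytesIO(demo))
print(header["sample_rate"], header["sample_n_bytes"], header["channel_count"])
```

With a real file, the same function could be called on a handle opened in binary mode, for example `open(path, "rb")`. The sketch reads only the header; the sample data that follows it is not decoded here.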
Portions: 1998 MIT Lincoln Laboratory