Handset TIMIT Speech Corpus
                                  ( HTIMIT )

                        Recorded at MIT Lincoln Laboratory
                         Speech Systems Technology Group

This corpus is delivered "as is" and no claims are made for specific 
suitability. The data may be used for research purposes only and may not be 
further distributed or transmitted without the written consent of MIT Lincoln 
Laboratory. Use of this data implies agreement with the above conditions.

Introduction
------------
The HTIMIT corpus is a re-recording of a subset of the TIMIT corpus through
different telephone handsets. The aim was to create a corpus for studying the
effects of telephone transducers on speech while minimizing confounding
factors, such as variable telephone channels and background noise. HTIMIT was
created by playing 10 TIMIT sentences from each of 192 male and 192 female
speakers through a stereo loudspeaker into the different transducers, which
were positioned directly in front of the loudspeaker, and digitizing the
transducer output with a SunSparc A/D converter at an 8 kHz sampling rate and
16-bit resolution. Ten transducers were used, as described in Table 1 below.
Except for el2, the telephone handsets are not new; they were obtained from
the Lincoln Telecom office. Handsets with obvious damage were not used. To
obtain some diversity from a limited number of handsets, the handsets were
selected to have varied sound characteristics, transducer designs or, in the
case of electrets, different grill designs. For example, cb1-cb3 are the same
handset model (NT G-type), but each contains a different carbon-button
transducer. In addition, cb3 and cb4 were selected because they had
particularly poor (although not pathological) sound characteristics.

Table 1: Transducers used in corpus.
----------------------------------------------------------------------------
Transducer Name |  Description
----------------|-----------------------------------------------------------
senh            |  Sennheiser head-mounted microphone
----------------|-----------------------------------------------------------
pt1             |  Sony portable (cordless) telephone
----------------|-----------------------------------------------------------
el1             |  Northern-Telecom Unity electret (3-line grill)
----------------|-----------------------------------------------------------
el2             |  Northern-Telecom Unity Noisy-Environment electret 
                |  (2-line grill)
----------------|-----------------------------------------------------------
el3             |  Electret of unknown manufacture (64-hole grill)
----------------|-----------------------------------------------------------
el4             |  Radio Shack Chronophone-255 electret telephone
----------------|-----------------------------------------------------------
cb1             |  Northern-Telecom G-type carbon-button 
                |  (center hole membrane transducer)
----------------|-----------------------------------------------------------
cb2             |  Northern-Telecom G-type carbon-button 
                |  (6-hole metal transducer)
----------------|-----------------------------------------------------------
cb3             |  Northern-Telecom G-type carbon-button 
                |  (6-hole membrane transducer)
----------------|-----------------------------------------------------------
cb4             |  ITT carbon-button (6-hole membrane/attached transducer)
----------------------------------------------------------------------------
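
For experiments that group results by transducer type, Table 1 can be
transcribed into a lookup structure. The following is a minimal, hypothetical
helper that is not part of the corpus; the family labels are our own shorthand
derived from the descriptions in the table.

```python
# Transducer family of each HTIMIT handset code, transcribed from Table 1.
# The family labels are illustrative shorthand, not corpus metadata.
HANDSET_FAMILY = {
    "senh": "head-mounted microphone",
    "pt1": "cordless telephone",
    "el1": "electret",
    "el2": "electret",
    "el3": "electret",
    "el4": "electret",
    "cb1": "carbon-button",
    "cb2": "carbon-button",
    "cb3": "carbon-button",
    "cb4": "carbon-button",
}


def handsets_in_family(family: str) -> list[str]:
    """Return the sorted handset codes that belong to one transducer family."""
    return sorted(code for code, fam in HANDSET_FAMILY.items() if fam == family)


print(handsets_in_family("carbon-button"))  # ['cb1', 'cb2', 'cb3', 'cb4']
```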

The collection procedure is not ideal. First, the speech was played through a
loudspeaker, which imposes its own frequency response on the signal (although
this is a common factor across all recordings in the corpus). Second, the
coupling of the transducer to the sound source is not realistic. However, this
procedure allows speech from a large number of speakers to be collected with
the identical speech signal presented to every transducer. Furthermore,
combined with the phonetic markings from the original TIMIT corpus, HTIMIT
supports the study of handset transducer effects on speech recognition
systems.

To address the limited realism of the sound transduction in HTIMIT, a second
corpus, recorded with the same handsets but with live talkers speaking into
them, is also available. This corpus is called the Lincoln Laboratory Handset
Database (LLHDB) and may be obtained from the Linguistic Data Consortium (LDC).

Data Organization
-----------------
The files are organized in the following hierarchy:

              <Handset1>  <Handset2> ... <Handset10>
           ________|___________
          /        |           \
       <spkr1>  <spkr2> ... <spkr384>
    ______|___________   
   /      |           \
sa1.wav sa2.wav ... sx1234.wav
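
The hierarchy above can be walked with the standard library. The sketch below
is a hypothetical helper (the function name and the corpus root path are
assumptions); it lists the utterance files one level below each speaker
directory, which skips the handset-level test signals described later. With
10 handsets, 384 speakers and 10 sentences per speaker, a complete copy of the
corpus should yield 10 x 384 x 10 = 38,400 utterance files.

```python
from pathlib import Path

# Handset directory names, as listed in the naming convention.
HANDSETS = ["cb1", "cb2", "cb3", "cb4", "el1", "el2", "el3", "el4", "pt1", "senh"]


def utterance_files(root: Path) -> list[Path]:
    """List <handset>/<speaker>/<sentence>.wav files under `root`.

    Only .wav files inside a speaker subdirectory are returned, so the
    test signals stored directly in each handset directory are skipped.
    """
    files: list[Path] = []
    for handset in HANDSETS:
        handset_dir = root / handset
        if handset_dir.is_dir():  # tolerate a partial copy of the corpus
            files.extend(sorted(handset_dir.glob("*/*.wav")))
    return files
```

For example, `len(utterance_files(Path("/data/htimit")))`, with a
hypothetical mount point, can be compared against 38,400 as a quick
completeness check.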

The following TIMIT-style naming convention is used.

<HANDSET>/<SEX><SPEAKER_ID>/<SENTENCE_ID>.<FILE_TYPE>
 
  where,
  HANDSET :== cb1 | cb2 | cb3 | cb4 | el1 | el2 | el3 | el4 | pt1 | senh 
	      (see Table 1 for handset code description)
  SEX :== m | f
  SPEAKER_ID :== <INITIALS><DIGIT>
	where, 
	INITIALS :== speaker initials, 3 letters
	DIGIT :== number 0-9 to differentiate speakers with identical initials

  SENTENCE_ID :== <TEXT_TYPE><SENTENCE_NUMBER>
	where,
	TEXT_TYPE :== sa | si | sx
	              (see TIMIT documentation for text type description)
	SENTENCE_NUMBER :== 1 ... 2342
  FILE_TYPE :== wav (Speech waveform file with NIST Sphere header)

Example:
	cb1/mklw0/sa1.wav
	(carbon-button 1 handset, male speaker, speaker-ID "klw0", 
	sentence text "sa1", speech waveform file)
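
The naming convention is regular enough to be parsed with a single regular
expression. The sketch below assumes relative paths with forward slashes and
lowercase names, as in the example above; the `Utterance` type and
`parse_path` function are illustrative names, not part of the corpus.

```python
import re
from typing import NamedTuple

# <HANDSET>/<SEX><SPEAKER_ID>/<SENTENCE_ID>.<FILE_TYPE>, following the grammar above.
HTIMIT_PATH = re.compile(
    r"(?P<handset>cb[1-4]|el[1-4]|pt1|senh)/"
    r"(?P<sex>[mf])(?P<speaker>[a-z]{3}[0-9])/"
    r"(?P<text_type>sa|si|sx)(?P<number>[0-9]+)\.wav"
)


class Utterance(NamedTuple):
    handset: str
    sex: str
    speaker: str
    text_type: str
    number: int


def parse_path(path: str) -> Utterance:
    """Split a relative HTIMIT utterance path into its naming-convention fields."""
    m = HTIMIT_PATH.fullmatch(path)
    if m is None:
        raise ValueError(f"not an HTIMIT utterance path: {path!r}")
    return Utterance(m["handset"], m["sex"], m["speaker"], m["text_type"], int(m["number"]))


print(parse_path("cb1/mklw0/sa1.wav"))
# Utterance(handset='cb1', sex='m', speaker='klw0', text_type='sa', number=1)
```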

An effort was made to time-align each speaker's speech files across the
handset recordings, using prepended tones and a correlation detector. The
alignment error is estimated to be at most 50 ms.
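
The detector itself is not described here. As an illustration of the general
technique only, and not MIT Lincoln Laboratory's implementation, the sketch
below estimates the offset between two recordings of the same prepended tone
by brute-force cross-correlation. At the corpus's 8 kHz sampling rate, the
stated 50 ms bound corresponds to 8000 samples/sec x 0.050 sec = 400 samples.

```python
SAMPLE_RATE = 8000  # Hz, the HTIMIT sampling rate
# 50 ms expressed in samples, using integer arithmetic: 8000 * 50 / 1000 = 400.
MAX_ALIGNMENT_ERROR_SAMPLES = SAMPLE_RATE * 50 // 1000


def best_lag(reference: list[float], recording: list[float], max_lag: int) -> int:
    """Return the lag (in samples) at which `recording` best matches `reference`.

    A positive lag means the recording is delayed relative to the reference.
    Every lag in [-max_lag, max_lag] is scored by the cross-correlation sum
    over overlapping samples; the highest-scoring lag wins.
    """
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, x in enumerate(reference):
            j = i + lag
            if 0 <= j < len(recording):
                score += x * recording[j]
        if score > best_score:
            best, best_score = lag, score
    return best
```

This direct search costs O(N x max_lag) operations; for full-length files an
FFT-based correlation would be the usual choice. A lag in samples converts to
milliseconds as `lag * 1000 / SAMPLE_RATE`.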

In addition to the 384 speaker subdirectories, each handset directory also
contains two test signals recorded through the handset:
- 1 white noise test signal (5 sec of zero-mean Gaussian noise)
- 1 sweep tone test signal (4 sec, sweeping at 1 kHz/sec)

The test signals were created with Entropic's testsd program as follows:
testsd -p 80000 -T gauss -t short -r 16000 -l 1000 white_noise.sd
testsd -p 64000 -T sine -t short -r 16000 -l 1000 -C 1000 -f 0 sweep_tone.sd
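
Regenerating comparable reference signals can be useful when analyzing the
handset responses. The sketch below does not reproduce testsd, whose option
semantics are not documented here; it only mirrors the durations implied by
the commands (80000 and 64000 samples at 16000 samples/sec give 5 sec and
4 sec). It assumes, as labeled in the comments, that the sweep starts at 0 Hz
and rises linearly at 1 kHz/sec, and that `-l 1000` sets the signal level.
Under those assumptions the sweep would end at 4 kHz, the Nyquist frequency of
the 8 kHz recordings.

```python
import math
import random

RATE = 16000  # samples/sec, matching -r 16000 in the testsd commands


def clip16(x: int) -> int:
    """Clamp a sample to the signed 16-bit range (the -t short data type)."""
    return max(-32768, min(32767, x))


def white_noise(n_samples: int, level: float, seed: int = 0) -> list[int]:
    """Zero-mean Gaussian noise with standard deviation `level` (ASSUMED meaning of -l)."""
    rng = random.Random(seed)
    return [clip16(round(rng.gauss(0.0, level))) for _ in range(n_samples)]


def linear_sweep(n_samples: int, level: float, f0: float, rate_hz_per_s: float) -> list[int]:
    """Sine whose frequency rises linearly from f0 Hz at rate_hz_per_s Hz/sec."""
    out = []
    for n in range(n_samples):
        t = n / RATE
        # Phase of a linear chirp: 2*pi*(f0*t + k*t^2/2), so frequency = f0 + k*t.
        phase = 2 * math.pi * (f0 * t + 0.5 * rate_hz_per_s * t * t)
        out.append(clip16(round(level * math.sin(phase))))
    return out


noise = white_noise(80000, level=1000)  # 80000 / 16000 = 5 sec
# ASSUMED: start at 0 Hz, 1 kHz/sec; after 4 sec the frequency reaches 4 kHz.
sweep = linear_sweep(64000, level=1000, f0=0.0, rate_hz_per_s=1000.0)
```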

The original Entropic file header format on these test signal files
was replaced with the standard NIST Sphere header format for CD-ROM
publication; the names of the test signal files are:
- white_ns.wav
- sweep_tn.wav
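
Every speech and test signal file carries a NIST Sphere header. As a minimal
sketch, assuming the common Sphere layout (a `NIST_1A` line, a line giving the
header length in bytes, then `name -type value` lines ending with
`end_head`), the header can be read with the standard library alone. Field
names such as `sample_rate` and `sample_coding` are the usual Sphere ones but
should be checked against the actual files. If `sample_coding` indicates a
compressed format, the samples must be decoded first (for example with a tool
such as sph2pipe), which this sketch does not attempt.

```python
def read_sphere_header(path: str) -> tuple[int, dict[str, object]]:
    """Return (header_size_in_bytes, fields) from a NIST Sphere file header."""
    with open(path, "rb") as f:
        if f.readline().strip() != b"NIST_1A":
            raise ValueError(f"{path}: missing NIST_1A magic line")
        header_size = int(f.readline().decode("ascii").strip())
        f.seek(0)
        text = f.read(header_size).decode("ascii", errors="replace")

    fields: dict[str, object] = {}
    for line in text.splitlines()[2:]:  # skip the magic and size lines
        parts = line.split(None, 2)
        if not parts:
            continue
        if parts[0] == "end_head":
            break
        if len(parts) < 3:
            continue  # malformed field line; ignored in this sketch
        name, kind, value = parts
        if kind == "-i":  # integer field
            fields[name] = int(value)
        elif kind == "-r":  # real-valued field
            fields[name] = float(value)
        elif kind.startswith("-s") and kind[2:].isdigit():  # string of N characters
            fields[name] = value[: int(kind[2:])]
        else:
            fields[name] = value
    return header_size, fields
```

The samples begin `header_size` bytes into the file, and `sample_count`
divided by `sample_rate` gives the duration in seconds.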

While the names of individual signal files are identical across handset
directories, the content of each file differs according to the
characteristics of the respective handset. Users should be careful to
preserve directory path information when combining the contents of different
handset directories.
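
One way to preserve that information when copying files into a single
directory is to fold the whole relative path into the file name. The helper
below is a hypothetical sketch; the underscore separator is an arbitrary
choice.

```python
from pathlib import PurePosixPath


def flat_name(relative_path: str) -> str:
    """Turn e.g. 'cb1/mklw0/sa1.wav' into 'cb1_mklw0_sa1.wav'.

    Joining every path component keeps the handset (and speaker) in the
    name, so identically named files from different handsets stay distinct.
    """
    return "_".join(PurePosixPath(relative_path).parts)


names = [flat_name(p) for p in ["cb1/white_ns.wav", "el1/white_ns.wav"]]
print(names)  # ['cb1_white_ns.wav', 'el1_white_ns.wav']
```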

The doc directory contains the following files:
- spkrs.lst   : A list of the speakers and their dialect regions from the
                original TIMIT corpus.
- icassp97.ps : A PostScript version of an ICASSP paper describing the
                HTIMIT and LLHDB collection procedures.